OpenShift hardening using the Compliance Operator
Introduction
When we talk about cyber security, there are many aspects to focus on to keep our services secure. From the platform point of view, over the years the industry has created standards that define minimum requirements, so that our infrastructure is secure enough to prevent unauthorized access or denial of service (DoS).
For Kubernetes/OpenShift, there are existing specifications on how clusters need to be configured to minimize these security risks. Some of these standards are:
- CIS Benchmarks
- ACSC
- NIST SP-800-53
- NERC CIP
- PCI DSS
All these standards try to ensure that the configuration of the platform is secure enough to run workloads in production environments. Hence, to guarantee that our Kubernetes/OpenShift cluster is secure, we can run one or more of these benchmarks and apply the recommended remediations. You can choose the profiles that are appropriate for your cluster depending on your use case.
In the case of Kubernetes/OpenShift clusters, we need two kinds of benchmarks: one for the operating system and another for the control plane.
In this post, we are going to use a Single Node OpenShift cluster running version 4.11.22, where we will install the Compliance Operator and its required dependencies. As part of this article, we will create a basic configuration to run a compliance scan, and learn how to understand the results and the remediations. We won’t cover every part of the Operator or review all its features, as this is meant to be an introduction to the value of this operator and to how to quickly run a first scan to perform hardening.
Compliance Operator
This operator makes it easy to scan our cluster and check its compliance status against standard profiles, like the ones described above. It is based on the open-source tool OpenSCAP. For more information about the Compliance Operator, you can visit the official documentation.
Requirements
Before installing the Compliance Operator, we need an OpenShift cluster running version 4.11+. A default StorageClass is also required, to allow the creation of the PVCs that persist the results of the scans. In the case of our Single Node OpenShift deployment, we are using the LVM Operator (LVMO), but in a multi-node cluster, ODF can be used instead. Installation of those operators is outside the scope of this blog post, and is left as an exercise for the reader.
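As a quick sanity check, you can verify that a default StorageClass exists; oc flags the default one with "(default)" next to its name. If none is marked as default, the PVCs created for the scan results will stay in Pending state.
oc get storageclass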
Installation
The Compliance Operator can be installed using OLM and is available on the OperatorHub, so the procedure is the same as for installing any other operator on OpenShift: we create a Namespace, an OperatorGroup and a Subscription object. Below are the YAML files and the commands used to create those objects on our cluster.
namespace.yaml
---
apiVersion: v1
kind: Namespace
metadata:
  labels:
    openshift.io/cluster-monitoring: "true"
  name: openshift-compliance
Command to create the Namespace object:
oc apply -f namespace.yaml
operator-group.yaml
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: compliance-operator
  namespace: openshift-compliance
spec:
  targetNamespaces:
  - openshift-compliance
Command to create the OperatorGroup object:
oc apply -f operator-group.yaml
subscription.yaml
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: compliance-operator-sub
  namespace: openshift-compliance
spec:
  channel: "release-0.1"
  installPlanApproval: Automatic
  name: compliance-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
Command to create the Subscription object:
oc apply -f subscription.yaml
Once those objects are applied to our cluster, we can check whether the operator is installed; a couple of things can be verified. The first one is the ClusterServiceVersion:
$ oc -n openshift-compliance get csv
NAME DISPLAY VERSION REPLACES PHASE
compliance-operator.v0.1.59 Compliance Operator 0.1.59 Succeeded
The second one is listing all the pods running in the openshift-compliance Namespace; the output should be similar to the one below.
$ oc -n openshift-compliance get pods
NAME READY STATUS RESTARTS AGE
compliance-operator-6c9f9bcc78-jb6nm 1/1 Running 18 (21h ago) 5d19h
ocp4-openshift-compliance-pp-bf7f56444-mcqd2 1/1 Running 7 7d19h
rhcos4-openshift-compliance-pp-7bf7d6bd96-nw2lf 1/1 Running 6 7d19h
If the above checks fail, follow this troubleshooting guide for OLM.
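Before going through the full guide, two quick checks usually reveal the problem: the conditions on the Subscription and the state of the InstallPlan. These are standard OLM objects, so the commands below apply to any operator installed this way.
oc -n openshift-compliance describe subscription compliance-operator-sub
oc -n openshift-compliance get installplan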
Configure and run scans
Awesome! At this point, we have our OCP cluster up with the Compliance Operator running. Now it is time to see which compliance profiles are available, and how to configure their execution.
First, we check which compliance profiles are available by listing the profiles.compliance.openshift.io objects. Run the command below, and you should get an output similar to the following one.
$ oc get profiles.compliance.openshift.io
NAME AGE
ocp4-cis 7d19h
ocp4-cis-node 7d19h
ocp4-e8 7d19h
ocp4-high 7d19h
ocp4-high-node 7d19h
ocp4-moderate 7d19h
ocp4-moderate-node 7d19h
ocp4-nerc-cip 7d19h
ocp4-nerc-cip-node 7d19h
ocp4-pci-dss 7d19h
ocp4-pci-dss-node 7d19h
rhcos4-e8 7d19h
rhcos4-high 7d19h
rhcos4-moderate 7d19h
rhcos4-nerc-cip 7d19h
As you can see, there are profiles available based on the standards listed in the introduction. Here, we will run the ocp4-cis, ocp4-cis-node and ocp4-moderate profiles for the control plane scans, and rhcos4-moderate for the OS scans.
The way this operator is configured is similar to how RBAC is configured: in RBAC we define Users and Roles, and then we create a RoleBinding that ties them together. In the case of the Compliance Operator, we define ScanSettings, we have the profiles listed above, and then we create a ScanSettingBinding where we configure the run of the scan. Let’s see how to configure our scan to run the desired profiles.
Create ScanSettings
The ScanSettings object describes which kind of node will run the scan pods, the persistent storage where the results will be saved, and the schedule on which the scan will run. The two commented lines at the end of this example define whether results with FAIL status should be remediated automatically. Once the scan has run, each failed result with auto remediation available gets a remediation object containing a MachineConfig that fixes it.
scansettings.yaml
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSetting
metadata:
  name: first-scan
  namespace: openshift-compliance
rawResultStorage:
  nodeSelector:
    node-role.kubernetes.io/master: ""
  pvAccessModes:
  - ReadWriteOnce
  rotation: 3
  size: 1Gi
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
roles:
- master
scanTolerations:
- operator: Exists
schedule: "41 12 * * *"
showNotApplicable: false
strictNodeScan: true
# autoApplyRemediations: true
# autoUpdateRemediations: true
Create the ScanSettings object by running the command below:
oc apply -f scansettings.yaml
Verify that our ScanSettings object has been created properly. The output should be similar to the capture below; note that there are also two default ScanSettings created by the operator, one with auto remediation enabled (default-auto-apply) and another without it (default).
$ oc -n openshift-compliance get scansettings
NAME AGE
default 8d
default-auto-apply 8d
first-scan 7d22h
Create ScanSettingBinding
As mentioned before, configuring this operator is pretty similar to configuring RBAC: you create some objects, and later on you create a binding between them. We already have the ScanSettings and the pre-installed profiles, so the next step is to create a ScanSettingBinding to describe which profiles will be used for the scan, matching them with the given ScanSettings. Be aware that a single ScanSettingBinding cannot mix profiles of different kinds, so we have to create one ScanSettingBinding for the control plane scan and another one for the OS of the hosts.
scansettingbinding-ocp4.yaml
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSettingBinding
metadata:
  name: test-scan-ocp4
  namespace: openshift-compliance
profiles:
- name: ocp4-cis
  kind: Profile
  apiGroup: compliance.openshift.io/v1alpha1
- name: ocp4-cis-node
  kind: Profile
  apiGroup: compliance.openshift.io/v1alpha1
- name: ocp4-moderate
  kind: Profile
  apiGroup: compliance.openshift.io/v1alpha1
settingsRef:
  name: first-scan
  kind: ScanSetting
  apiGroup: compliance.openshift.io/v1alpha1
scansettingbinding-rhcos4.yaml
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSettingBinding
metadata:
  name: test-scan-rhcos
  namespace: openshift-compliance
profiles:
- name: rhcos4-moderate
  kind: Profile
  apiGroup: compliance.openshift.io/v1alpha1
settingsRef:
  name: first-scan
  kind: ScanSetting
  apiGroup: compliance.openshift.io/v1alpha1
We apply those two ScanSettingBinding objects:
oc apply -f scansettingbinding-ocp4.yaml -f scansettingbinding-rhcos4.yaml
Once the ScanSettingBindings are applied, the scans will start at the scheduled date and time. To validate that the scans are running, execute the command below. Be aware that this capture was taken after the scans had finished; while a scan is in progress, its status is RUNNING instead of DONE.
$ oc -n openshift-compliance get compliancescans.compliance.openshift.io
NAME PHASE RESULT
ocp4-cis DONE NON-COMPLIANT
ocp4-cis-node-master DONE NON-COMPLIANT
ocp4-moderate DONE NON-COMPLIANT
rhcos4-moderate-master DONE NON-COMPLIANT
The scan runs are executed as jobs in the openshift-compliance Namespace; you can list the pods there to troubleshoot the scans.
$ oc -n openshift-compliance get pods
NAME READY STATUS RESTARTS AGE
compliance-operator-6c9f9bcc78-jb6nm 1/1 Running 21 (3h33m ago) 6d
test-scan-ocp4-rerunner-27908349-sql4x 0/1 Completed 0 45h
test-scan-ocp4-rerunner-27909401-2bqvr 0/1 Completed 0 27h
test-scan-ocp4-rerunner-27910841-84jll 0/1 Completed 0 3h40m
test-scan-rhcos-rerunner-27908349-bqmzd 0/1 Completed 0 45h
test-scan-rhcos-rerunner-27909401-bzvpw 0/1 Completed 0 27h
test-scan-rhcos-rerunner-27910841-m44gd 0/1 Completed 0 3h40m
ocp4-openshift-compliance-pp-bf7f56444-mcqd2 1/1 Running 8 8d
rhcos4-openshift-compliance-pp-7bf7d6bd96-nw2lf 1/1 Running 7 8d
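Besides waiting for the cron schedule defined in the ScanSettings, a scan can also be re-triggered on demand by annotating its ComplianceScan object with compliance.openshift.io/rescan=. For example, to rerun the control plane CIS scan:
oc -n openshift-compliance annotate compliancescans ocp4-cis compliance.openshift.io/rescan=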
Get results
OK! So far so good: we have run the scans, but the goal of all this is to understand the security compliance of our cluster, and to remediate the configuration if necessary. Let’s get the results of the scans by querying the compliancecheckresults.compliance.openshift.io objects; this provides the list of all the reports generated by the scans.
$ oc -n openshift-compliance get compliancecheckresults.compliance.openshift.io
NAME STATUS SEVERITY
ocp4-cis-accounts-restrict-service-account-tokens MANUAL medium
ocp4-cis-accounts-unique-service-account MANUAL medium
ocp4-cis-api-server-admission-control-plugin-alwaysadmit PASS medium
ocp4-cis-api-server-admission-control-plugin-alwayspullimages PASS high
ocp4-cis-api-server-admission-control-plugin-namespacelifecycle PASS medium
ocp4-cis-api-server-admission-control-plugin-noderestriction PASS medium
ocp4-cis-api-server-admission-control-plugin-scc PASS medium
ocp4-cis-api-server-admission-control-plugin-securitycontextdeny PASS medium
ocp4-cis-api-server-admission-control-plugin-service-account PASS medium
ocp4-cis-api-server-anonymous-auth PASS medium
ocp4-cis-api-server-api-priority-flowschema-catch-all PASS medium
ocp4-cis-api-server-audit-log-maxbackup PASS low
ocp4-cis-api-server-audit-log-maxsize PASS medium
ocp4-cis-api-server-audit-log-path PASS high
ocp4-cis-api-server-profiling-protected-by-rbac PASS medium
ocp4-cis-api-server-request-timeout PASS medium
ocp4-cis-api-server-service-account-lookup PASS medium
ocp4-cis-api-server-service-account-public-key PASS medium
ocp4-cis-api-server-tls-cert PASS medium
ocp4-cis-api-server-tls-cipher-suites PASS medium
ocp4-cis-api-server-tls-private-key PASS medium
ocp4-cis-api-server-token-auth PASS high
ocp4-cis-audit-log-forwarding-enabled FAIL medium
ocp4-cis-audit-profile-set FAIL medium
ocp4-cis-configure-network-policies PASS high
ocp4-cis-configure-network-policies-namespaces FAIL high
Some of the output has been removed for clarity, as there are more than 500 results, but in the capture we can see different kinds of reports, with different statuses and severities.
As we saw in the previous section, the result of each compliancescan was NON-COMPLIANT, which means that at least one check is in FAIL status. If you take a look at the results capture above, you can find multiple results with status FAIL.
Now, what is important for us are the FAIL results. The command below lists only the failed checks, further filtered to those that have an automated remediation available:
$ oc -n openshift-compliance get compliancecheckresults -l 'compliance.openshift.io/check-status in (FAIL),compliance.openshift.io/automated-remediation'
NAME STATUS SEVERITY
ocp4-cis-audit-profile-set FAIL medium
rhcos4-moderate-master-configure-usbguard-auditbackend FAIL medium
rhcos4-moderate-master-service-usbguard-enabled FAIL medium
rhcos4-moderate-master-usbguard-allow-hid-and-hub FAIL medium
That’s fine, we can see what is wrong in our configuration, but how do we understand what each check means? In each standard, every check comes with an explanation of why the misconfiguration is a security risk, and also provides a remediation. We can get this information from the content of each result object; the example below shows how to see those details for one of the failed results.
$ oc -n openshift-compliance get compliancecheckresults.compliance.openshift.io ocp4-cis-audit-profile-set -oyaml | yq .description
Ensure that the cluster's audit profile is properly set
Logging is an important detective control for all systems, to detect potential
unauthorised access.
We can also get the remediation instructions from this object:
$ oc -n openshift-compliance get compliancecheckresults.compliance.openshift.io ocp4-cis-audit-profile-set -oyaml | yq .instructions
Run the following command to retrieve the current audit profile:
$ oc get apiservers cluster -ojsonpath='{.spec.audit.profile}'
Make sure that the returned profile matches the one that should be used.
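For this particular check, following those instructions could translate into a patch like the one below. This is only an illustration: we are assuming that WriteRequestBodies is the audit profile expected by the benchmark, so verify the value recommended by the check before applying anything.
oc patch apiserver cluster --type merge -p '{"spec":{"audit":{"profile":"WriteRequestBodies"}}}'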
In the next section, we will go into more detail about remediations, but it is important to know how to get this information from the results, because the failed results are recommendations, and some of these recommendations may be incompatible with our environment because of other requirements.
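When a recommendation does not fit your environment, one option is to stop scanning for it instead of living with a permanent FAIL, using a TailoredProfile. Below is a minimal sketch that extends ocp4-cis and disables a single rule; the rule name and rationale here are just an example (you can list the real rule names with oc get rules.compliance.openshift.io), and the resulting TailoredProfile would then be referenced from the ScanSettingBinding with kind: TailoredProfile instead of kind: Profile.
apiVersion: compliance.openshift.io/v1alpha1
kind: TailoredProfile
metadata:
  name: ocp4-cis-custom
  namespace: openshift-compliance
spec:
  title: CIS without audit log forwarding
  description: Example tailoring that skips a check handled outside the cluster
  extends: ocp4-cis
  disableRules:
  - name: ocp4-audit-log-forwarding-enabled
    rationale: Audit logs are collected by an external system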
In case you need to share this report with someone else, like I had to, I wrote the following script to export the failed results to a .csv file.
#!/bin/sh
# Export the name, description and severity of every failed check to results.csv
CHECKS=$(oc -n openshift-compliance get compliancecheckresults.compliance.openshift.io | grep FAIL | cut -f1 -d' ')
echo "NAME; DESCRIPTION; SEVERITY" > results.csv
for i in $CHECKS
do
  DESCRIPTION=$(oc -n openshift-compliance get compliancecheckresults.compliance.openshift.io "$i" -o jsonpath='{.description}')
  SEVERITY=$(oc -n openshift-compliance get compliancecheckresults.compliance.openshift.io "$i" -o jsonpath='{.severity}')
  echo "$i; \"$DESCRIPTION\"; $SEVERITY" >> results.csv
done
Remediations
Ultimately, we install this operator and run all these scans to configure our cluster more securely, and so far we only know that there are changes to make in our configuration to meet the desired standards. This is where the power of this operator lies: for each failed result, the operator also creates a complianceremediations.compliance.openshift.io object, which allows the operator to apply the remediation in our cluster. These auto remediations are not available for results with status MANUAL, so those require manual intervention to be solved.
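If you prefer not to enable blanket auto remediation in the ScanSettings, individual remediations can be applied one at a time by setting their spec.apply field to true, for example:
oc -n openshift-compliance patch complianceremediations ocp4-cis-audit-profile-set --type merge -p '{"spec":{"apply":true}}'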
The next command lists all the complianceremediations and their state. In this capture, all the auto remediations have already been applied, which is why the state of every one of them is Applied; if you have run the scan with ScanSettings that do not enable the autoApplyRemediations option, you will see a different output.
$ oc get complianceremediations.compliance.openshift.io | more
NAME STATE
ocp4-cis-api-server-encryption-provider-cipher Applied
ocp4-cis-api-server-encryption-provider-config Applied
ocp4-cis-audit-profile-set Applied
ocp4-cis-kubelet-enable-streaming-connections Applied
ocp4-cis-kubelet-enable-streaming-connections-1 Applied
ocp4-cis-kubelet-eviction-thresholds-set-hard-imagefs-available Applied
ocp4-cis-kubelet-eviction-thresholds-set-hard-imagefs-available-1 Applied
ocp4-cis-kubelet-eviction-thresholds-set-hard-imagefs-available-2 Applied
ocp4-cis-kubelet-eviction-thresholds-set-hard-imagefs-available-3 Applied
ocp4-cis-kubelet-eviction-thresholds-set-hard-imagefs-inodesfree Applied
ocp4-cis-kubelet-eviction-thresholds-set-hard-imagefs-inodesfree-1 Applied
Let’s take a look at the content of one of the complianceremediations CRs:
$ oc get complianceremediations.compliance.openshift.io rhcos4-moderate-master-audit-rules-dac-modification-chmod -oyaml
apiVersion: compliance.openshift.io/v1alpha1
kind: ComplianceRemediation
metadata:
  creationTimestamp: "2023-01-18T11:05:06Z"
  generation: 2
  labels:
    compliance.openshift.io/scan-name: rhcos4-moderate-master
    compliance.openshift.io/suite: esmb-rhcos
  name: rhcos4-moderate-master-audit-rules-dac-modification-chmod
  namespace: openshift-compliance
  ownerReferences:
  - apiVersion: compliance.openshift.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: ComplianceCheckResult
    name: rhcos4-moderate-master-audit-rules-dac-modification-chmod
    uid: 83639492-6c8e-4685-8ec2-4f07a67af700
  resourceVersion: "5574521"
  uid: f8d08dae-6132-460a-85ad-f8ffc8d79042
spec:
  apply: true
  current:
    object:
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      spec:
        config:
          ignition:
            version: 3.1.0
          storage:
            files:
            - contents:
                source: data:,-a%20always%2Cexit%20-F%20arch%3Db32%20-S%20chmod%20-F%20auid%3E%3D1000%20-F%20auid%21%3Dunset%20-F%20key%3Dperm_mod%0A-a%20always%2Cexit%20-F%20arch%3Db64%20-S%20chmod%20-F%20auid%3E%3D1000%20-F%20auid%21%3Dunset%20-F%20key%3Dperm_mod%0A
              mode: 420
              overwrite: true
              path: /etc/audit/rules.d/75-chmod_dac_modification.rules
  outdated: {}
  type: Configuration
status:
  applicationState: Applied
As you can see in the capture, the ComplianceRemediation object eventually applies a MachineConfig to configure our cluster with the recommendations. Hence, for each remediation we can get the MachineConfig object that solves the problem. If desired, we can extract these objects and apply them to another cluster that is not running the Compliance Operator. The command below shows how to get this MachineConfig object from the ComplianceRemediation object.
$ oc get complianceremediations.compliance.openshift.io rhcos4-moderate-master-audit-rules-dac-modification-chmod -oyaml | yq .spec.current.object
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
      - contents:
          source: data:,-a%20always%2Cexit%20-F%20arch%3Db32%20-S%20chmod%20-F%20auid%3E%3D1000%20-F%20auid%21%3Dunset%20-F%20key%3Dperm_mod%0A-a%20always%2Cexit%20-F%20arch%3Db64%20-S%20chmod%20-F%20auid%3E%3D1000%20-F%20auid%21%3Dunset%20-F%20key%3Dperm_mod%0A
        mode: 420
        overwrite: true
        path: /etc/audit/rules.d/75-chmod_dac_modification.rules
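Note that the extracted object has no metadata, so it cannot be applied as-is to another cluster. A rough sketch of the full export, assuming yq v4 and a hypothetical kubeconfig path for the target cluster, could be:
# Extract the MachineConfig rendered by the remediation.
oc -n openshift-compliance get complianceremediations.compliance.openshift.io \
  rhcos4-moderate-master-audit-rules-dac-modification-chmod -oyaml \
  | yq '.spec.current.object' > 75-chmod-audit-rules.yaml
# Add a name (arbitrary) and the role label the Machine Config Operator expects.
yq -i '.metadata.name = "75-chmod-audit-rules" | .metadata.labels."machineconfiguration.openshift.io/role" = "master"' 75-chmod-audit-rules.yaml
# Apply it on the target cluster (hypothetical kubeconfig path).
oc --kubeconfig ~/clusters/other/kubeconfig apply -f 75-chmod-audit-rules.yaml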
Conclusions
Security is paramount in production environments. From the platform perspective, the standards and tools described here help us keep a better configuration, and a more robust environment where we can run our workloads.
The Compliance Operator helps us find those misconfigurations, and understand and apply remediations to keep a more secure environment, all of this through Kubernetes-native objects only. Additionally, the remediations can be listed and exported as MachineConfig objects to be applied to a different cluster that is not running the Compliance Operator.