Chapter 6. File Integrity Operator
6.1. File Integrity Operator overview
The File Integrity Operator continually runs file integrity checks on the cluster nodes. It deploys a DaemonSet that initializes and runs privileged Advanced Intrusion Detection Environment (AIDE) containers on each node, providing a log of files that have been modified since the initial run of the DaemonSet pods.
For the latest updates, see the File Integrity Operator release notes.
Installing the File Integrity Operator
Updating the File Integrity Operator
Understanding the File Integrity Operator
Configuring the Custom File Integrity Operator
6.2. File Integrity Operator release notes
The File Integrity Operator for OpenShift Container Platform continually runs file integrity checks on RHCOS nodes.
These release notes track the development of the File Integrity Operator in the OpenShift Container Platform.
For an overview of the File Integrity Operator, see Understanding the File Integrity Operator.
To access the latest release, see Updating the File Integrity Operator.
6.2.1. OpenShift File Integrity Operator 1.3.4
The following advisory is available for the OpenShift File Integrity Operator 1.3.4:
6.2.1.1. Bug fixes
Previously, the File Integrity Operator would issue a NodeHasIntegrityFailure alert due to multus certificate rotation. With this release, the alert and failing status are now correctly triggered. (OCPBUGS-31257)
6.2.2. OpenShift File Integrity Operator 1.3.3
The following advisory is available for the OpenShift File Integrity Operator 1.3.3:
This update addresses a CVE in an underlying dependency.
6.2.2.1. New features and enhancements
- You can install and use the File Integrity Operator in an OpenShift Container Platform cluster running in FIPS mode.
To enable FIPS mode for your cluster, you must run the installation program from a RHEL computer configured to operate in FIPS mode. For more information about configuring FIPS mode on RHEL, see Installing the system in FIPS mode.
6.2.2.2. Bug fixes
- Previously, some FIO pods with private default mount propagation in combination with hostPath: path: / volume mounts would break the CSI driver relying on multipath. This problem has been fixed and the CSI driver works correctly. (Some OpenShift Operator pods blocking unmounting of CSI volumes when multipath is in use)
- This update resolves CVE-2023-39325. (CVE-2023-39325)
6.2.3. OpenShift File Integrity Operator 1.3.2
The following advisory is available for the OpenShift File Integrity Operator 1.3.2:
This update addresses a CVE in an underlying dependency.
6.2.4. OpenShift File Integrity Operator 1.3.1
The following advisory is available for the OpenShift File Integrity Operator 1.3.1:
6.2.4.1. New features and enhancements
- FIO now includes kubelet certificates as default files, excluding them from issuing warnings when they’re managed by OpenShift Container Platform. (OCPBUGS-14348)
- FIO now correctly directs email to the address for Red Hat Technical Support. (OCPBUGS-5023)
6.2.4.2. Bug fixes
- Previously, FIO would not clean up FileIntegrityNodeStatus CRDs when nodes were removed from the cluster. FIO has been updated to correctly clean up node status CRDs on node removal. (OCPBUGS-4321)
- Previously, FIO would also erroneously indicate that new nodes failed integrity checks. FIO has been updated to correctly show node status CRDs when adding new nodes to the cluster. This provides correct node status notifications. (OCPBUGS-8502)
- Previously, when FIO was reconciling FileIntegrity CRDs, it would pause scanning until the reconciliation was done. This caused an overly aggressive re-initialization process on nodes not impacted by the reconciliation. This problem also resulted in unnecessary daemon sets for machine config pools that are unrelated to the FileIntegrity being changed. FIO now correctly handles these cases and only pauses AIDE scanning for nodes that are affected by file integrity changes. (CMP-1097)
6.2.4.3. Known Issues
In FIO 1.3.1, increasing nodes in IBM Z® clusters might result in Failed File Integrity node status. For more information, see Adding nodes in IBM Power® clusters can result in failed File Integrity node status.
6.2.5. OpenShift File Integrity Operator 1.2.1
The following advisory is available for the OpenShift File Integrity Operator 1.2.1:
- RHBA-2023:1684 OpenShift File Integrity Operator Bug Fix Update
- This release includes updated container dependencies.
6.2.6. OpenShift File Integrity Operator 1.2.0
The following advisory is available for the OpenShift File Integrity Operator 1.2.0:
6.2.6.1. New features and enhancements
- The File Integrity Operator Custom Resource (CR) now contains an initialDelay feature that specifies the number of seconds to wait before starting the first AIDE integrity check. For more information, see Creating the FileIntegrity custom resource.
- The File Integrity Operator is now stable and the release channel is upgraded to stable. Future releases will follow Semantic Versioning. To access the latest release, see Updating the File Integrity Operator.
6.2.7. OpenShift File Integrity Operator 1.0.0
The following advisory is available for the OpenShift File Integrity Operator 1.0.0:
6.2.8. OpenShift File Integrity Operator 0.1.32
The following advisory is available for the OpenShift File Integrity Operator 0.1.32:
6.2.8.1. Bug fixes
- Previously, alerts issued by the File Integrity Operator did not set a namespace, making it difficult to understand from which namespace the alert originated. Now, the Operator sets the appropriate namespace, providing more information about the alert. (BZ#2112394)
- Previously, the File Integrity Operator did not update the metrics service on Operator startup, causing the metrics targets to be unreachable. With this release, the File Integrity Operator now ensures the metrics service is updated on Operator startup. (BZ#2115821)
6.2.9. OpenShift File Integrity Operator 0.1.30
The following advisory is available for the OpenShift File Integrity Operator 0.1.30:
6.2.9.1. New features and enhancements
The File Integrity Operator is now supported on the following architectures:
- IBM Power®
- IBM Z® and IBM® LinuxONE
6.2.9.2. Bug fixes
- Previously, alerts issued by the File Integrity Operator did not set a namespace, making it difficult to understand where the alert originated. Now, the Operator sets the appropriate namespace, increasing understanding of the alert. (BZ#2101393)
6.2.10. OpenShift File Integrity Operator 0.1.24
The following advisory is available for the OpenShift File Integrity Operator 0.1.24:
6.2.10.1. New features and enhancements
- You can now configure the maximum number of backups stored in the FileIntegrity Custom Resource (CR) with the config.maxBackups attribute. This attribute specifies the number of AIDE database and log backups left over from the re-init process to keep on the node. Older backups beyond the configured number are automatically pruned. The default is set to five backups.
6.2.10.2. Bug fixes
- Previously, upgrading the Operator from versions older than 0.1.21 to 0.1.22 could cause the re-init feature to fail. This was a result of the Operator failing to update configMap resource labels. Now, upgrading to the latest version fixes the resource labels. (BZ#2049206)
- Previously, when enforcing the default configMap script contents, the wrong data keys were compared. This resulted in the aide-reinit script not being updated properly after an Operator upgrade, and caused the re-init process to fail. Now, daemonSets run to completion and the AIDE database re-init process executes successfully. (BZ#2072058)
6.2.11. OpenShift File Integrity Operator 0.1.22
The following advisory is available for the OpenShift File Integrity Operator 0.1.22:
6.2.11.1. Bug fixes
- Previously, a system with a File Integrity Operator installed might interrupt the OpenShift Container Platform update, due to the /etc/kubernetes/aide.reinit file. This occurred if the /etc/kubernetes/aide.reinit file was present, but later removed prior to the ostree validation. With this update, /etc/kubernetes/aide.reinit is moved to the /run directory so that it does not conflict with the OpenShift Container Platform update. (BZ#2033311)
6.2.12. OpenShift File Integrity Operator 0.1.21
The following advisory is available for the OpenShift File Integrity Operator 0.1.21:
6.2.12.1. New features and enhancements
- The metrics related to FileIntegrity scan results and processing metrics are displayed on the monitoring dashboard on the web console. The results are labeled with the prefix of file_integrity_operator_.
- If a node has an integrity failure for more than 1 second, the default PrometheusRule provided in the operator namespace alerts with a warning.
- The following dynamic Machine Config Operator and Cluster Version Operator related filepaths are excluded from the default AIDE policy to help prevent false positives during node updates:
  - /etc/machine-config-daemon/currentconfig
  - /etc/pki/ca-trust/extracted/java/cacerts
  - /etc/cvo/updatepayloads
  - /root/.kube
- The AIDE daemon process has stability improvements over v0.1.16, and is more resilient to errors that might occur when the AIDE database is initialized.
6.2.12.2. Bug fixes
- Previously, when the Operator automatically upgraded, outdated daemon sets were not removed. With this release, outdated daemon sets are removed during the automatic upgrade.
6.2.13. Additional resources
6.3. File Integrity Operator support
6.3.1. File Integrity Operator lifecycle
The File Integrity Operator is a "Rolling Stream" Operator, meaning updates are available asynchronously of OpenShift Container Platform releases. For more information, see OpenShift Operator Life Cycles on the Red Hat Customer Portal.
6.3.2. Getting support
If you experience difficulty with a procedure described in this documentation, or with OpenShift Container Platform in general, visit the Red Hat Customer Portal.
From the Customer Portal, you can:
- Search or browse through the Red Hat Knowledgebase of articles and solutions relating to Red Hat products.
- Submit a support case to Red Hat Support.
- Access other product documentation.
To identify issues with your cluster, you can use Insights in OpenShift Cluster Manager. Insights provides details about issues and, if available, information on how to solve a problem.
If you have a suggestion for improving this documentation or have found an error, submit a Jira issue for the most relevant documentation component. Please provide specific details, such as the section name and OpenShift Container Platform version.
6.4. Installing the File Integrity Operator
6.4.1. Installing the File Integrity Operator using the web console
Prerequisites
- You must have admin privileges.
Procedure
- In the OpenShift Container Platform web console, navigate to Operators → OperatorHub.
- Search for the File Integrity Operator, then click Install.
- Keep the default selection of Installation mode and namespace to ensure that the Operator will be installed to the openshift-file-integrity namespace.
- Click Install.
Verification
To confirm that the installation is successful:
- Navigate to the Operators → Installed Operators page.
- Check that the Operator is installed in the openshift-file-integrity namespace and its status is Succeeded.
If the Operator is not installed successfully:
- Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
- Navigate to the Workloads → Pods page and check the logs in any pods in the openshift-file-integrity project that are reporting issues.
6.4.2. Installing the File Integrity Operator using the CLI
Prerequisites
- You must have admin privileges.
Procedure
Create a Namespace object YAML file by running:
$ oc create -f <file-name>.yaml
Example output
apiVersion: v1
kind: Namespace
metadata:
  labels:
    openshift.io/cluster-monitoring: "true"
    pod-security.kubernetes.io/enforce: privileged # 1
  name: openshift-file-integrity
- 1: In OpenShift Container Platform 4.15, the pod security label must be set to privileged at the namespace level.
Create the OperatorGroup object YAML file:
$ oc create -f <file-name>.yaml
Example output
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: file-integrity-operator
  namespace: openshift-file-integrity
spec:
  targetNamespaces:
  - openshift-file-integrity
Create the Subscription object YAML file:
$ oc create -f <file-name>.yaml
Example output
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: file-integrity-operator
  namespace: openshift-file-integrity
spec:
  channel: "stable"
  installPlanApproval: Automatic
  name: file-integrity-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
Verification
Verify the installation succeeded by inspecting the ClusterServiceVersion (CSV):
$ oc get csv -n openshift-file-integrity
Verify that the File Integrity Operator is up and running:
$ oc get deploy -n openshift-file-integrity
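The three objects in this procedure can also be generated together instead of being written by hand. The following is a minimal sketch, assuming Python 3 is available on the workstation; the function name build_install_objects is hypothetical, and the field values are copied from the examples in this section. It prints a JSON List, which oc create -f accepts in the same way as YAML.

```python
import json

NAMESPACE = "openshift-file-integrity"


def build_install_objects(channel="stable"):
    """Return the Namespace, OperatorGroup, and Subscription as one List object."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": NAMESPACE,
            "labels": {
                "openshift.io/cluster-monitoring": "true",
                "pod-security.kubernetes.io/enforce": "privileged",
            },
        },
    }
    operator_group = {
        "apiVersion": "operators.coreos.com/v1",
        "kind": "OperatorGroup",
        "metadata": {"name": "file-integrity-operator", "namespace": NAMESPACE},
        "spec": {"targetNamespaces": [NAMESPACE]},
    }
    subscription = {
        "apiVersion": "operators.coreos.com/v1alpha1",
        "kind": "Subscription",
        "metadata": {"name": "file-integrity-operator", "namespace": NAMESPACE},
        "spec": {
            "channel": channel,
            "installPlanApproval": "Automatic",
            "name": "file-integrity-operator",
            "source": "redhat-operators",
            "sourceNamespace": "openshift-marketplace",
        },
    }
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [namespace, operator_group, subscription],
    }


# Print the combined manifest; redirect it to a file and pass that file to `oc create -f`.
print(json.dumps(build_install_objects(), indent=2))
```

The namespace is listed first so that it exists before the namespaced objects are created.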
6.4.3. Additional resources
- The File Integrity Operator is supported in a restricted network environment. For more information, see Using Operator Lifecycle Manager on restricted networks.
6.5. Updating the File Integrity Operator
As a cluster administrator, you can update the File Integrity Operator on your OpenShift Container Platform cluster.
6.5.1. Preparing for an Operator update
The subscription of an installed Operator specifies an update channel that tracks and receives updates for the Operator. You can change the update channel to start tracking and receiving updates from a newer channel.
The names of update channels in a subscription can differ between Operators, but the naming scheme typically follows a common convention within a given Operator. For example, channel names might follow a minor release update stream for the application provided by the Operator (1.2, 1.3) or a release frequency (stable, fast).
You cannot change installed Operators to a channel that is older than the current channel.
Red Hat Customer Portal Labs include the following application that helps administrators prepare to update their Operators:
You can use the application to search for Operator Lifecycle Manager-based Operators and verify the available Operator version per update channel across different versions of OpenShift Container Platform. Cluster Version Operator-based Operators are not included.
6.5.2. Changing the update channel for an Operator
You can change the update channel for an Operator by using the OpenShift Container Platform web console.
If the approval strategy in the subscription is set to Automatic, the update process initiates as soon as a new Operator version is available in the selected channel. If the approval strategy is set to Manual, you must manually approve pending updates.
Prerequisites
- An Operator previously installed using Operator Lifecycle Manager (OLM).
Procedure
- In the Administrator perspective of the web console, navigate to Operators → Installed Operators.
- Click the name of the Operator you want to change the update channel for.
- Click the Subscription tab.
- Click the name of the update channel under Update channel.
- Click the newer update channel that you want to change to, then click Save.
- For subscriptions with an Automatic approval strategy, the update begins automatically. Navigate back to the Operators → Installed Operators page to monitor the progress of the update. When complete, the status changes to Succeeded and Up to date.
  For subscriptions with a Manual approval strategy, you can manually approve the update from the Subscription tab.
6.5.3. Manually approving a pending Operator update
If an installed Operator has the approval strategy in its subscription set to Manual, when new updates are released in its current update channel, the update must be manually approved before installation can begin.
Prerequisites
- An Operator previously installed using Operator Lifecycle Manager (OLM).
Procedure
- In the Administrator perspective of the OpenShift Container Platform web console, navigate to Operators → Installed Operators.
- Operators that have a pending update display a status with Upgrade available. Click the name of the Operator you want to update.
- Click the Subscription tab. Any updates requiring approval are displayed next to Upgrade status. For example, it might display 1 requires approval.
- Click 1 requires approval, then click Preview Install Plan.
- Review the resources that are listed as available for update. When satisfied, click Approve.
- Navigate back to the Operators → Installed Operators page to monitor the progress of the update. When complete, the status changes to Succeeded and Up to date.
6.6. Understanding the File Integrity Operator
The File Integrity Operator is an OpenShift Container Platform Operator that continually runs file integrity checks on the cluster nodes. It deploys a daemon set that initializes and runs privileged Advanced Intrusion Detection Environment (AIDE) containers on each node, providing a status object with a log of files that have been modified since the initial run of the daemon set pods.
Currently, only Red Hat Enterprise Linux CoreOS (RHCOS) nodes are supported.
6.6.1. Creating the FileIntegrity custom resource
An instance of a FileIntegrity custom resource (CR) represents a set of continuous file integrity scans for one or more nodes.
Each FileIntegrity CR is backed by a daemon set running AIDE on the nodes matching the FileIntegrity CR specification.
Procedure
Create the following example FileIntegrity CR named worker-fileintegrity.yaml to enable scans on worker nodes:
Example FileIntegrity CR
apiVersion: fileintegrity.openshift.io/v1alpha1
kind: FileIntegrity
metadata:
  name: worker-fileintegrity
  namespace: openshift-file-integrity
spec:
  nodeSelector: # 1
    node-role.kubernetes.io/worker: ""
  tolerations: # 2
  - key: "myNode"
    operator: "Exists"
    effect: "NoSchedule"
  config: # 3
    name: "myconfig"
    namespace: "openshift-file-integrity"
    key: "config"
    gracePeriod: 20 # 4
    maxBackups: 5 # 5
    initialDelay: 60 # 6
  debug: false
status:
  phase: Active # 7
- 1: Defines the selector for scheduling node scans.
- 2: Specify tolerations to schedule on nodes with custom taints. When not specified, a default toleration allowing running on control plane and infra nodes is applied.
- 3: Define a ConfigMap containing an AIDE configuration to use.
- 4: The number of seconds to pause in between AIDE integrity checks. Frequent AIDE checks on a node might be resource intensive, so it can be useful to specify a longer interval. Default is 900 seconds (15 minutes).
- 5: The maximum number of AIDE database and log backups (left over from the re-init process) to keep on a node. Older backups beyond this number are automatically pruned by the daemon. Default is set to 5.
- 6: The number of seconds to wait before starting the first AIDE integrity check. Default is set to 0.
- 7: The running status of the FileIntegrity instance. Statuses are Initializing, Pending, or Active.
  - Initializing: The FileIntegrity object is currently initializing or re-initializing the AIDE database.
  - Pending: The FileIntegrity deployment is still being created.
  - Active: The scans are active and ongoing.
Apply the YAML file to the openshift-file-integrity namespace:
$ oc apply -f worker-fileintegrity.yaml -n openshift-file-integrity
Verification
Confirm the FileIntegrity object was created successfully by running the following command:
$ oc get fileintegrities -n openshift-file-integrity
Example output
NAME                   AGE
worker-fileintegrity   14s
6.6.2. Checking the FileIntegrity custom resource status
The FileIntegrity custom resource (CR) reports its status through the .status.phase subresource.
Procedure
To query the FileIntegrity CR status, run:
$ oc get fileintegrities/worker-fileintegrity -o jsonpath="{ .status.phase }"
Example output
Active
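When scripting against the status, it can help to poll until the CR reports Active. The following is a minimal sketch, assuming Python 3 and an authenticated oc client on the PATH; the helper names get_phase and wait_for_active, the timeout, and the polling interval are illustrative choices, not part of the Operator. The query itself is the oc get command shown in this section, with the namespace passed explicitly.

```python
import subprocess
import time


def get_phase(name, namespace="openshift-file-integrity"):
    """Run the documented `oc get` query and return the reported phase."""
    result = subprocess.run(
        ["oc", "get", f"fileintegrities/{name}", "-n", namespace,
         "-o", "jsonpath={.status.phase}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_for_active(name, fetch=get_phase, timeout=300, interval=10, sleep=time.sleep):
    """Poll the phase until it is Active; return False if `timeout` seconds elapse."""
    waited = 0
    while True:
        if fetch(name) == "Active":
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

For example, wait_for_active("worker-fileintegrity") returns True once the phase reaches Active. The fetch and sleep parameters exist only so that the loop can be exercised without a cluster.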
6.6.3. FileIntegrity custom resource phases
- Pending - The phase after the custom resource (CR) is created.
- Active - The phase when the backing daemon set is up and running.
- Initializing - The phase when the AIDE database is being reinitialized.
6.6.4. Understanding the FileIntegrityNodeStatuses object
The scan results of the FileIntegrity CR are reported in another object called FileIntegrityNodeStatuses.
$ oc get fileintegritynodestatuses
Example output
NAME                                                AGE
worker-fileintegrity-ip-10-0-130-192.ec2.internal   101s
worker-fileintegrity-ip-10-0-147-133.ec2.internal   109s
worker-fileintegrity-ip-10-0-165-160.ec2.internal   102s
It might take some time for the FileIntegrityNodeStatus object results to be available.
There is one result object per node. The nodeName attribute of each FileIntegrityNodeStatus object corresponds to the node being scanned. The status of the file integrity scan is represented in the results array, which holds scan conditions.
$ oc get fileintegritynodestatuses.fileintegrity.openshift.io -ojsonpath='{.items[*].results}' | jq
The FileIntegrityNodeStatus object reports the latest status of an AIDE run and exposes the status as Failed, Succeeded, or Errored in a status field.
$ oc get fileintegritynodestatuses -w
Example output
NAME                                                               NODE                                         STATUS
example-fileintegrity-ip-10-0-134-186.us-east-2.compute.internal   ip-10-0-134-186.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-150-230.us-east-2.compute.internal   ip-10-0-150-230.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-169-137.us-east-2.compute.internal   ip-10-0-169-137.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-180-200.us-east-2.compute.internal   ip-10-0-180-200.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-194-66.us-east-2.compute.internal    ip-10-0-194-66.us-east-2.compute.internal    Failed
example-fileintegrity-ip-10-0-222-188.us-east-2.compute.internal   ip-10-0-222-188.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-134-186.us-east-2.compute.internal   ip-10-0-134-186.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-222-188.us-east-2.compute.internal   ip-10-0-222-188.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-194-66.us-east-2.compute.internal    ip-10-0-194-66.us-east-2.compute.internal    Failed
example-fileintegrity-ip-10-0-150-230.us-east-2.compute.internal   ip-10-0-150-230.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-180-200.us-east-2.compute.internal   ip-10-0-180-200.us-east-2.compute.internal   Succeeded
6.6.5. FileIntegrityNodeStatus CR status types
These conditions are reported in the results array of the corresponding FileIntegrityNodeStatus CR status:
- Succeeded - The integrity check passed; the files and directories covered by the AIDE check have not been modified since the database was last initialized.
- Failed - The integrity check failed; some files or directories covered by the AIDE check have been modified since the database was last initialized.
- Errored - The AIDE scanner encountered an internal error.
6.6.5.1. FileIntegrityNodeStatus CR success example
Example output of a condition with a success status
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:45:57Z"
  }
]
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:46:03Z"
  }
]
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:45:48Z"
  }
]
In this case, all three scans succeeded and so far there are no other conditions.
6.6.5.2. FileIntegrityNodeStatus CR failure status example
To simulate a failure condition, modify one of the files AIDE tracks. For example, modify /etc/resolv.conf on one of the worker nodes:
$ oc debug node/ip-10-0-130-192.ec2.internal
Example output
Creating debug namespace/openshift-debug-node-ldfbj ...
Starting pod/ip-10-0-130-192ec2internal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.130.192
If you don't see a command prompt, try pressing enter.
sh-4.2# echo "# integrity test" >> /host/etc/resolv.conf
sh-4.2# exit
Removing debug pod ...
Removing debug namespace/openshift-debug-node-ldfbj ...
After some time, the Failed condition is reported in the results array of the corresponding FileIntegrityNodeStatus object. The previous Succeeded condition is retained, which allows you to pinpoint the time the check failed.
$ oc get fileintegritynodestatuses.fileintegrity.openshift.io/worker-fileintegrity-ip-10-0-130-192.ec2.internal -ojsonpath='{.results}' | jq -r
Alternatively, to query the results of all FileIntegrityNodeStatus objects without specifying an object name, run:
$ oc get fileintegritynodestatuses.fileintegrity.openshift.io -ojsonpath='{.items[*].results}' | jq
Example output
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:54:14Z"
  },
  {
    "condition": "Failed",
    "filesChanged": 1,
    "lastProbeTime": "2020-09-15T12:57:20Z",
    "resultConfigMapName": "aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed",
    "resultConfigMapNamespace": "openshift-file-integrity"
  }
]
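Because the earlier Succeeded condition is retained, the results array can be scanned to bracket the time of a failure. The following is a minimal sketch, assuming Python 3 and assuming that the array is ordered from oldest to newest as in the example output; the function name first_failure is hypothetical, and the sample data is copied from the example output in this section.

```python
def first_failure(results):
    """Return (last_success_time, failed_entry) for the first Failed entry, or None."""
    last_success = None
    for entry in results:
        if entry["condition"] == "Succeeded":
            last_success = entry["lastProbeTime"]
        elif entry["condition"] == "Failed":
            return last_success, entry
    return None


# Sample data taken from the example output; real data comes from the .results field.
sample = [
    {"condition": "Succeeded", "lastProbeTime": "2020-09-15T12:54:14Z"},
    {
        "condition": "Failed",
        "filesChanged": 1,
        "lastProbeTime": "2020-09-15T12:57:20Z",
        "resultConfigMapName": "aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed",
        "resultConfigMapNamespace": "openshift-file-integrity",
    },
]

found = first_failure(sample)
if found is not None:
    last_ok, failed = found
    print(f"last successful probe: {last_ok}")
    print(f"first failed probe:    {failed['lastProbeTime']}")
    print(f"details in config map: {failed['resultConfigMapNamespace']}/{failed['resultConfigMapName']}")
```

The modification happened between the two printed probe times.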
The Failed condition points to a config map that gives more details about what exactly failed and why:
$ oc describe cm aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed
Example output
Name:         aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed
Namespace:    openshift-file-integrity
Labels:       file-integrity.openshift.io/node=ip-10-0-130-192.ec2.internal
              file-integrity.openshift.io/owner=worker-fileintegrity
              file-integrity.openshift.io/result-log=
Annotations:  file-integrity.openshift.io/files-added: 0
              file-integrity.openshift.io/files-changed: 1
              file-integrity.openshift.io/files-removed: 0
Data
integritylog:
------
AIDE 0.15.1 found differences between database and filesystem!!
Start timestamp: 2020-09-15 12:58:15
Summary:
  Total number of files:  31553
  Added files:            0
  Removed files:          0
  Changed files:          1
---------------------------------------------------
Changed files:
---------------------------------------------------
changed: /hostroot/etc/resolv.conf
---------------------------------------------------
Detailed information about changes:
---------------------------------------------------
File: /hostroot/etc/resolv.conf
 SHA512   : sTQYpB/AL7FeoGtu/1g7opv6C+KT1CBJ , qAeM+a8yTgHPnIHMaRlS+so61EN8VOpg
Events:  <none>
Due to the config map data size limit, AIDE logs over 1 MB are added to the failure config map as a base64-encoded gzip archive. In this case, decode and decompress the log data by piping it to base64 --decode | gunzip. Compressed logs are indicated by the presence of a file-integrity.openshift.io/compressed annotation key in the config map.
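The same decoding can be done programmatically. The following is a minimal sketch, assuming Python 3 and assuming the encoded string has already been read from the integritylog key of the config map; the function name decode_integrity_log is hypothetical, and the round trip below uses made-up data in place of a real log.

```python
import base64
import gzip


def decode_integrity_log(encoded):
    """Decode a base64-encoded gzip archive into readable log text."""
    compressed = base64.b64decode(encoded)
    return gzip.decompress(compressed).decode("utf-8", errors="replace")


# Round trip with made-up data standing in for a compressed AIDE log.
original = "changed: /hostroot/etc/resolv.conf\n"
encoded = base64.b64encode(gzip.compress(original.encode("utf-8"))).decode("ascii")
print(decode_integrity_log(encoded), end="")
```

Check for the file-integrity.openshift.io/compressed annotation first; uncompressed logs are plain text and do not need this step.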
6.6.6. Understanding events
Transitions in the status of the FileIntegrity and FileIntegrityNodeStatus objects are logged by events. The creation time of the event reflects the latest transition, such as Initializing to Active, and not necessarily the latest scan result. However, the newest event always reflects the most recent status.
$ oc get events --field-selector reason=FileIntegrityStatus
Example output
LAST SEEN   TYPE     REASON                OBJECT                                MESSAGE
97s         Normal   FileIntegrityStatus   fileintegrity/example-fileintegrity   Pending
67s         Normal   FileIntegrityStatus   fileintegrity/example-fileintegrity   Initializing
37s         Normal   FileIntegrityStatus   fileintegrity/example-fileintegrity   Active
When a node scan fails, an event is created with the added/changed/removed file counts and config map information.
$ oc get events --field-selector reason=NodeIntegrityStatus
Example output
LAST SEEN   TYPE      REASON                OBJECT                                MESSAGE
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-134-173.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-168-238.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-169-175.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-152-92.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-158-144.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-131-30.ec2.internal
87m         Warning   NodeIntegrityStatus   fileintegrity/example-fileintegrity   node ip-10-0-152-92.ec2.internal has changed! a:1,c:1,r:0 \
log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed
Changes to the number of added, changed, or removed files result in a new event, even if the status of the node has not transitioned.
$ oc get events --field-selector reason=NodeIntegrityStatus
Example output
LAST SEEN   TYPE      REASON                OBJECT                                MESSAGE
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-134-173.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-168-238.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-169-175.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-152-92.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-158-144.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-131-30.ec2.internal
87m         Warning   NodeIntegrityStatus   fileintegrity/example-fileintegrity   node ip-10-0-152-92.ec2.internal has changed! a:1,c:1,r:0 \
log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed
40m         Warning   NodeIntegrityStatus   fileintegrity/example-fileintegrity   node ip-10-0-152-92.ec2.internal has changed! a:3,c:1,r:0 \
log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed
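The warning messages in the example output carry the added, changed, and removed counts and the config map reference in a compact form. The following is a minimal sketch, assuming Python 3 and assuming the message format matches the example output exactly; the format is inferred from that output rather than from a documented contract, and the names CHANGE_PATTERN and parse_change_message are hypothetical.

```python
import re

# Pattern inferred from the example messages; a backslash line continuation is tolerated.
CHANGE_PATTERN = re.compile(
    r"node (?P<node>\S+) has changed! "
    r"a:(?P<added>\d+),c:(?P<changed>\d+),r:(?P<removed>\d+)"
    r"[\s\\]*log:(?P<namespace>[^/\s]+)/(?P<configmap>\S+)"
)


def parse_change_message(message):
    """Return the parsed fields of a 'has changed!' event message, or None if it does not match."""
    match = CHANGE_PATTERN.search(message)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("added", "changed", "removed"):
        info[key] = int(info[key])
    return info


message = (
    "node ip-10-0-152-92.ec2.internal has changed! a:3,c:1,r:0 \\ "
    "log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed"
)
print(parse_change_message(message))
```

Messages such as "no changes to node ..." do not match the pattern and yield None.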
6.7. Configuring the Custom File Integrity Operator
6.7.1. Viewing FileIntegrity object attributes
As with any Kubernetes custom resources (CRs), you can run oc explain fileintegrity, and then look at the individual attributes using:
$ oc explain fileintegrity.spec
$ oc explain fileintegrity.spec.config
6.7.2. Important attributes
Attribute | Description |
---|---|
spec.nodeSelector | A map of key-value pairs that must match the node’s labels in order for the AIDE pods to be schedulable on that node. The typical use is to set only a single key-value pair. |
spec.debug | A boolean attribute. |
spec.tolerations | Specify tolerations to schedule on nodes with custom taints. When not specified, a default toleration is applied, which allows the AIDE pods to run on control plane nodes. |
spec.config.gracePeriod | The number of seconds to pause in between AIDE integrity checks. Frequent AIDE checks on a node can be resource intensive, so it can be useful to specify a longer interval. Defaults to 900 seconds (15 minutes). |
spec.config.maxBackups | The maximum number of AIDE database and log backups left over from the re-init process to keep on a node. Older backups beyond this number are automatically pruned by the daemon. |
spec.config.name | Name of a configMap that contains custom AIDE configuration. If omitted, a default configuration is created. |
spec.config.namespace | Namespace of a configMap that contains custom AIDE configuration. If unset, the FIO generates a default configuration suitable for RHCOS systems. |
spec.config.key | Key that contains actual AIDE configuration in a config map specified by spec.config.name and spec.config.namespace. |
spec.config.initialDelay | The number of seconds to wait before starting the first AIDE integrity check. Default is set to 0. This attribute is optional. |
6.7.3. Examine the default configuration
The default File Integrity Operator configuration is stored in a config map with the same name as the FileIntegrity CR.
Procedure
To examine the default config, run:
$ oc describe cm/worker-fileintegrity
6.7.4. Understanding the default File Integrity Operator configuration
Below is an excerpt from the aide.conf key of the config map:
@@define DBDIR /hostroot/etc/kubernetes
@@define LOGDIR /hostroot/etc/kubernetes
database=file:@@{DBDIR}/aide.db.gz
database_out=file:@@{DBDIR}/aide.db.gz
gzip_dbout=yes
verbose=5
report_url=file:@@{LOGDIR}/aide.log
report_url=stdout
PERMS = p+u+g+acl+selinux+xattrs
CONTENT_EX = sha512+ftype+p+u+g+n+acl+selinux+xattrs
/hostroot/boot/ CONTENT_EX
/hostroot/root/\..* PERMS
/hostroot/root/ CONTENT_EX
The default configuration for a FileIntegrity instance provides coverage for files under the following directories:
- /root
- /boot
- /usr
- /etc
The following directories are not covered:
- /var
- /opt
- Some OpenShift Container Platform-specific excludes under /etc/
6.7.5. Supplying a custom AIDE configuration
Any entries that configure AIDE internal behavior, such as DBDIR, LOGDIR, database, and database_out, are overwritten by the Operator. The Operator adds the /hostroot/ prefix to all paths to be watched for integrity changes. This makes it easier to reuse existing AIDE configurations, which are often not tailored for a containerized environment and start from the root directory.
/hostroot is the directory where the pods running AIDE mount the host’s file system. Changing the configuration triggers a reinitialization of the database.
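To preview roughly how the prefixing affects a configuration, the following sketch approximates the rewrite locally with sed. It is an illustration only, not the Operator's actual implementation, which may normalize paths differently. The function name prefix_hostroot is hypothetical.

```shell
# Approximates the Operator's path rewriting on AIDE rule lines.
# Illustration only; the Operator's real logic may differ.
prefix_hostroot() {
  # Rule lines start with "/" (include) or "!/" (exclude).
  # Lines already under /hostroot/ are left unchanged.
  sed -E '\#^!?/hostroot/#!s#^(!?)/#\1/hostroot/#'
}

# Usage: feed rule lines on stdin.
#   printf '%s\n' '!/opt/mydaemon/' '/hostroot/etc/ CONTENT_EX' | prefix_hostroot
# prints:
#   !/hostroot/opt/mydaemon/
#   /hostroot/etc/ CONTENT_EX
```

Lines that do not start with a path, such as group definitions, pass through untouched in this sketch.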
6.7.6. Defining a custom File Integrity Operator configuration
This example focuses on defining a custom configuration for a scanner that runs on the control plane nodes, based on the default configuration provided for the worker-fileintegrity CR. This workflow might be useful if you plan to deploy custom software that runs as a daemon set and stores its data under /opt/mydaemon on the control plane nodes.
Procedure
- Make a copy of the default configuration.
- Edit the default configuration with the files that must be watched or excluded.
- Store the edited contents in a new config map.
- Point the FileIntegrity object to the new config map through the attributes in spec.config.
Extract the default configuration:
$ oc extract cm/worker-fileintegrity --keys=aide.conf
This creates a file named aide.conf that you can edit. To illustrate how the Operator post-processes the paths, this example adds an exclude directory without the prefix:
$ vim aide.conf
Example output
/hostroot/etc/kubernetes/static-pod-resources
!/hostroot/etc/kubernetes/aide.*
!/hostroot/etc/kubernetes/manifests
!/hostroot/etc/docker/certs.d
!/hostroot/etc/selinux/targeted
!/hostroot/etc/openvswitch/conf.db
Exclude a path specific to control plane nodes:
!/opt/mydaemon/
Store the other content in /etc:
/hostroot/etc/ CONTENT_EX
Create a config map based on this file:
$ oc create cm master-aide-conf --from-file=aide.conf
Define a FileIntegrity CR manifest that references the config map:
apiVersion: fileintegrity.openshift.io/v1alpha1
kind: FileIntegrity
metadata:
  name: master-fileintegrity
  namespace: openshift-file-integrity
spec:
  nodeSelector:
    node-role.kubernetes.io/master: ""
  config:
    name: master-aide-conf
    namespace: openshift-file-integrity
The Operator processes the provided config map file and stores the result in a config map with the same name as the FileIntegrity object:
$ oc describe cm/master-fileintegrity | grep /opt/mydaemon
Example output
!/hostroot/opt/mydaemon
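The edit and publish steps of this procedure could be scripted along the following lines. This is a minimal sketch that assumes oc is logged in and that the configuration file has already been extracted. The function name and its arguments are hypothetical, and the namespace is taken from the example manifest.

```shell
# Sketch: add an exclude rule to an extracted aide.conf and publish it.
# Assumes `oc` is logged in; function and argument names are illustrative.
publish_custom_aide_conf() {
  local conf_file="$1" exclude_path="$2" cm_name="$3"
  local ns="openshift-file-integrity"

  # Append the exclude rule without /hostroot; the Operator adds the prefix.
  printf '!%s\n' "$exclude_path" >> "$conf_file"

  # Store the file under the aide.conf key, whatever the local file is called.
  oc -n "$ns" create cm "$cm_name" --from-file=aide.conf="$conf_file"
}

# Usage:
#   publish_custom_aide_conf ./aide.conf /opt/mydaemon/ master-aide-conf
```

Passing the namespace explicitly keeps the config map in the same namespace that the FileIntegrity manifest references, regardless of the current project.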
6.7.7. Changing the custom File Integrity configuration
To change the File Integrity configuration, never change the generated config map. Instead, change the config map that is linked to the FileIntegrity object through the name, namespace, and key attributes in spec.config.
6.8. Performing advanced Custom File Integrity Operator tasks
6.8.1. Reinitializing the database
If the File Integrity Operator detects a change that was planned, you might need to reinitialize the database.
Procedure
Annotate the FileIntegrity custom resource (CR) with file-integrity.openshift.io/re-init:
$ oc annotate fileintegrities/worker-fileintegrity file-integrity.openshift.io/re-init=
The old database and log files are backed up and a new database is initialized. The old database and logs are retained on the nodes under /etc/kubernetes, as seen in the following output from a pod spawned using oc debug:
Example output
ls -lR /host/etc/kubernetes/aide.*
-rw-------. 1 root root 1839782 Sep 17 15:08 /host/etc/kubernetes/aide.db.gz
-rw-------. 1 root root 1839783 Sep 17 14:30 /host/etc/kubernetes/aide.db.gz.backup-20200917T15_07_38
-rw-------. 1 root root 73728 Sep 17 15:07 /host/etc/kubernetes/aide.db.gz.backup-20200917T15_07_55
-rw-r--r--. 1 root root 0 Sep 17 15:08 /host/etc/kubernetes/aide.log
-rw-------. 1 root root 613 Sep 17 15:07 /host/etc/kubernetes/aide.log.backup-20200917T15_07_38
-rw-r--r--. 1 root root 0 Sep 17 15:07 /host/etc/kubernetes/aide.log.backup-20200917T15_07_55
To provide some permanence of record, the resulting config maps are not owned by the FileIntegrity object, so manual cleanup is necessary. As a result, any previous integrity failures remain visible in the FileIntegrityNodeStatus object.
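To inspect the retained backups on a particular node, a listing like the one in the example output can be produced with a small wrapper around oc debug. This sketch assumes oc is logged in with permission to debug nodes. The function name is hypothetical.

```shell
# Sketch: list the AIDE database and log files retained on one node.
list_aide_backups() {
  local node="$1"
  # The debug pod mounts the node's root file system at /host.
  # sh -c lets the glob expand inside the debug pod, not in the local shell.
  oc debug "node/${node}" -- sh -c 'ls -l /host/etc/kubernetes/aide.*'
}

# Usage:
#   list_aide_backups <node_name>
```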
6.8.2. Machine config integration
In OpenShift Container Platform 4, the cluster node configuration is delivered through MachineConfig objects. You can assume that the changes to files that are caused by a MachineConfig object are expected and should not cause the file integrity scan to fail. To suppress changes to files caused by MachineConfig object updates, the File Integrity Operator watches the node objects; when a node is being updated, the AIDE scans are suspended for the duration of the update. When the update finishes, the database is reinitialized and the scans resume.
This pause and resume logic only applies to updates through the MachineConfig API, as they are reflected in the node object annotations.
6.8.3. Exploring the daemon sets
Each FileIntegrity object represents a scan on a number of nodes. The scan itself is performed by pods managed by a daemon set.
To find the daemon set that represents a FileIntegrity object, run:
$ oc -n openshift-file-integrity get ds/aide-worker-fileintegrity
To list the pods in that daemon set, run:
$ oc -n openshift-file-integrity get pods -lapp=aide-worker-fileintegrity
To view logs of a single AIDE pod, call oc logs on one of the pods.
$ oc -n openshift-file-integrity logs pod/aide-worker-fileintegrity-mr8x6
Example output
Starting the AIDE runner daemon
initializing AIDE db
initialization finished
running aide check
...
The config maps created by the AIDE daemon are not retained and are deleted after the File Integrity Operator processes them. However, on failure and error, the contents of these config maps are copied to the config map that the FileIntegrityNodeStatus object points to.
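When a daemon set has many pods, it can be convenient to read recent log lines from all of them at once rather than one pod at a time. The following sketch selects the pods by the app label shown earlier in this section. The function name and the default of 20 lines are arbitrary choices.

```shell
# Sketch: show recent log lines from every AIDE pod of one FileIntegrity object.
aide_logs() {
  local name="${1:-worker-fileintegrity}"
  # --prefix labels each line with its source pod; --tail limits lines per pod.
  oc -n openshift-file-integrity logs -l "app=aide-${name}" --prefix --tail=20
}

# Usage:
#   aide_logs                       # defaults to worker-fileintegrity
#   aide_logs master-fileintegrity
```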
6.9. Troubleshooting the File Integrity Operator
6.9.1. General troubleshooting
- Issue
- You want to generally troubleshoot issues with the File Integrity Operator.
- Resolution
- Enable the debug flag in the FileIntegrity object. The debug flag increases the verbosity of the daemons that run in the DaemonSet pods and run the AIDE checks.
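One way to toggle the flag without editing the object by hand is a merge patch. This sketch assumes that the flag is exposed as spec.debug, which can be checked with oc explain fileintegrity.spec. The function name is hypothetical.

```shell
# Sketch: turn the debug flag on or off for one FileIntegrity object.
# Assumption: the flag lives at spec.debug.
set_fileintegrity_debug() {
  local name="$1" enabled="${2:-true}"
  oc -n openshift-file-integrity patch "fileintegrities/${name}" \
    --type=merge -p "{\"spec\":{\"debug\":${enabled}}}"
}

# Usage:
#   set_fileintegrity_debug worker-fileintegrity          # enable
#   set_fileintegrity_debug worker-fileintegrity false    # disable again
```

Turning the flag off again after troubleshooting keeps the daemon logs at their normal verbosity.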
6.9.2. Checking the AIDE configuration
- Issue
- You want to check the AIDE configuration.
- Resolution
- The AIDE configuration is stored in a config map with the same name as the FileIntegrity object. All AIDE configuration config maps are labeled with file-integrity.openshift.io/aide-conf.
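The label makes it possible to find these config maps without knowing their names. The following sketch lists them and prints the configuration of one of them. It assumes that the configuration is stored under the aide.conf key, as in the extraction example earlier in this chapter. The function names are hypothetical.

```shell
# Sketch: list every AIDE configuration config map in the Operator namespace.
list_aide_conf_maps() {
  # A bare label key selects every object that carries the label.
  oc -n openshift-file-integrity get cm -l file-integrity.openshift.io/aide-conf
}

# Sketch: print the aide.conf key of one config map to stdout.
show_aide_conf() {
  local name="$1"
  oc -n openshift-file-integrity extract "cm/${name}" --keys=aide.conf --to=-
}

# Usage:
#   list_aide_conf_maps
#   show_aide_conf worker-fileintegrity
```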
6.9.3. Determining the FileIntegrity object’s phase
- Issue
- You want to determine if the FileIntegrity object exists and see its current status.
- Resolution
- To see the FileIntegrity object’s current status, run:
$ oc get fileintegrities/worker-fileintegrity -o jsonpath="{ .status }"
Once the FileIntegrity object and the backing daemon set are created, the status should switch to Active. If it does not, check the Operator pod logs.
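In a script, waiting for the Active status can be done with a simple polling loop. This sketch assumes that the status object exposes a phase field at .status.phase. The function name, the 300-second timeout, and the 10-second interval are arbitrary choices.

```shell
# Sketch: poll a FileIntegrity object until it reports Active or a timeout expires.
# Assumption: the phase is exposed at .status.phase.
wait_for_active() {
  local name="$1" timeout="${2:-300}" interval="${3:-10}"
  local elapsed=0 phase=""
  while [ "$elapsed" -lt "$timeout" ]; do
    phase="$(oc -n openshift-file-integrity get "fileintegrities/${name}" \
      -o jsonpath='{.status.phase}')"
    if [ "$phase" = "Active" ]; then
      echo "Active"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "timed out; last phase: ${phase:-unknown}" >&2
  return 1
}

# Usage:
#   wait_for_active worker-fileintegrity || echo "check the Operator pod logs"
```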
6.9.4. Determining that the daemon set’s pods are running on the expected nodes
- Issue
- You want to confirm that the daemon set exists and that its pods are running on the nodes you expect them to run on.
- Resolution
Run:
$ oc -n openshift-file-integrity get pods -lapp=aide-worker-fileintegrity
Note
Adding -owide includes the IP address of the node that the pod is running on.
To check the logs of the daemon pods, run oc logs.
Check the return value of the AIDE command to see if the check passed or failed.