6.4. Understanding the File Integrity Operator
The File Integrity Operator is an OpenShift Container Platform Operator that continually runs file integrity checks on the cluster nodes. It deploys a daemon set that initializes and runs privileged Advanced Intrusion Detection Environment (AIDE) containers on each node, providing a status object and a log of files that are modified during the initial run of the daemon set.
Currently, only Red Hat Enterprise Linux CoreOS (RHCOS) nodes are supported.
6.4.1. Creating the FileIntegrity custom resource
An instance of a FileIntegrity custom resource (CR) represents a set of continuous file integrity scans for one or more nodes.
Each FileIntegrity CR is backed by a daemon set running AIDE on the nodes matching the FileIntegrity CR specification.
Procedure
Create the following example FileIntegrity CR named worker-fileintegrity.yaml to enable scans on worker nodes:
Example FileIntegrity CR
apiVersion: fileintegrity.openshift.io/v1alpha1
kind: FileIntegrity
metadata:
  name: worker-fileintegrity
  namespace: openshift-file-integrity
spec:
  nodeSelector: 1
    node-role.kubernetes.io/worker: ""
  tolerations: 2
  - key: "myNode"
    operator: "Exists"
    effect: "NoSchedule"
  config: 3
    name: "myconfig"
    namespace: "openshift-file-integrity"
    key: "config"
  gracePeriod: 20 4
  maxBackups: 5 5
  initialDelay: 60 6
  debug: false
status:
  phase: Active 7
1 - Defines the selector for scheduling node scans.
2 - Specify tolerations to schedule on nodes with custom taints. If not specified, a default toleration allowing running on main nodes is applied.
3 - Define the ConfigMap containing the AIDE configuration to use.
4 - The number of seconds to pause between AIDE integrity checks. Frequent AIDE checks on a node can be resource intensive, so it can be useful to specify a longer interval. Default is 900 seconds (15 minutes).
5 - The maximum number of AIDE database and log backups left over from the re-init process to keep on a node. Older backups beyond this number are automatically pruned by the daemon. Default is set to 5.
6 - The number of seconds to wait before starting the first AIDE integrity check. Default is set to 0.
7 - The running status of the FileIntegrity instance. Statuses are Initializing, Pending, or Active.
Initializing
The FileIntegrity object is currently initializing or re-initializing the AIDE database.
Pending
The FileIntegrity deployment is still being created.
Active
The scans are active and ongoing.
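The config section of the example CR points at an AIDE configuration stored in a ConfigMap. As a minimal sketch, a matching ConfigMap could look like the following, using the myconfig name, openshift-file-integrity namespace, and config key from the example CR; the AIDE configuration content itself is only a placeholder here:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: myconfig
  namespace: openshift-file-integrity
data:
  config: |
    # Place the AIDE configuration to use here.
```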
Apply the YAML file to the openshift-file-integrity namespace:
$ oc apply -f worker-fileintegrity.yaml -n openshift-file-integrity
Verification
Confirm that the FileIntegrity object was created successfully by running the following command:
$ oc get fileintegrities -n openshift-file-integrity
Example output
NAME                   AGE
worker-fileintegrity   14s
6.4.2. Checking the FileIntegrity custom resource status
The FileIntegrity custom resource (CR) reports its status through the .status.phase subresource.
Procedure
To query the FileIntegrity CR status, run:
$ oc get fileintegrities/worker-fileintegrity -o jsonpath="{ .status.phase }"
Example output
Active
6.4.3. FileIntegrity custom resource phases
- Pending - Phase after the custom resource (CR) is created.
- Active - Phase when the backing daemon set is up and running.
- Initializing - Phase when the AIDE database is being reinitialized.
6.4.4. Understanding the FileIntegrityNodeStatuses object
The scan results of the FileIntegrity CR are reported in another object called FileIntegrityNodeStatuses.
$ oc get fileintegritynodestatuses
Example output
NAME                                                AGE
worker-fileintegrity-ip-10-0-130-192.ec2.internal   101s
worker-fileintegrity-ip-10-0-147-133.ec2.internal   109s
worker-fileintegrity-ip-10-0-165-160.ec2.internal   102s
It might take some time for the FileIntegrityNodeStatus object results to be available.
There is one result object per node. The nodeName attribute of each FileIntegrityNodeStatus object corresponds to the node being scanned. The status of the file integrity scan is represented in the results array, which holds scan conditions.
$ oc get fileintegritynodestatuses.fileintegrity.openshift.io -ojsonpath='{.items[*].results}' | jq
The fileintegritynodestatus object reports the latest status of an AIDE run and exposes the status as Failed, Succeeded, or Errored in a status field.
$ oc get fileintegritynodestatuses -w
Example output
NAME                                                               NODE                                         STATUS
example-fileintegrity-ip-10-0-134-186.us-east-2.compute.internal   ip-10-0-134-186.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-150-230.us-east-2.compute.internal   ip-10-0-150-230.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-169-137.us-east-2.compute.internal   ip-10-0-169-137.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-180-200.us-east-2.compute.internal   ip-10-0-180-200.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-194-66.us-east-2.compute.internal    ip-10-0-194-66.us-east-2.compute.internal    Failed
example-fileintegrity-ip-10-0-222-188.us-east-2.compute.internal   ip-10-0-222-188.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-134-186.us-east-2.compute.internal   ip-10-0-134-186.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-222-188.us-east-2.compute.internal   ip-10-0-222-188.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-194-66.us-east-2.compute.internal    ip-10-0-194-66.us-east-2.compute.internal    Failed
example-fileintegrity-ip-10-0-150-230.us-east-2.compute.internal   ip-10-0-150-230.us-east-2.compute.internal   Succeeded
example-fileintegrity-ip-10-0-180-200.us-east-2.compute.internal   ip-10-0-180-200.us-east-2.compute.internal   Succeeded
6.4.5. FileIntegrityNodeStatus CR status types
These conditions are reported in the results array of the corresponding FileIntegrityNodeStatus CR status:
- Succeeded - The integrity check passed; the files and directories covered by the AIDE check have not been modified since the database was last initialized.
- Failed - The integrity check failed; some files or directories covered by the AIDE check have been modified since the database was last initialized.
- Errored - The AIDE scanner encountered an internal error.
6.4.5.1. FileIntegrityNodeStatus CR success example
Example output of a condition with a success status
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:45:57Z"
  }
]
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:46:03Z"
  }
]
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:45:48Z"
  }
]
In this case, all three scans succeeded and so far there are no other conditions.
6.4.5.2. FileIntegrityNodeStatus CR failure status example
To simulate a failure condition, modify one of the files AIDE tracks. For example, modify /etc/resolv.conf on one of the worker nodes:
$ oc debug node/ip-10-0-130-192.ec2.internal
Example output
Creating debug namespace/openshift-debug-node-ldfbj ...
Starting pod/ip-10-0-130-192ec2internal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.130.192
If you don't see a command prompt, try pressing enter.
sh-4.2# echo "# integrity test" >> /host/etc/resolv.conf
sh-4.2# exit

Removing debug pod ...
Removing debug namespace/openshift-debug-node-ldfbj ...
After some time, the Failed condition is reported in the results array of the corresponding FileIntegrityNodeStatus object. The previous Succeeded condition is retained, which allows you to pinpoint the time the check failed.
$ oc get fileintegritynodestatuses.fileintegrity.openshift.io/worker-fileintegrity-ip-10-0-130-192.ec2.internal -ojsonpath='{.results}' | jq -r
Alternatively, if you are not referencing the object name, run:
$ oc get fileintegritynodestatuses.fileintegrity.openshift.io -ojsonpath='{.items[*].results}' | jq
Example output
[
  {
    "condition": "Succeeded",
    "lastProbeTime": "2020-09-15T12:54:14Z"
  },
  {
    "condition": "Failed",
    "filesChanged": 1,
    "lastProbeTime": "2020-09-15T12:57:20Z",
    "resultConfigMapName": "aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed",
    "resultConfigMapNamespace": "openshift-file-integrity"
  }
]
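When scripting against these results, jq can pick out the failing entries. A minimal local sketch, run against a saved sample of the results array rather than a live cluster; the file path and sample data are illustrative only:

```shell
# Save a sample results array in the same shape as the jsonpath output,
# then select the Failed conditions and print when each one was probed.
cat > /tmp/results.json <<'EOF'
[
  { "condition": "Succeeded", "lastProbeTime": "2020-09-15T12:54:14Z" },
  { "condition": "Failed", "filesChanged": 1, "lastProbeTime": "2020-09-15T12:57:20Z" }
]
EOF
jq -r '.[] | select(.condition == "Failed") | .lastProbeTime' /tmp/results.json
```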
The Failed condition points to a config map that gives more details about the exact failure and its cause:
$ oc describe cm aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed
Example output
Name:         aide-ds-worker-fileintegrity-ip-10-0-130-192.ec2.internal-failed
Namespace:    openshift-file-integrity
Labels:       file-integrity.openshift.io/node=ip-10-0-130-192.ec2.internal
              file-integrity.openshift.io/owner=worker-fileintegrity
              file-integrity.openshift.io/result-log=
Annotations:  file-integrity.openshift.io/files-added: 0
              file-integrity.openshift.io/files-changed: 1
              file-integrity.openshift.io/files-removed: 0

Data

integritylog:
------
AIDE 0.15.1 found differences between database and filesystem!!
Start timestamp: 2020-09-15 12:58:15

Summary:
  Total number of files:  31553
  Added files:            0
  Removed files:          0
  Changed files:          1

---------------------------------------------------
Changed files:
---------------------------------------------------

changed: /hostroot/etc/resolv.conf

---------------------------------------------------
Detailed information about changes:
---------------------------------------------------

File: /hostroot/etc/resolv.conf
 SHA512   : sTQYpB/AL7FeoGtu/1g7opv6C+KT1CBJ , qAeM+a8yTgHPnIHMaRlS+so61EN8VOpg

Events:  <none>
Due to the config map data size limit, AIDE logs over 1 MB are added to the failure config map as a base64-encoded gzip archive. In this case, pipe the output of the above command to base64 --decode | gunzip. Compressed logs are indicated by the presence of a file-integrity.openshift.io/compressed annotation key in the config map.
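The decode step can be exercised locally without a cluster. The following sketch round-trips a sample line through the same gzip-plus-base64 encoding used for oversized logs and recovers it with base64 --decode | gunzip; the sample text is illustrative, and a real log would come from the failure config map:

```shell
# Encode a sample log the way oversized AIDE logs are stored
# (gzip archive, base64-encoded), then decode it back to plain text.
log='AIDE found differences between database and filesystem!!'
encoded=$(printf '%s' "$log" | gzip | base64 -w0)
printf '%s' "$encoded" | base64 --decode | gunzip
```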
6.4.6. Understanding events
Transitions in the status of the FileIntegrity and FileIntegrityNodeStatus objects are logged by events. The creation time of the event reflects the latest transition, such as Initializing to Active, and not necessarily the latest scan result. However, the newest event always reflects the latest status.
$ oc get events --field-selector reason=FileIntegrityStatus
Example output
LAST SEEN   TYPE     REASON                OBJECT                                MESSAGE
97s         Normal   FileIntegrityStatus   fileintegrity/example-fileintegrity   Pending
67s         Normal   FileIntegrityStatus   fileintegrity/example-fileintegrity   Initializing
37s         Normal   FileIntegrityStatus   fileintegrity/example-fileintegrity   Active
When a node scan fails, an event is created with the add/changed/removed and config map information.
$ oc get events --field-selector reason=NodeIntegrityStatus
Example output
LAST SEEN   TYPE      REASON                OBJECT                                MESSAGE
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-134-173.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-168-238.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-169-175.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-152-92.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-158-144.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-131-30.ec2.internal
87m         Warning   NodeIntegrityStatus   fileintegrity/example-fileintegrity   node ip-10-0-152-92.ec2.internal has changed! a:1,c:1,r:0 \
            log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed
Changes to the number of added, changed, or removed files result in a new event, even if the status of the node has not transitioned.
$ oc get events --field-selector reason=NodeIntegrityStatus
Example output
LAST SEEN   TYPE      REASON                OBJECT                                MESSAGE
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-134-173.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-168-238.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-169-175.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-152-92.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-158-144.ec2.internal
114m        Normal    NodeIntegrityStatus   fileintegrity/example-fileintegrity   no changes to node ip-10-0-131-30.ec2.internal
87m         Warning   NodeIntegrityStatus   fileintegrity/example-fileintegrity   node ip-10-0-152-92.ec2.internal has changed! a:1,c:1,r:0 \
            log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed
40m         Warning   NodeIntegrityStatus   fileintegrity/example-fileintegrity   node ip-10-0-152-92.ec2.internal has changed! a:3,c:1,r:0 \
            log:openshift-file-integrity/aide-ds-example-fileintegrity-ip-10-0-152-92.ec2.internal-failed
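The a:N,c:N,r:N counters in these messages lend themselves to simple shell parsing, for example when forwarding integrity failures to an alerting system. A hypothetical sketch run against a saved event message; no cluster is required, and the message text mirrors the example output above:

```shell
# Extract the added/changed/removed counters from a NodeIntegrityStatus
# event message of the form "... has changed! a:3,c:1,r:0 ...".
msg='node ip-10-0-152-92.ec2.internal has changed! a:3,c:1,r:0'
counts=$(printf '%s\n' "$msg" | grep -o 'a:[0-9]*,c:[0-9]*,r:[0-9]*')
echo "$counts"
```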