Troubleshooting Collector
Chapter 1. Retrieving and analyzing the Collector logs and pod status
The first step in troubleshooting is to retrieve the Collector logs and the pod status. The logs help you identify the root cause of an error. In addition, the most recent status of the pod can contain failure messages.
1.1. Retrieving the Collector logs
First, you should examine the logs from failing Collectors. Depending on your environment and access rights, you can obtain these logs in two ways:
1.1.1. Retrieving the logs with the oc or kubectl command
You can use either the oc or kubectl command to obtain logs from your running Collector pod. If your current Collector pod is restarting, you can also check the logs from the previous Collector pod.
If you use Kubernetes, enter kubectl instead of oc.
Prerequisites
Ensure that you have the authority to list the pods and logs:
$ oc auth can-i get pods && oc auth can-i get pods --subresource=logs
Procedure
List all the pods with the label app=collector:
$ oc get pods -n stackrox -l app=collector
Example output
collector-vclg5 1/2 CrashLoopBackOff 2 (25s ago) 2m41s
Get the logs for the Collector pod:
$ oc logs -n stackrox <collector_pod_name> collector
where:
<collector_pod_name>
Specifies the name of your Collector pod, for example, collector-vclg5.
(Optional) If the current Collector pod is restarting, you can check the logs for the previous Collector pod:
$ oc logs -n stackrox <collector_pod_name> collector --previous
where:
<collector_pod_name>
Specifies the name of your Collector pod, for example, collector-vclg5.
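When several Collector pods exist, it can help to find the ones that have restarted before fetching their previous logs. The following is a minimal sketch, not part of RHACS: restarting_pods is a hypothetical helper name, and it assumes the default column layout of oc get pods --no-headers (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: read `oc get pods --no-headers` output on stdin and
# print the names of pods whose RESTARTS column (4th field) is greater than 0.
restarting_pods() {
    awk '$4 + 0 > 0 { print $1 }'
}

# Example usage against a cluster (commented out so the block has no side effects):
# oc get pods -n stackrox -l app=collector --no-headers | restarting_pods |
#     while read -r pod; do
#         oc logs -n stackrox "$pod" collector --previous
#     done
```

If you use Kubernetes, replace oc with kubectl in the commented usage.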
1.1.2. Retrieving logs from a RHACS diagnostic bundle
You can also access Collector logs by downloading a diagnostic bundle from the Red Hat Advanced Cluster Security for Kubernetes (RHACS) user interface. Once you have downloaded the diagnostic bundle, you can inspect the logs for all the Collector pods. For more information, see Generating a diagnostic bundle.
1.2. Analyzing the Collector pod status
Examining the pod’s most recent status is another easy way to determine the cause of a Collector crash. Failure messages are recorded to the most recent status and are accessible using the kubectl describe pod or oc describe pod command.
If you use Kubernetes, enter kubectl instead of oc.
Procedure
Describe the Collector pod:
$ oc describe pod -n stackrox <collector_pod_name>
where:
<collector_pod_name>
Specifies the name of your Collector pod, for example, collector-vclg5.
Example output
# ...
    Last State:     Terminated
      Reason:       Error
      Message:      No suitable kernel object downloaded
      Exit Code:    1
      Started:      Fri, 21 Oct 2022 11:50:56 +0100
      Finished:     Fri, 21 Oct 2022 11:51:25 +0100
# ...
In this example, you can see that Collector has failed to download a kernel driver.
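The describe output is long, so it can be convenient to print only the fields of the last terminated state. The following is a minimal sketch under stated assumptions: last_state_summary is a hypothetical helper name, and it assumes the field labels shown in the example output (Last State, Reason, Message, Exit Code, Finished).

```shell
# Hypothetical helper: read `oc describe pod` output on stdin and print the
# Reason, Message, and Exit Code lines that belong to a "Last State" block.
last_state_summary() {
    awk '
        /Last State:/ { in_last = 1; next }               # entering the block
        in_last && /(Reason|Message|Exit Code):/ {
            $1 = $1                                       # collapse whitespace
            print
        }
        in_last && /Finished:/ { in_last = 0 }            # leaving the block
    '
}

# Example usage against a cluster (commented out so the block has no side effects):
# oc describe pod -n stackrox <collector_pod_name> | last_state_summary
```

Lines under the current State block, such as a CrashLoopBackOff reason, are skipped because the helper only prints after it has seen a Last State line.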
Chapter 2. Commonly occurring error conditions
Most errors occur during Collector startup, when Collector configures itself and loads the eBPF probe into the system.
The Collector startup process involves the following stages:
- Parsing the configuration
- Analyzing the host
- Connecting to Sensor
- Loading the eBPF probe
Failure at any step is considered fatal. If any part of the startup procedure fails, the logs display a diagnostic summary with details about which steps succeeded or failed.
The following log file example shows a successful startup:
[INFO 2025/07/24 10:05:54] == Collector Startup Diagnostics: ==
[INFO 2025/07/24 10:05:54] Connected to Sensor? true
[INFO 2025/07/24 10:05:54] Kernel driver candidates:
[INFO 2025/07/24 10:05:54] core_bpf (available)
[INFO 2025/07/24 10:05:54] Driver loaded into kernel: core_bpf
[INFO 2025/07/24 10:05:54] ====================================
The log output confirms that Collector connected to Sensor and loaded the eBPF probe. You can use this log to check for the successful startup of Collector.
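This check can be scripted. The following is a minimal sketch, assuming the diagnostic summary uses the wording shown in the example; collector_startup_ok is a hypothetical helper name, not a RHACS tool.

```shell
# Hypothetical helper: read Collector logs on stdin and succeed (exit 0) only
# if the startup diagnostics report a Sensor connection and a loaded driver.
collector_startup_ok() {
    log=$(cat)
    # -F treats the pattern as a fixed string, so "?" is matched literally.
    printf '%s\n' "$log" | grep -Fq 'Connected to Sensor? true' &&
        printf '%s\n' "$log" | grep -Fq 'Driver loaded into kernel:'
}

# Example usage against a cluster (commented out so the block has no side effects):
# if oc logs -n stackrox <collector_pod_name> collector | collector_startup_ok; then
#     echo "Collector started successfully"
# else
#     echo "Collector startup failed; inspect the diagnostic summary"
# fi
```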
2.1. Unable to connect to the Sensor
When Collector starts, it first attempts to connect to Sensor. Sensor is responsible for downloading kernel drivers and CIDR blocks for processing network events, which makes it an essential part of the startup process. The following logs indicate that Collector is unable to connect to Sensor:
Collector Version: 3.15.0
OS: Ubuntu 20.04.4 LTS
Kernel Version: 5.4.0-126-generic
[...]
[INFO 2023/05/13 12:20:43] Sensor configured at address: sensor.stackrox.svc:9998
[INFO 2023/05/13 12:20:43] Attempting to connect to Sensor
[INFO 2023/05/13 12:21:13]
[INFO 2023/05/13 12:21:13] == Collector Startup Diagnostics: ==
[INFO 2023/05/13 12:21:13] Connected to Sensor? false
[INFO 2023/05/13 12:21:13] Kernel driver candidates:
[INFO 2023/05/13 12:21:13] ====================================
[INFO 2023/05/13 12:21:13]
[FATAL 2023/05/13 12:21:13] Unable to connect to Sensor at 'sensor.stackrox.svc:9998'.
This error could mean that Sensor has not started correctly or that the Collector configuration is incorrect. To fix this issue, verify that the Sensor address in the Collector configuration is correct and that the Sensor pod is running correctly.
View the Collector logs to check the configured Sensor address. Alternatively, you can run the following command:
$ kubectl -n stackrox get pod <collector_pod_name> -o jsonpath='{.spec.containers[0].env[?(@.name=="GRPC_SERVER")].value}'
where:
<collector_pod_name>
Specifies the name of your Collector pod, for example, collector-vclg5.
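The configured address can also be pulled out of the Collector logs. The following is a minimal sketch, assuming the log line has the form "Sensor configured at address: <address>" as in the example above; sensor_address_from_log is a hypothetical helper name.

```shell
# Hypothetical helper: read Collector logs on stdin and print the first
# Sensor address that Collector reported at startup.
sensor_address_from_log() {
    sed -n 's/^.*Sensor configured at address: //p' | head -n 1
}

# Example usage against a cluster (commented out so the block has no side effects):
# oc logs -n stackrox <collector_pod_name> collector | sensor_address_from_log
```

Comparing this value with the output of the jsonpath command shows whether the running Collector picked up the address you expect.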
2.2. Failing to load the eBPF probe
Before Collector starts, it loads the eBPF probe. In rare cases, Collector cannot load the eBPF probe, which results in various error messages or exceptions. In such cases, check the logs to identify why loading the eBPF probe failed.
Consider the following Collector log:
[...]
[INFO 2025/07/24 10:26:37] Trying to open the right engine!
[INFO 2025/07/24 10:26:41] libbpf: prog 'execve_x': -- BEGIN PROG LOAD LOG --
[...]
-- END PROG LOAD LOG --
[INFO 2025/07/24 10:26:41] libbpf: prog 'execve_x': failed to load: -7
[INFO 2025/07/24 10:26:41] libbpf: failed to load object 'bpf_probe'
[INFO 2025/07/24 10:26:41] libbpf: failed to load BPF skeleton 'bpf_probe': -7
[INFO 2025/07/24 10:26:41] libpman: failed to load BPF object (errno: 7 | message: Argument list too long)
If you encounter this kind of error, report it to the Red Hat Advanced Cluster Security for Kubernetes (RHACS) support team or create an issue in the stackrox/collector GitHub repository.
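When you prepare a report, it can help to isolate the load failure lines from the rest of the log. The following is a minimal sketch, assuming the failure lines contain "libbpf:" or "libpman:" followed by "failed to load" as in the example above; ebpf_load_failures is a hypothetical helper name.

```shell
# Hypothetical helper: read Collector logs on stdin and print only the
# libbpf and libpman lines that report a failed load.
ebpf_load_failures() {
    grep -E 'lib(bpf|pman): .*failed to load'
}

# Example usage against a cluster (commented out so the block has no side effects):
# oc logs -n stackrox <collector_pod_name> collector | ebpf_load_failures
```

Attach the complete log to the report as well, because the program load log between the BEGIN and END markers is not captured by this filter.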