Troubleshooting Collector


Red Hat Advanced Cluster Security for Kubernetes 4.0

Troubleshooting Collector

Red Hat OpenShift Documentation Team

Abstract

Use this guide to learn how to retrieve logs and debug issues with a failing Collector.

Chapter 1. Retrieving and analyzing the Collector logs and pod status

The first step in troubleshooting is to retrieve the logs and pods status. The logs allow you to identify the root cause of an error. In addition, examining the pod’s most recent status can provide information about failure messages.

1.1. Retrieving the Collector logs

First, you should examine the logs from failing Collectors. Depending on your environment and access rights, you can obtain these logs in two ways:

1.1.1. Retrieving the logs with the oc or kubectl command

You can use either the oc or kubectl command to obtain logs from your running Collector pod. Optionally, you can even check the logs from a previous Collector pod if your current Collector pod is restarting.

Prerequisites

  • Ensure that you have the authority to list the pods and logs:

    $ oc auth can-i get pods && oc auth can-i get pods --subresource=logs 1
    1
    If you use Kubernetes, enter kubectl instead of oc.

Procedure

  1. List all the pods with label app=collector:

    $ oc get pods -n stackrox -l app=collector 1
    1
    If you use Kubernetes, enter kubectl instead of oc.

    Example output

    collector-vclg5    1/2     CrashLoopBackOff   2 (25s ago)   2m41s+

  2. Get the logs for the Collector pod:

    $ oc logs -n stackrox <collector_pod_name> collector 1
    1
    If you use Kubernetes, enter kubectl instead of oc. For <collector_pod_name>, specify the name of your Collector pod, for example, collector-vclg5.
  3. (Optional) If the current Collector pod is restarting, you can check the logs for the previous Collector pod:

    $ oc logs -n stackrox <collector_pod_name> collector --previous 1
    1
    If you use Kubernetes, enter kubectl instead of oc. For <collector_pod_name>, specify the name of your Collector pod, for example, collector-vclg5.

1.1.2. Retrieving logs from a RHACS diagnostic bundle

You can also access Collector logs by downloading a diagnostic bundle from the Red Hat Advanced Cluster Security for Kubernetes (RHACS) user interface. Once you have downloaded the diagnostic bundle, you can inspect the logs for all the Collector pods. For more information, see Generating a diagnostic bundle.

1.2. Analyzing the Collector pod status

Examining the pod’s most recent status is another easy way to determine the cause of a Collector crash. Failure messages are recorded to the most recent status and are accessible using the kubectl describe pod or oc describe pod command.

Procedure

  • Describe the Collector pod:

    $ oc describe pod -n stackrox <collector_pod_name> 1
    1
    If you use Kubernetes, enter kubectl instead of oc. For <collector_pod_name>, specify the name of your Collector pod, for example, collector-vclg5.

    Example output

    [...]
        Last State:     Terminated
          Reason:       Error
          Message:      No suitable kernel object downloaded 1
          Exit Code:    1
          Started:      Fri, 21 Oct 2022 11:50:56 +0100
          Finished:     Fri, 21 Oct 2022 11:51:25 +0100
    [...]

    1
    In this example, you can see that Collector has failed to download a kernel driver.

Chapter 2. Commonly occurring error conditions

Most errors occur during Collector startup when Collector configures itself and finds or downloads a kernel driver for the system.

The following diagram describes the main parts of Collector startup process:

Figure 2.1. Collector pod startup process

Collector pod startup process

If any part of the startup procedure fails, the logs display a diagnostic summary detailing which steps succeeded or failed .

The following log file example shows a successful startup:

[INFO    2022/11/28 13:21:55] == Collector Startup Diagnostics: ==
[INFO    2022/11/28 13:21:55]  Connected to Sensor?       true
[INFO    2022/11/28 13:21:55]  Kernel driver available?   true
[INFO    2022/11/28 13:21:55]  Driver loaded into kernel? true
[INFO    2022/11/28 13:21:55] ====================================

The log output confirms that Collector connected to Sensor and located and loaded the kernel driver. You can use this log to check for the successful startup of Collector.

2.1. Unable to connect to the Sensor

When starting, first check if you can connect to Sensor. Sensor is responsible for downloading kernel drivers and CIDR blocks for processing network events, making it an essential part of the startup process. The following logs indicate you are unable to connect to the Sensor:

Collector Version: 3.12.0
OS: Ubuntu 20.04.4 LTS
Kernel Version: 5.4.0-126-generic
Starting StackRox Collector...
[INFO    2022/10/13 12:20:43] Hostname: 'hostname'
[...]
[INFO    2022/10/13 12:20:43] Sensor configured at address: sensor.stackrox.svc:9998
[INFO    2022/10/13 12:20:43] Attempting to connect to Sensor
[INFO    2022/10/13 12:21:13]
[INFO    2022/10/13 12:21:13] == Collector Startup Diagnostics: ==
[INFO    2022/10/13 12:21:13]  Connected to Sensor?       false
[INFO    2022/10/13 12:21:13]  Kernel driver available?   false
[INFO    2022/10/13 12:21:13]  Driver loaded into kernel? false
[INFO    2022/10/13 12:21:13] ====================================
[INFO    2022/10/13 12:21:13]
[FATAL   2022/10/13 12:21:13] Unable to connect to Sensor.

This error could mean that Sensor has not started correctly or that Collector configuration is incorrect. To fix this issue, you must verify Collector configuration to ensure that Sensor address is correct and that the Sensor pod is running correctly.

View the Collector logs to specifically check the configured Sensor address. Alternatively, you can run the following command:

$ kubectl -n stackrox get pod <collector_pod_name> -o jsonpath='{.spec.containers[0].env[?(@.name=="GRPC_SERVER")].value}' 1
1
For <collector_pod_name>, specify the name of your Collector pod, for example, collector-vclg5.

2.2. Unavailability of the kernel driver

Collector determines if it has a kernel driver for the node’s kernel version. Collector first searches the local storage for a driver with the correct version and type, and then attempts to download the driver from Sensor. The following logs indicate that neither a local kernel driver nor a driver from Sensor is present:

Collector Version: 3.12.0
OS: Alpine Linux v3.14
Kernel Version: 5.10.109-0-virt
Starting StackRox Collector...
[INFO    2022/10/13 13:32:57] Hostname: 'alpine'
[...]
[INFO    2022/10/13 13:32:57] Sensor configured at address: sensor.stackrox.svc:9999
[INFO    2022/10/13 13:32:57] Attempting to connect to Sensor
[INFO    2022/10/13 13:32:57] Successfully connected to Sensor.
[INFO    2022/10/13 13:32:57] Module version: 2.2.0
[INFO    2022/10/13 13:32:57] Attempting to find kernel module - Candidate kernel versions:
[INFO    2022/10/13 13:32:57] 5.10.109-0-virt
[INFO    2022/10/13 13:32:57] Local storage does not contain collector-5.10.109-0-virt.ko
[...]
[INFO    2022/10/13 13:32:57] Attempting to download kernel object from https://sensor.stackrox.svc/kernel-objects/2.2.0/collector-5.10.109-0-virt.ko.gz 1
[WARNING 2022/10/13 13:32:58] [Throttled] Unexpected HTTP request failure (HTTP 404) 2
[WARNING 2022/10/13 13:33:08] [Throttled] Unexpected HTTP request failure (HTTP 404)
[WARNING 2022/10/13 13:33:18] [Throttled] Unexpected HTTP request failure (HTTP 404)
[WARNING 2022/10/13 13:33:29] [Throttled] Unexpected HTTP request failure (HTTP 404)
[WARNING 2022/10/13 13:33:35] Attempted to download collector-5.10.109-0-virt.ko.gz 30 time(s)
[WARNING 2022/10/13 13:33:35] Failed to download from collector-5.10.109-0-virt.ko.gz
[WARNING 2022/10/13 13:33:35] Unable to download kernel object collector-5.10.109-0-virt.ko to /module/collector.ko.gz
[ERROR   2022/10/13 13:33:35] Error getting kernel object: collector-5.10.109-0-virt.ko
[INFO    2022/10/13 13:33:35]
[INFO    2022/10/13 13:33:35] == Collector Startup Diagnostics: ==
[INFO    2022/10/13 13:33:35]  Connected to Sensor?       true
[INFO    2022/10/13 13:33:35]  Kernel driver available?   false
[INFO    2022/10/13 13:33:35]  Driver loaded into kernel? false
[INFO    2022/10/13 13:33:35] ====================================
[INFO    2022/10/13 13:33:35]
[FATAL   2022/10/13 13:33:35]  No suitable kernel object downloaded for kernel 5.10.109-0-virt 3
1
The logs display attempts to locate the module first, followed by any efforts to download the driver from Sensor.
2
The 404 errors indicate that the node’s kernel does not have a kernel driver.
3
As a result of missing a driver, Collector enters the CrashLoopBackOff state.

The Kernel versions file contains a list of all supported kernel versions.

2.3. Failing to load the kernel driver

Before Collector starts, it loads the kernel driver. However, in rare cases, you might encounter issues where Collector cannot load the kernel driver, resulting in various error messages or exceptions. In such cases, you must check the logs to identify the problems with failure in loading the kernel driver.

Consider the following Collector log:

[INFO    2022/10/13 14:25:13] Hostname: 'hostname'
[...]
[INFO    2022/10/13 14:25:13] Successfully downloaded and decompressed /module/collector.ko
[INFO    2022/10/13 14:25:13]
[INFO    2022/10/13 14:25:13] This product uses kernel module and ebpf subcomponents licensed under the GNU
[INFO    2022/10/13 14:25:13] GENERAL PURPOSE LICENSE Version 2 outlined in the /kernel-modules/LICENSE file.
[INFO    2022/10/13 14:25:13] Source code for the kernel module and ebpf subcomponents is available upon
[INFO    2022/10/13 14:25:13] request by contacting support@stackrox.com.
[INFO    2022/10/13 14:25:13]
[...]
[INFO    2022/10/13 14:25:13] Inserting kernel module /module/collector.ko with indefinite removal and retry if required.
[ERROR   2022/10/13 14:25:13] Error inserting kernel module: /module/collector.ko: Operation not permitted. Aborting...
[ERROR   2022/10/13 14:25:13] Failed to insert kernel module
[ERROR   2022/10/13 14:25:13] Failed to setup Kernel module
[INFO    2022/10/13 14:25:13]
[INFO    2022/10/13 14:25:13] == Collector Startup Diagnostics: ==
[INFO    2022/10/13 14:25:13]  Connected to Sensor?       true
[INFO    2022/10/13 14:25:13]  Kernel driver available?   true
[INFO    2022/10/13 14:25:13]  Driver loaded into kernel? false
[INFO    2022/10/13 14:25:13] ====================================
[INFO    2022/10/13 14:25:13]
[FATAL   2022/10/13 14:25:13] Failed to initialize collector kernel components.

If you encounter this kind of error, it is unlikely that you can fix it yourself. So instead, report it to Red Hat Advanced Cluster Security for Kubernetes (RHACS) support team or create a GitHub issue.

Legal Notice

Copyright © 2024 Red Hat, Inc.
The text of and illustrations in this document are licensed by Red Hat under a Creative Commons Attribution–Share Alike 3.0 Unported license ("CC-BY-SA"). An explanation of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/3.0/. In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version.
Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.
Red Hat, Red Hat Enterprise Linux, the Shadowman logo, the Red Hat logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
Java® is a registered trademark of Oracle and/or its affiliates.
XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.
Node.js® is an official trademark of Joyent. Red Hat is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.
The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation's permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
All other trademarks are the property of their respective owners.
Red Hat logoGithubRedditYoutubeTwitter

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

We help Red Hat users innovate and achieve their goals with our products and services with content they can trust. Explore our recent updates.

Making open source more inclusive

Red Hat is committed to replacing problematic language in our code, documentation, and web properties. For more details, see the Red Hat Blog.

About Red Hat

We deliver hardened solutions that make it easier for enterprises to work across platforms and environments, from the core datacenter to the network edge.

© 2024 Red Hat, Inc.