Support
Red Hat OpenShift Service on AWS Support.
Abstract
Chapter 1. Support overview
Red Hat offers cluster administrators tools for gathering data for your cluster, monitoring, and troubleshooting.
1.1. Get support
Get support: Visit the Red Hat Customer Portal to review knowledge base articles, submit a support case, and review additional product documentation and resources.
1.2. Remote health monitoring issues
Remote health monitoring issues: Red Hat OpenShift Service on AWS collects telemetry and configuration data about your cluster and reports it to Red Hat by using the Telemeter Client and the Insights Operator. Red Hat uses this data to understand and resolve issues in a connected cluster. Red Hat OpenShift Service on AWS collects data and monitors health using the following:
Telemetry: The Telemeter Client gathers and uploads the metrics values to Red Hat every four minutes and thirty seconds. Red Hat uses this data to:
- Monitor the clusters.
- Roll out Red Hat OpenShift Service on AWS upgrades.
- Improve the upgrade experience.
Insights Operator: By default, Red Hat OpenShift Service on AWS installs and enables the Insights Operator, which reports configuration and component failure status every two hours. The Insights Operator helps to:
- Identify potential cluster issues proactively.
- Provide a solution and preventive action in Red Hat OpenShift Cluster Manager.
You can review telemetry information.
1.3. Gather data about your cluster
Gather data about your cluster: Red Hat recommends gathering your debugging information when opening a support case. This helps Red Hat Support to perform a root cause analysis. A cluster administrator can use the following to gather data about your cluster:
- must-gather tool: Use the must-gather tool to collect information about your cluster and to debug issues.
- sosreport: Use the sosreport tool to collect configuration details, system information, and diagnostic data for debugging purposes.
- Cluster ID: Obtain the unique identifier for your cluster when providing information to Red Hat Support.
- Cluster node journal logs: Gather journald unit logs and logs within /var/log on individual cluster nodes to troubleshoot node-related issues.
- Network trace: Provide a network packet trace from a specific Red Hat OpenShift Service on AWS cluster node or a container to Red Hat Support to help troubleshoot network-related issues.
1.4. Troubleshooting issues
A cluster administrator can monitor and troubleshoot the following Red Hat OpenShift Service on AWS component issues:
Node issues: A cluster administrator can verify and troubleshoot node-related issues by reviewing the status, resource usage, and configuration of a node. You can query the following:
- Kubelet’s status on a node.
- Cluster node journal logs.
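As an illustration of these checks (the node name is a placeholder for one of your nodes), you can review node status, resource usage, and kubelet journald logs from the CLI:

$ oc get nodes
$ oc adm top node
$ oc describe node <node_name>
$ oc adm node-logs <node_name> -u kubelet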
Operator issues: A cluster administrator can do the following to resolve Operator issues:
- Verify Operator subscription status.
- Check Operator pod health.
- Gather Operator logs.
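For example, for an Operator installed through Operator Lifecycle Manager you can walk through these checks as follows; the namespace, subscription, and deployment names are placeholders:

$ oc get subscriptions -n <operator_namespace>
$ oc describe subscription <subscription_name> -n <operator_namespace>
$ oc get csv -n <operator_namespace>
$ oc get pods -n <operator_namespace>
$ oc logs deployment/<operator_deployment> -n <operator_namespace>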
Pod issues: A cluster administrator can troubleshoot pod-related issues by reviewing the status of a pod and completing the following:
- Review pod and container logs.
- Start debug pods with root access.
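For example, to review a pod and open a debug copy of it (the pod, container, and namespace names are placeholders):

$ oc describe pod <pod_name> -n <namespace>
$ oc logs <pod_name> -c <container_name> -n <namespace>
$ oc debug pod/<pod_name> -n <namespace>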
Source-to-image issues: A cluster administrator can observe the S2I stages to determine where in the S2I process a failure occurred. Gather the following to resolve Source-to-Image (S2I) issues:
- Source-to-Image diagnostic data.
- Application diagnostic data to investigate application failure.
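For example, to locate the failed S2I stage and pull the build logs (the build and namespace names are placeholders):

$ oc get builds -n <namespace>
$ oc describe build <build_name> -n <namespace>
$ oc logs build/<build_name> -n <namespace>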
Storage issues: A multi-attach storage error occurs when mounting a volume on a new node is not possible because the failed node cannot unmount the attached volume. A cluster administrator can do the following to resolve multi-attach storage issues:
- Enable multiple attachments by using RWX volumes.
- Recover or delete the failed node when using an RWO volume.
Monitoring issues: A cluster administrator can follow the procedures on the troubleshooting page for monitoring. If the metrics for your user-defined projects are unavailable or if Prometheus is consuming a lot of disk space, check the following:
- Investigate why user-defined metrics are unavailable.
- Determine why Prometheus is consuming a lot of disk space.
- OpenShift CLI (oc) issues: Investigate OpenShift CLI (oc) issues by increasing the log level.
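For example, re-running a failing command with a higher client verbosity prints the REST requests that oc issues, which often reveals authentication or API connectivity problems; the verbosity values shown are only examples, with higher values producing more output:

$ oc get pods -v=6
$ oc get pods -v=9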
Chapter 2. Managing your cluster resources
You can apply global configuration options in Red Hat OpenShift Service on AWS. Operators apply these configuration settings across the cluster.
2.1. Interacting with your cluster resources
You can interact with cluster resources by using the OpenShift CLI (oc) tool in Red Hat OpenShift Service on AWS. The cluster resources that you see after running the oc api-resources command can be edited.
Prerequisites
- You have access to the cluster as a user with the dedicated-admin role.
- You have access to the web console or you have installed the oc CLI tool.
Procedure
- To see which configuration Operators have been applied, run the following command:

  $ oc api-resources -o name | grep config.openshift.io

- To see what cluster resources you can configure, run the following command:

  $ oc explain <resource_name>.config.openshift.io

- To see the configuration of custom resource definition (CRD) objects in the cluster, run the following command:

  $ oc get <resource_name>.config -o yaml

- To edit the cluster resource configuration, run the following command:

  $ oc edit <resource_name>.config -o yaml
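For example, assuming you want to inspect the cluster-wide proxy configuration (these configuration resources are typically singletons named cluster):

$ oc explain proxy.config.openshift.io
$ oc get proxy.config.openshift.io cluster -o yaml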
Chapter 3. Approved Access
Red Hat Site Reliability Engineering (SRE) typically does not require elevated access to systems as part of normal operations to manage and support Red Hat OpenShift Service on AWS clusters. Elevated access gives SRE the access levels of a cluster-admin role. See cluster roles for more information.
In the unlikely event that SRE needs elevated access to systems, you can use the Approved Access interface to review and approve or deny access to these systems.
Elevated access requests to Red Hat OpenShift Service on AWS clusters and the corresponding cloud accounts can be created by SRE, either in response to a customer-initiated support ticket or in response to alerts received by SRE as part of the standard incident response process.
When Approved Access is enabled and an SRE creates an access request, cluster owners receive an email notification informing them of a new access request. The email notification contains a link allowing the cluster owner to quickly approve or deny the access request. You must respond in a timely manner; otherwise, there is a risk to your SLA for Red Hat OpenShift Service on AWS.
- If customers require additional users that are not the cluster owner to receive the email, they can add notification cluster contacts.
- Pending access requests are available in the Hybrid Cloud Console on the clusters list or Access Requests tab on the cluster overview for the specific cluster.
Denying an access request requires you to complete the Justification field. In this case, SRE cannot directly act on the resources related to the incident. Customers can still contact Red Hat Customer Support to help investigate and resolve any issues.
3.1. Enabling Approved Access for ROSA clusters by submitting a support case
Red Hat OpenShift Service on AWS Approved Access is not enabled by default. To enable Approved Access for your Red Hat OpenShift Service on AWS clusters, you should create a support ticket.
Procedure
- Log in to the Customer Support page of the Red Hat Customer Portal.
- Click Get support.
On the Cases tab of the Customer support page:
- Optional: Change the pre-filled account and owner details if needed.
- Select the Configuration category and click Continue.
Enter the following information:
- In the Product field, select Red Hat OpenShift Service on AWS Hosted control planes.
- In the Problem statement field, enter Enable ROSA Access Protection.
- Click See more options.
- Select OpenShift Cluster ID from the drop-down list.
Fill the remaining mandatory fields in the form:
What are you experiencing? What are you expecting to happen?
- Fill with Approved Access.
Define the value or impact to you or the business.
- Fill with Approved Access.
- Click Continue.
- Select the Severity as 4 (Low) and click Continue.
- Preview the case details and click Submit.
3.2. Reviewing an access request from an email notification
Cluster owners receive an email notification when Red Hat Site Reliability Engineering (SRE) requests access to their cluster, with a link to review the request in the Hybrid Cloud Console.
Procedure
- Click the link within the email to bring you to the Hybrid Cloud Console.
In the Access Request Details dialog, click Approve or Deny under Decision.
Note: Denying an access request requires you to complete the Justification field. In this case, SRE cannot directly act on the resources related to the incident. Customers can still contact Red Hat Customer Support to help investigate and resolve any issues.
- Click Save.
3.3. Reviewing an access request from the Hybrid Cloud Console
Review access requests for your Red Hat OpenShift Service on AWS clusters from the Hybrid Cloud Console.
Procedure
- Navigate to OpenShift Cluster Manager and select Cluster List.
- Click the cluster name to review the Access Request.
- Select the Access Requests tab to list all states.
- Select Open under Actions for the Pending state.
In the Access Request Details dialog, click Approve or Deny under Decision.
Note: Denying an access request requires you to complete the Justification field. In this case, SRE cannot directly act on the resources related to the incident. Customers can still contact Red Hat Customer Support to help investigate and resolve any issues.
- Click Save.
Chapter 4. Getting support
You can get support for Red Hat OpenShift Service on AWS by searching the knowledge base, submitting a support case, and using remote health monitoring tools.
4.1. Getting support
If you experience difficulty with a procedure described in this documentation, or with Red Hat OpenShift Service on AWS in general, visit the Red Hat Customer Portal.
From the Customer Portal, you can:
- Search or browse through the Red Hat Knowledgebase of articles and solutions relating to Red Hat products.
- Submit a support case to Red Hat Support.
- Access other product documentation.
To identify issues with your cluster, you can use Red Hat Lightspeed in OpenShift Cluster Manager. Red Hat Lightspeed provides details about issues and, if available, information on how to solve a problem.
If you have a suggestion for improving this documentation or have found an error, submit a Jira issue for the most relevant documentation component. Please provide specific details, such as the section name and Red Hat OpenShift Service on AWS version.
4.2. About the Red Hat Knowledgebase
The Red Hat Knowledgebase provides rich content aimed at helping you make the most of Red Hat’s products and technologies. The Red Hat Knowledgebase consists of articles, product documentation, and videos outlining best practices on installing, configuring, and using Red Hat products. In addition, you can search for solutions to known issues, each providing concise root cause descriptions and remedial steps.
4.3. Searching the Red Hat Knowledgebase
In the event of a Red Hat OpenShift Service on AWS issue, you can perform an initial search to determine if a solution already exists within the Red Hat Knowledgebase.
Prerequisites
- You have a Red Hat Customer Portal account.
Procedure
- Log in to the Red Hat Customer Portal.
- Click Search.
In the search field, input keywords and strings relating to the problem, including:
- Red Hat OpenShift Service on AWS components (such as etcd)
- Related procedure (such as installation)
- Warnings, error messages, and other outputs related to explicit failures
- Press the Enter key.
- Optional: Select the Red Hat OpenShift Service on AWS product filter.
- Optional: Select the Documentation content type filter.
4.4. Submitting a support case
Submit a support case to Red Hat Support to get help with issues you encounter with Red Hat OpenShift Service on AWS.
Prerequisites
- You have access to the cluster as a user with the dedicated-admin role.
- You have installed the OpenShift CLI (oc).
- You have access to the Red Hat OpenShift Cluster Manager.
Procedure
- Log in to the Customer Support page of the Red Hat Customer Portal.
- Click Get support.
On the Cases tab of the Customer Support page:
- Optional: Change the pre-filled account and owner details if needed.
- Select the appropriate category for your issue, such as Bug or Defect, and click Continue.
Enter the following information:
- In the Summary field, enter a concise but descriptive problem summary and further details about the symptoms being experienced, as well as your expectations.
- Select Red Hat OpenShift Service on AWS from the Product drop-down menu.
- Review the list of suggested Red Hat Knowledgebase solutions for a potential match against the problem that is being reported. If the suggested articles do not address the issue, click Continue.
- Review the updated list of suggested Red Hat Knowledgebase solutions for a potential match against the problem that is being reported. The list is refined as you provide more information during the case creation process. If the suggested articles do not address the issue, click Continue.
- Ensure that the account information presented is as expected, and if not, amend accordingly.
Check that the autofilled Red Hat OpenShift Service on AWS Cluster ID is correct. If it is not, manually obtain your cluster ID.
To manually obtain your cluster ID using the Red Hat OpenShift Service on AWS web console:
- Navigate to Home → Overview.
- Find the value in the Cluster ID field of the Details section.
Alternatively, it is possible to open a new support case through the Red Hat OpenShift Service on AWS web console and have your cluster ID autofilled.
- From the toolbar, navigate to (?) Help → Open Support Case.
- The Cluster ID value is autofilled.
To obtain your cluster ID using the OpenShift CLI (oc), run the following command:

$ oc get clusterversion -o jsonpath='{.items[].spec.clusterID}{"\n"}'
Complete the following questions where prompted and then click Continue:
- What are you experiencing? What are you expecting to happen?
- Define the value or impact to you or the business.
- Where are you experiencing this behavior? What environment?
- When does this behavior occur? Frequency? Repeatedly? At certain times?
- Upload relevant diagnostic data files and click Continue.
- Input relevant case management details and click Continue.
- Preview the case details and click Submit.
Chapter 5. Remote health monitoring with connected clusters
5.1. About remote health monitoring
Red Hat OpenShift Service on AWS collects telemetry and configuration data about your cluster and reports it to Red Hat by using the Telemeter Client and the Insights Operator. The data that is provided to Red Hat enables the benefits outlined in this document.
A cluster that reports data to Red Hat through Telemetry and the Insights Operator is considered a connected cluster.
Telemetry is the term that Red Hat uses to describe the information being sent to Red Hat by the Red Hat OpenShift Service on AWS Telemeter Client. Lightweight attributes are sent from connected clusters to Red Hat to enable subscription management automation, monitor the health of clusters, assist with support, and improve customer experience.
The Insights Operator gathers Red Hat OpenShift Service on AWS configuration data and sends it to Red Hat. The data is used to produce insights about potential issues that a cluster might be exposed to. These insights are communicated to cluster administrators on OpenShift Cluster Manager.
More information is provided in this document about these two processes.
5.1.1. Telemetry and Insights Operator benefits
Telemetry and the Insights Operator enable the following benefits for end-users:
- Enhanced identification and resolution of issues. Events that might seem normal to an end-user can be observed by Red Hat from a broader perspective across a fleet of clusters. Some issues can be more rapidly identified from this point of view and resolved without an end-user needing to open a support case or file a Jira issue.
- Advanced release management. Red Hat OpenShift Service on AWS offers the candidate, fast, and stable release channels, which enable you to choose an update strategy. The graduation of a release from fast to stable is dependent on the success rate of updates and on the events seen during upgrades. With the information provided by connected clusters, Red Hat can improve the quality of releases to stable channels and react more rapidly to issues found in the fast channels.
- Targeted prioritization of new features and functionality. The data collected provides insights about which areas of Red Hat OpenShift Service on AWS are used most. With this information, Red Hat can focus on developing the new features and functionality that have the greatest impact for our customers.
- A streamlined support experience. You can provide a cluster ID for a connected cluster when creating a support ticket on the Red Hat Customer Portal. This enables Red Hat to deliver a streamlined support experience that is specific to your cluster, by using the connected information. This document provides more information about that enhanced support experience.
- Predictive analytics. The insights displayed for your cluster on OpenShift Cluster Manager are enabled by the information collected from connected clusters. Red Hat is investing in applying deep learning, machine learning, and artificial intelligence automation to help identify issues that Red Hat OpenShift Service on AWS clusters are exposed to.
On Red Hat OpenShift Service on AWS, remote health reporting is always enabled. You cannot opt out of it.
5.1.2. About Telemetry
Telemetry sends a carefully chosen subset of the cluster monitoring metrics to Red Hat. The Telemeter Client fetches the metrics values every four minutes and thirty seconds and uploads the data to Red Hat. These metrics are described in this document.
This stream of data is used by Red Hat to monitor the clusters in real-time and to react as necessary to problems that impact our customers. It also allows Red Hat to roll out Red Hat OpenShift Service on AWS upgrades to customers to minimize service impact and continuously improve the upgrade experience.
This debugging information is available to Red Hat Support and Engineering teams with the same restrictions as accessing data reported through support cases. All connected cluster information is used by Red Hat to help make Red Hat OpenShift Service on AWS better and more intuitive to use.
5.1.5. About the Insights Operator
The Insights Operator periodically gathers configuration and component failure status and, by default, reports that data every two hours to Red Hat. This information enables Red Hat to assess configuration and deeper failure data than is reported through Telemetry.
Users of Red Hat OpenShift Service on AWS can display the report of each cluster in the Advisor service on Red Hat Hybrid Cloud Console. If any issues have been identified, Red Hat Lightspeed provides further details and, if available, steps on how to solve a problem.
The Insights Operator does not collect identifying information, such as user names, passwords, or certificates. See Red Hat Lightspeed Data & Application Security for information about Red Hat Lightspeed data collection and controls.
Red Hat uses all connected cluster information to:
- Identify potential cluster issues and provide a solution and preventive actions in the Advisor service on Red Hat Hybrid Cloud Console
- Improve Red Hat OpenShift Service on AWS by providing aggregated and critical information to product and support teams
- Make Red Hat OpenShift Service on AWS more intuitive
5.1.5.1. Information collected by the Insights Operator
The following information is collected by the Insights Operator:
- General information about your cluster and its components to identify issues that are specific to your Red Hat OpenShift Service on AWS version and environment.
- Configuration files, such as the image registry configuration, of your cluster to determine incorrect settings and issues that are specific to parameters you set.
- Errors that occur in the cluster components.
- Progress information of running updates, and the status of any component upgrades.
- Details of the platform that Red Hat OpenShift Service on AWS is deployed on and the region that the cluster is located in.
- Cluster workload information transformed into discrete Secure Hash Algorithm (SHA) values, which allows Red Hat to assess workloads for security and version vulnerabilities without disclosing sensitive details.
- Workload information about the operating system and runtime environment, including runtime kinds, names, and version. This data gives Red Hat a better understanding of how you use Red Hat OpenShift Service on AWS containers so that we can proactively help you make investment decisions to drive optimal utilization.
- If an Operator reports an issue, information is collected about core Red Hat OpenShift Service on AWS pods in the openshift-* and kube-* projects. This includes state, resource, security context, volume information, and more.
5.1.7. Understanding Telemetry and Insights Operator data flow
The Telemeter Client collects selected time series data from the Prometheus API. The time series data is uploaded to api.openshift.com every four minutes and thirty seconds for processing.
The Insights Operator gathers selected data from the Kubernetes API and the Prometheus API into an archive. The archive is uploaded to OpenShift Cluster Manager every two hours for processing. The Insights Operator also downloads the latest Red Hat Lightspeed analysis from OpenShift Cluster Manager. This is used to populate the Red Hat Lightspeed status pop-up that is included in the Overview page in the Red Hat OpenShift Service on AWS web console.
All of the communication with Red Hat occurs over encrypted channels by using Transport Layer Security (TLS) and mutual certificate authentication. All of the data is encrypted in transit and at rest.
Access to the systems that handle customer data is controlled through multi-factor authentication and strict authorization controls. Access is granted on a need-to-know basis and is limited to required operations.
5.1.7.1. Telemetry and Insights Operator data flow
5.1.8. Additional details about how remote health monitoring data is used
The information collected to enable remote health monitoring is detailed in Information collected by Telemetry and Information collected by the Insights Operator.
As further described in the preceding sections of this document, Red Hat collects data about your use of the Red Hat Product(s) for purposes such as providing support and upgrades, optimizing performance or configuration, minimizing service impacts, identifying and remediating threats, troubleshooting, improving the offerings and user experience, responding to issues, and for billing purposes if applicable.
5.1.9. Collection safeguards
Red Hat employs technical and organizational measures designed to protect the telemetry and configuration data.
5.1.10. Sharing
Red Hat might share the data collected through Telemetry and the Insights Operator internally within Red Hat to improve your user experience. Red Hat might share telemetry and configuration data with its business partners in an aggregated form that does not identify customers to help the partners better understand their markets and their customers' use of Red Hat offerings or to ensure the successful integration of products jointly supported by those partners.
5.1.11. Third parties
Red Hat may engage certain third parties to assist in the collection, analysis, and storage of the Telemetry and configuration data.
5.2. Showing data collected by remote health monitoring
As an administrator, you can review the metrics collected by Telemetry and the Insights Operator.
5.2.1. Showing data collected by Telemetry
You can view the cluster and components time series data captured by Telemetry.
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have access to the cluster as a user with the dedicated-admin role.
Procedure
- Log in to a cluster.
Run the following command, which queries a cluster’s Prometheus service and returns the full set of time series data captured by Telemetry:
Note: The following example contains some values that are specific to Red Hat OpenShift Service on AWS.
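A reduced sketch of such a federate query follows, assuming the prometheus-k8s-federate route in the openshift-monitoring namespace and showing only two of the many match[] selectors that Telemetry uses:

$ curl -G -k -H "Authorization: Bearer $(oc whoami -t)" \
  https://$(oc get route prometheus-k8s-federate -n openshift-monitoring -o jsonpath="{.spec.host}")/federate \
  --data-urlencode 'match[]={__name__=~"cluster:usage:.*"}' \
  --data-urlencode 'match[]={__name__="up"}'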
5.3. Using Red Hat Lightspeed to identify issues with your cluster
Red Hat Lightspeed repeatedly analyzes the data Insights Operator sends, which includes workload recommendations from Deployment Validation Operator (DVO). Users of Red Hat OpenShift Service on AWS can display the results in the Advisor service on Red Hat Hybrid Cloud Console.
5.3.1. About Red Hat Lightspeed Advisor for Red Hat OpenShift Service on AWS
You can use the Red Hat Lightspeed advisor service to assess and monitor the health of your Red Hat OpenShift Service on AWS clusters. Whether you are concerned about individual clusters, or with your whole infrastructure, it is important to be aware of the exposure of your cluster infrastructure to issues that can affect service availability, fault tolerance, performance, or security.
If the cluster has the Deployment Validation Operator (DVO) installed, the recommendations also highlight workloads whose configuration might lead to cluster health issues.
The results of the Red Hat Lightspeed analysis are available in the Red Hat Lightspeed advisor service on Red Hat Hybrid Cloud Console. In the Red Hat Hybrid Cloud Console, you can perform the following actions:
- View clusters and workloads affected by specific recommendations.
- Use robust filtering capabilities to refine your results to those recommendations.
- Learn more about individual recommendations, details about the risks they present, and get resolutions tailored to your individual clusters.
- Share results with other stakeholders.
5.3.2. Understanding Red Hat Lightspeed advisor service recommendations
The Red Hat Lightspeed advisor service bundles information about various cluster states and component configurations that can negatively affect the service availability, fault tolerance, performance, or security of your clusters and workloads. This information set is called a recommendation in the Red Hat Lightspeed advisor service. Recommendations for clusters include the following information:
- Name: A concise description of the recommendation
- Added: When the recommendation was published to the Red Hat Lightspeed advisor service archive
- Category: Whether the issue has the potential to negatively affect service availability, fault tolerance, performance, or security
- Total risk: A value derived from the likelihood that the condition will negatively affect your cluster or workload, and the impact on operations if that were to happen
- Clusters: A list of clusters on which a recommendation is detected
- Description: A brief synopsis of the issue, including how it affects your clusters
5.3.3. Displaying potential issues with your cluster
This section describes how to display the Red Hat Lightspeed report in Red Hat Lightspeed Advisor on OpenShift Cluster Manager.
Note that Red Hat Lightspeed repeatedly analyzes your cluster and shows the latest results. These results can change, for example, if you fix an issue or a new issue has been detected.
Prerequisites
- Your cluster is registered on OpenShift Cluster Manager.
- Remote health reporting is enabled, which is the default.
- You are logged in to OpenShift Cluster Manager.
Procedure
Navigate to Advisor → Recommendations on OpenShift Cluster Manager.
Depending on the result, the Red Hat Lightspeed advisor service displays one of the following:
- No matching recommendations found, if Red Hat Lightspeed did not identify any issues.
- A list of issues Red Hat Lightspeed has detected, grouped by risk (low, moderate, important, and critical).
- No clusters yet, if Red Hat Lightspeed has not yet analyzed the cluster. The analysis starts shortly after the cluster has been installed, registered, and connected to the internet.
If any issues are displayed, click the > icon in front of the entry for more details.
Depending on the issue, the details can also contain a link to more information from Red Hat about the issue.
5.3.4. Displaying all Red Hat Lightspeed advisor service recommendations
The Recommendations view, by default, only displays the recommendations that are detected on your clusters. However, you can view all of the recommendations in the advisor service’s archive.
Prerequisites
- Remote health reporting is enabled, which is the default.
- Your cluster is registered on Red Hat Hybrid Cloud Console.
- You are logged in to OpenShift Cluster Manager.
Procedure
- Navigate to Advisor → Recommendations on OpenShift Cluster Manager.
Click the X icons next to the Clusters Impacted and Status filters.
You can now browse through all of the potential recommendations for your cluster.
5.3.5. Advisor recommendation filters
The Red Hat Lightspeed advisor service can return a large number of recommendations. To focus on your most critical recommendations, you can apply filters to the Advisor recommendations list to remove low-priority recommendations.
By default, filters are set to only show enabled recommendations that are impacting one or more clusters. To view all or disabled recommendations in the Red Hat Lightspeed library, you can customize the filters.
To apply a filter, select a filter type and then set its value based on the options that are available in the drop-down list. You can apply multiple filters to the list of recommendations.
You can set the following filter types:
- Name: Search for a recommendation by name.
- Total risk: Select one or more values from Critical, Important, Moderate, and Low indicating the likelihood and the severity of a negative impact on a cluster.
- Impact: Select one or more values from Critical, High, Medium, and Low indicating the potential impact to the continuity of cluster operations.
- Likelihood: Select one or more values from Critical, High, Medium, and Low indicating the potential for a negative impact to a cluster if the recommendation comes to fruition.
- Category: Select one or more categories from Service Availability, Performance, Fault Tolerance, Security, and Best Practice to focus your attention on.
- Status: Click a radio button to show enabled recommendations (default), disabled recommendations, or all recommendations.
- Clusters impacted: Set the filter to show recommendations currently impacting one or more clusters, non-impacting recommendations, or all recommendations.
- Risk of change: Select one or more values from High, Moderate, Low, and Very low indicating the risk that the implementation of the resolution could have on cluster operations.
5.3.5.1. Filtering Red Hat Lightspeed advisor service recommendations
As a Red Hat OpenShift Service on AWS cluster manager, you can filter the recommendations that are displayed on the recommendations list. By applying filters, you can reduce the number of reported recommendations and concentrate on your highest priority recommendations.
The following procedure demonstrates how to set and remove Category filters; however, the procedure is applicable to any of the filter types and respective values.
Prerequisites
- You are logged in to the OpenShift Cluster Manager in the Hybrid Cloud Console.
Procedure
- Go to OpenShift > Advisor > Recommendations.
- In the main, filter-type drop-down list, select the Category filter type.
- Expand the filter-value drop-down list and select the checkbox next to each category of recommendation you want to view. Leave the checkboxes for unnecessary categories clear.
Optional: Add additional filters to further refine the list.
Only recommendations from the selected categories are shown in the list.
Verification
- After applying filters, you can view the updated recommendations list. The applied filters are added next to the default filters.
5.3.5.2. Removing filters from Red Hat Lightspeed advisor service recommendations
You can apply multiple filters to the list of recommendations. When ready, you can remove them individually or completely reset them.
Procedure
Removing filters individually
- Click the X icon next to each filter, including the default filters, to remove them individually.
Removing all non-default filters
- Click Reset filters to remove only the filters that you applied, leaving the default filters in place.
5.3.6. Disabling Red Hat Lightspeed advisor service recommendations
You can disable specific recommendations that affect your clusters, so that they no longer appear in your reports. It is possible to disable a recommendation for a single cluster or all of your clusters.
Disabling a recommendation for all of your clusters also applies to any future clusters.
Prerequisites
- Remote health reporting is enabled, which is the default.
- Your cluster is registered on OpenShift Cluster Manager.
- You are logged in to OpenShift Cluster Manager.
Procedure
- Navigate to Advisor → Recommendations on OpenShift Cluster Manager.
- Optional: Use the Clusters Impacted and Status filters as needed.
Disable an alert by using one of the following methods:

To disable an alert:
- Click the Options menu for that alert, and then click Disable recommendation.
- Enter a justification note and click Save.

To view the clusters affected by this alert before disabling the alert:
- Click the name of the recommendation to disable. You are directed to the single recommendation page.
- Review the list of clusters in the Affected clusters section.
- Click Actions → Disable recommendation to disable the alert for all of your clusters.
- Enter a justification note and click Save.
5.3.7. Enabling a previously disabled Red Hat Lightspeed advisor service recommendation
When a recommendation is disabled for all clusters, you no longer see the recommendation in the Red Hat Lightspeed advisor service. You can change this behavior.
Prerequisites
- Remote health reporting is enabled, which is the default.
- Your cluster is registered on OpenShift Cluster Manager.
- You are logged in to OpenShift Cluster Manager.
Procedure
- Navigate to Advisor → Recommendations on OpenShift Cluster Manager.
Filter the recommendations to display only the disabled recommendations:
- From the filter-type drop-down menu, select Status.
- From the Filter by status drop-down menu, select Disabled.
- Optional: Clear the Clusters impacted filter.
- Locate the recommendation to enable.
- Click the Options menu, and then click Enable recommendation.
5.3.8. About Red Hat Lightspeed advisor service recommendations for workloads
You can use the Red Hat Lightspeed advisor service to view and manage information about recommendations that affect not only your clusters, but also your workloads. The advisor service takes advantage of deployment validation and helps OpenShift cluster administrators to see all runtime violations of deployment policies. You can see recommendations for workloads at OpenShift > Advisor > Workloads on the Red Hat Hybrid Cloud Console. For more information, see these additional resources:
- Information about Kubernetes workloads
- Boost your cluster operations with Deployment Validation and Red Hat Lightspeed Advisor for Workloads
- Identifying workload recommendations for namespaces in your clusters
- Viewing workload recommendations for namespaces in your cluster
- Excluding objects from workload recommendations in your clusters
5.3.9. Displaying the Red Hat Lightspeed status in the web console
Red Hat Lightspeed repeatedly analyzes your cluster and you can display the status of identified potential issues of your cluster in the Red Hat OpenShift Service on AWS web console. This status shows the number of issues in the different categories and, for further details, links to the reports in OpenShift Cluster Manager.
Prerequisites
- Your cluster is registered in OpenShift Cluster Manager.
- Remote health reporting is enabled, which is the default.
- You are logged in to the Red Hat OpenShift Service on AWS web console.
Procedure
- Navigate to Home → Overview in the Red Hat OpenShift Service on AWS web console.
Click Red Hat Lightspeed on the Status card.
The pop-up window lists potential issues grouped by risk. Click the individual categories or View all recommendations in Red Hat Lightspeed Advisor to display more details.
5.4. Using the Insights Operator
The Insights Operator periodically gathers configuration and component failure status and, by default, reports that data every two hours to Red Hat. This information enables Red Hat to assess configuration and deeper failure data than is reported through Telemetry. Users of Red Hat OpenShift Service on AWS can display the report in the Advisor service on Red Hat Hybrid Cloud Console.
5.4.2. Understanding Insights Operator alerts
The Insights Operator declares alerts through the Prometheus monitoring system to the Alertmanager. You can view these alerts in the Alerting UI in the Red Hat OpenShift Service on AWS web console by using one of the following methods:
- In the Administrator perspective, click Observe → Alerting.
- In the Developer perspective, click Observe → <project_name> → Alerts tab.
Currently, Insights Operator sends the following alerts when the conditions are met:
| Alert | Description |
|---|---|
| InsightsDisabled | Insights Operator is disabled. |
| SimpleContentAccessNotAvailable | Simple content access is not enabled in Red Hat Subscription Management. |
| InsightsRecommendationActive | Red Hat Lightspeed has an active recommendation for the cluster. |
5.4.3. Obfuscating Deployment Validation Operator data
By default, when you install the Deployment Validation Operator (DVO), the name and unique identifier (UID) of a resource are included in the data that is captured and processed by the Insights Operator for Red Hat OpenShift Service on AWS. If you are a cluster administrator, you can configure the Insights Operator to obfuscate data from the Deployment Validation Operator (DVO). For example, you can obfuscate workload names in the archive file that is then sent to Red Hat.
To obfuscate the name of resources, you must manually set the obfuscation attribute in the insights-config ConfigMap object to include the workload_names value, as outlined in the following procedure.
Prerequisites
- Remote health reporting is enabled, which is the default.
- You are logged in to the Red Hat OpenShift Service on AWS web console with the "cluster-admin" role.
- The insights-config ConfigMap object exists in the openshift-insights namespace.
- The cluster is self managed and the Deployment Validation Operator is installed.
Procedure
- Go to Workloads → ConfigMaps and select Project: openshift-insights.
- Click the insights-config ConfigMap object to open it.
- Click Actions and select Edit ConfigMap.
- Click the YAML view radio button.
- In the file, set the obfuscation attribute with the workload_names value, as shown in the sketch after this procedure.
- Click Save. The insights-config ConfigMap details page opens.
- Verify that the value of the config.yaml obfuscation attribute is set to - workload_names.
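A minimal sketch of the edited ConfigMap follows. The dataReporting nesting is an assumption about the Insights Operator configuration layout; keep whatever structure already exists in your config.yaml and only add the obfuscation list:

apiVersion: v1
kind: ConfigMap
metadata:
  name: insights-config
  namespace: openshift-insights
data:
  config.yaml: |
    dataReporting:
      obfuscation:
        - workload_names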
Chapter 6. Gathering data about your cluster
When opening a support case, it is helpful to provide debugging information about your cluster to Red Hat Support. You can use tools such as must-gather, sosreport, and cluster node journal logs to collect diagnostic data.
6.1. About the must-gather tool
The oc adm must-gather CLI command collects the information from your cluster that is most likely needed for debugging issues, including:
- Resource definitions
- Service logs
By default, the oc adm must-gather command uses the default plugin image and writes into ./must-gather.local.
Alternatively, you can collect specific information by running the command with the appropriate arguments as described in the following sections:
- To collect data related to one or more specific features, use the --image argument with an image, as listed in a following section. For example:

  $ oc adm must-gather \
    --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel9:v4.21.0

- To collect the audit logs, use the -- /usr/bin/gather_audit_logs argument, as described in a following section. For example:

  $ oc adm must-gather -- /usr/bin/gather_audit_logs

Note:
- Audit logs are not collected as part of the default set of information to reduce the size of the files.
- On a Windows operating system, install the cwRsync client and add it to the PATH variable for use with the oc rsync command.
When you run oc adm must-gather, a new pod with a random name is created in a new project on the cluster. The data is collected on that pod and saved in a new directory that starts with must-gather.local in the current working directory.
For example:
NAMESPACE NAME READY STATUS RESTARTS AGE
...
openshift-must-gather-5drcj must-gather-bklx4 2/2 Running 0 72s
openshift-must-gather-5drcj must-gather-s8sdh 2/2 Running 0 72s
...
Optionally, you can run the oc adm must-gather command in a specific namespace by using the --run-namespace option.
For example:
$ oc adm must-gather --run-namespace <namespace> \
--image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel9:v4.21.0
6.1.1. Gathering data about your cluster for Red Hat Support
You can gather debugging information about your cluster by using the oc adm must-gather CLI command.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- The Red Hat OpenShift Service on AWS CLI (oc) is installed.
Procedure
- Navigate to the directory where you want to store the must-gather data.

  Note: If your cluster is in a disconnected environment, you must take additional steps. If your mirror registry has a trusted CA, you must first add the trusted CA to the cluster. For all clusters in disconnected environments, you must import the default must-gather image as an image stream:

  $ oc import-image is/must-gather -n openshift

- Run the oc adm must-gather command:

  $ oc adm must-gather

  Important: If you are in a disconnected environment, use the --image flag as part of must-gather and point to the payload image.

  Note: Because this command picks a random control plane node by default, the pod might be scheduled to a control plane node that is in the NotReady and SchedulingDisabled state.

  If this command fails, for example, if you cannot schedule a pod on your cluster, then use the oc adm inspect command to gather information for particular resources.

  Note: Contact Red Hat Support for the recommended resources to gather.

- Create a compressed file from the must-gather directory that was just created in your working directory. Make sure you provide the date and cluster ID for the unique must-gather data. For more information about how to find the cluster ID, see How to find the cluster-id or name on OpenShift cluster. For example, on a computer that uses a Linux operating system, run the following command:

  $ tar cvaf must-gather-`date +"%m-%d-%Y-%H-%M-%S"`-<cluster_id>.tar.gz <must_gather_local_dir>

  where <must_gather_local_dir> is the actual directory name.

- Attach the compressed file to your support case on the Customer Support page of the Red Hat Customer Portal.
6.2. Reducing the size of must-gather output
The oc adm must-gather command collects comprehensive cluster information. However, a full data collection can result in a large file that is difficult to upload and analyze and could result in timeouts.
To manage the output size and target your data collection for more effective troubleshooting, you can pass specific flags to the underlying gather script or scope the collection to particular resources.
6.2.1. Gathering data for specific resources
Instead of collecting data for the entire cluster, you can direct the must-gather tool to inspect a specific resource. This method is highly effective for isolating issues within a single project, Operator, or application.
The must-gather tool uses oc adm inspect internally. You can specify what to inspect by passing the inspect command and its arguments after the -- separator.
Procedure
- To gather data for a specific namespace, such as my-project, run the following command:

  $ oc adm must-gather --dest-dir=my-project-must-gather -- oc adm inspect ns/my-project

  This command collects all standard resources within the my-project namespace, including logs from pods in that namespace, but excludes cluster-scoped resources.

- To gather data related to a specific Cluster Operator, such as openshift-apiserver, run the following command:

  $ oc adm must-gather --dest-dir=apiserver-must-gather -- oc adm inspect clusteroperator/openshift-apiserver

- To exclude logs entirely and significantly reduce the size of the must-gather archive, add a double dash (--) after the oc adm must-gather command and add the --no-logs argument:

  $ oc adm must-gather -- /usr/bin/gather --no-logs
6.2.2. Must-gather flags
The flags listed in the following table are available to use with the oc adm must-gather command.
| Flag | Example command | Description |
|---|---|---|
| --all-images | $ oc adm must-gather --all-images | Collect must-gather data by using the default image for all Operators on the cluster that are annotated with operators.openshift.io/must-gather-image. |
| --dest-dir | $ oc adm must-gather --dest-dir='<directory_name>' | Set a specific directory on the local machine where the gathered data is written. |
| --host-network | $ oc adm must-gather --host-network=false | Run must-gather pods with hostNetwork: true. Relevant when a specific command and image need to capture host-level data. |
| --image | $ oc adm must-gather --image='<image>' | Specify a must-gather plugin image to run. If not specified, the default must-gather image is used. |
| --image-stream | $ oc adm must-gather --image-stream='<image_stream>' | Specify an `<image_stream>` using a namespace or name:tag value containing a must-gather plugin image to run. |
| --node-name | $ oc adm must-gather --node-name='<node_name>' | Set a specific node to use. If not specified, by default a random master is used. |
| --node-selector | $ oc adm must-gather --node-selector='<node_selector_name>' | Set a specific node selector to use. Only relevant when specifying a command and image which needs to capture data on a set of cluster nodes simultaneously. |
| --run-namespace | $ oc adm must-gather --run-namespace='<namespace>' | An existing privileged namespace where must-gather pods should run. If not specified, a temporary namespace is generated. |
| --since | $ oc adm must-gather --since=10m | Only return logs newer than the specified duration. Defaults to all logs. Plugins are encouraged but not required to support this. Only one since or since-time may be used. |
| --since-time | $ oc adm must-gather --since-time='2024-01-24T09:00:00Z' | Only return logs after a specific date and time, expressed in (RFC3339) format. Defaults to all logs. Plugins are encouraged but not required to support this. Only one since or since-time may be used. |
| --source-dir | $ oc adm must-gather --source-dir='/must-gather/' | Set the specific directory on the pod where you copy the gathered data from. |
| --timeout | $ oc adm must-gather --timeout='10m' | The length of time to gather data before timing out, expressed as seconds, minutes, or hours, for example, 3s, 5m, or 2h. Time specified must be higher than zero. Defaults to 10 minutes if not specified. |
| --volume-percentage | $ oc adm must-gather --volume-percentage='<percent>' | Specify maximum percentage of pod’s allocated volume that can be used for must-gather. If this limit is exceeded, must-gather stops gathering, but still copies gathered data. Defaults to 30% if not specified. |
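For example, a sketch that combines several of these flags to limit the collection window and control the output location; all values are placeholders you would adjust:

$ oc adm must-gather --dest-dir=/tmp/rosa-must-gather --since=2h --timeout=30m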
6.2.3. Gathering data about specific features
You can gather debugging information about specific features by using the oc adm must-gather CLI command with the --image or --image-stream argument. The must-gather tool supports multiple images, so you can gather data about more than one feature by running a single command.
| Image | Purpose |
|---|---|
| | Data collection for OpenShift Virtualization. |
| | Data collection for OpenShift Serverless. |
| | Data collection for Red Hat OpenShift Service Mesh. |
| | Data collection for hosted control planes. |
| | Data collection for the Migration Toolkit for Containers. |
| | Data collection for Red Hat OpenShift Data Foundation. |
| | Data collection for logging. |
| | Data collection for the Network Observability Operator. |
| | Data collection for Local Storage Operator. |
| | Data collection for OpenShift sandboxed containers. |
| | Data collection for the Red Hat Workload Availability Operators, including the Self Node Remediation (SNR) Operator, the Fence Agents Remediation (FAR) Operator, the Machine Deletion Remediation (MDR) Operator, the Node Health Check (NHC) Operator, and the Node Maintenance Operator (NMO). Use this image if your NHC Operator version is earlier than 0.9.0. For more information, see the "Gathering data" section for the specific Operator in Remediation, fencing, and maintenance (Workload Availability for Red Hat OpenShift documentation). |
| | Data collection for the Red Hat Workload Availability Operators, including the Self Node Remediation (SNR) Operator, the Fence Agents Remediation (FAR) Operator, the Machine Deletion Remediation (MDR) Operator, the Node Health Check (NHC) Operator, and the Node Maintenance Operator (NMO). Use this image if your NHC Operator version is 0.9.0 or later. For more information, see the "Gathering data" section for the specific Operator in Remediation, fencing, and maintenance (Workload Availability for Red Hat OpenShift documentation). |
| | Data collection for the NUMA Resources Operator (NRO). |
| | Data collection for the PTP Operator. |
| | Data collection for Red Hat OpenShift GitOps. |
| | Data collection for the Secrets Store CSI Driver Operator. |
| | Data collection for the LVM Operator. |
| | Data collection for the Compliance Operator. |
To determine the latest version for an Red Hat OpenShift Service on AWS component’s image, see the OpenShift Operator Life Cycles web page on the Red Hat Customer Portal.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- The Red Hat OpenShift Service on AWS CLI (oc) is installed.
Procedure
-
Navigate to the directory where you want to store the
must-gatherdata. Run the
oc adm must-gathercommand with one or more--imageor--image-streamarguments.Note-
To collect the default
must-gatherdata in addition to specific feature data, add the--image-stream=openshift/must-gatherargument. - For information on gathering data about the Custom Metrics Autoscaler, see the Additional resources section that follows.
For example, the following command gathers both the default cluster data and information specific to OpenShift Virtualization:
oc adm must-gather \ --image-stream=openshift/must-gather \ --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel9:v4.21.0
$ oc adm must-gather \ --image-stream=openshift/must-gather \ --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel9:v4.21.0Copy to Clipboard Copied! Toggle word wrap Toggle overflow You can use the
must-gathertool with additional arguments to gather data that is specifically related to OpenShift Logging and the Red Hat OpenShift Logging Operator in your cluster. For OpenShift Logging, run the following command:oc adm must-gather --image=$(oc -n openshift-logging get deployment.apps/cluster-logging-operator \ -o jsonpath='{.spec.template.spec.containers[?(@.name == "cluster-logging-operator")].image}')$ oc adm must-gather --image=$(oc -n openshift-logging get deployment.apps/cluster-logging-operator \ -o jsonpath='{.spec.template.spec.containers[?(@.name == "cluster-logging-operator")].image}')Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example
must-gatheroutput for OpenShift LoggingCopy to Clipboard Copied! Toggle word wrap Toggle overflow -
To collect the default
Run the
oc adm must-gathercommand with one or more--imageor--image-streamarguments. For example, the following command gathers both the default cluster data and information specific to KubeVirt:oc adm must-gather \ --image-stream=openshift/must-gather \ --image=quay.io/kubevirt/must-gather
$ oc adm must-gather \ --image-stream=openshift/must-gather \ --image=quay.io/kubevirt/must-gatherCopy to Clipboard Copied! Toggle word wrap Toggle overflow Create a compressed file from the
must-gather directory that was just created in your working directory. Make sure you provide the date and cluster ID for the unique must-gather data. For more information about how to find the cluster ID, see How to find the cluster-id or name on OpenShift cluster. For example, on a computer that uses a Linux operating system, run the following command:

$ tar cvaf must-gather-`date +"%m-%d-%Y-%H-%M-%S"`-<cluster_id>.tar.gz <must_gather_local_dir>

where:
<must_gather_local_dir>- Replace with the actual directory name.
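For example, assuming the default directory name that the oc adm must-gather command creates in the working directory and a hypothetical cluster ID, the resulting command might look like the following (both values are illustrative):

$ tar cvaf must-gather-01-15-2025-14-30-00-3f4a5b6c.tar.gz ./must-gather.local.5421342344627712289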
- Attach the compressed file to your support case on the Customer Support page of the Red Hat Customer Portal.
6.4. About Support Log Gather Copiar o linkLink copiado para a área de transferência!
The Support Log Gather Operator builds on the functionality of the traditional must-gather tool to automate the collection of debugging data. It streamlines troubleshooting by packaging the collected information into a single .tar file and automatically uploading it to the specified Red Hat Support case.
Support Log Gather is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
The key features of Support Log Gather include the following:
- No administrator privileges required: Enables you to collect and upload logs without needing elevated permissions, making it easier for non-administrators to gather data securely.
- Simplified log collection: Collects debugging data from the cluster, such as resource definitions and service logs.
-
Configurable data upload: Provides configuration options to either automatically upload the
.tarfile to a support case, or store it locally for manual upload.
6.4.1. Installing Support Log Gather by using the web console Copiar o linkLink copiado para a área de transferência!
You can use the web console to install the Support Log Gather.
Support Log Gather is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
Prerequisites
-
You have access to the cluster with
cluster-adminprivileges. - You have access to the Red Hat OpenShift Service on AWS web console.
Procedure
- Log in to the Red Hat OpenShift Service on AWS web console.
- Navigate to Ecosystem → Software Catalog.
- In the filter box, enter Support Log Gather.
- Select Support Log Gather.
- From the Version list, select the Support Log Gather version, and click Install.
On the Install Operator page, configure the installation settings:
Choose the Installed Namespace for the Operator.
The default Operator namespace is
must-gather-operator. Themust-gather-operatornamespace is created automatically if it does not exist.Select an Update approval strategy:
- Select Automatic to have the Operator Lifecycle Manager (OLM) update the Operator automatically when a newer version is available.
- Select Manual if Operator updates must be approved by a user with appropriate credentials.
- Click Install.
Verification
Verify that the Operator is installed successfully:
- Navigate to Ecosystem → Software Catalog.
-
Verify that Support Log Gather is listed with a Status of Succeeded in the
must-gather-operatornamespace.
Verify that Support Log Gather pods are running:
- Navigate to Workloads → Pods
Verify that the status of the Support Log Gather pods is Running.
You can use the Support Log Gather only after the pods are up and running.
6.4.2. Installing Support Log Gather by using the CLI Copiar o linkLink copiado para a área de transferência!
To enable automated log collection for support cases, you can install Support Log Gather from the command-line interface (CLI).
Support Log Gather is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
Prerequisites
-
You have access to the cluster with
cluster-adminprivileges.
Procedure
Create a new project named
must-gather-operator by running the following command:

$ oc new-project must-gather-operator

Create an
OperatorGroup object:

Create a YAML file, for example,
operatorGroup.yaml, that defines the OperatorGroup object:
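The contents of the original example file are not reproduced above. As a minimal sketch, an OperatorGroup object that targets the must-gather-operator namespace created in the previous step could look like the following; the metadata.name value is an assumption:

apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: must-gather-operator
  namespace: must-gather-operator
spec:
  targetNamespaces:
  - must-gather-operator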
Create the OperatorGroup object by running the following command:

$ oc create -f operatorGroup.yaml
Create a
Subscription object:

Create a YAML file, for example, subscription.yaml, that defines the Subscription object:
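The contents of the original example file are not reproduced above. As a minimal sketch, a Subscription object assembled from the package name, catalog source, and channel shown in the verification output later in this procedure could look like the following; the metadata.name value is an assumption:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: support-log-gather-operator
  namespace: must-gather-operator
spec:
  channel: tech-preview
  name: support-log-gather-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace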
Create the Subscription object by running the following command:

$ oc create -f subscription.yaml
Verification
Verify the status of the pods in the Operator namespace by running the following command.
oc get pods
$ oc get podsCopy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAME READY STATUS RESTARTS AGE must-gather-operator-657fc74d64-2gg2w 1/1 Running 0 13m
NAME READY STATUS RESTARTS AGE must-gather-operator-657fc74d64-2gg2w 1/1 Running 0 13mCopy to Clipboard Copied! Toggle word wrap Toggle overflow The status of all the pods must be
Running.Verify that the subscription is created by running the following command:
oc get subscription -n must-gather-operator
$ oc get subscription -n must-gather-operatorCopy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAME PACKAGE SOURCE CHANNEL support-log-gather-operator support-log-gather-operator redhat-operators tech-preview
NAME PACKAGE SOURCE CHANNEL support-log-gather-operator support-log-gather-operator redhat-operators tech-previewCopy to Clipboard Copied! Toggle word wrap Toggle overflow Verify that the Operator is installed by running the following command:
oc get csv -n must-gather-operator
$ oc get csv -n must-gather-operatorCopy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAME DISPLAY VERSION REPLACES PHASE support-log-gather-operator.v4.21.0 support log gather 4.21.0 Succeeded
NAME DISPLAY VERSION REPLACES PHASE support-log-gather-operator.v4.21.0 support log gather 4.21.0 SucceededCopy to Clipboard Copied! Toggle word wrap Toggle overflow
6.4.3. Configuring a Support Log Gather instance Copiar o linkLink copiado para a área de transferência!
You must create a MustGather custom resource (CR) from the command-line interface (CLI) to automate the collection of diagnostic data from your cluster. This process also automatically uploads the data to a Red Hat Support case.
Support Log Gather is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
Prerequisites
-
You have installed the OpenShift CLI (
oc) tool. - You have installed Support Log Gather in your cluster.
- You have a Red Hat Support case ID.
- You have created a Kubernetes secret containing your Red Hat Customer Portal credentials. The secret must contain a username field and a password field.
- You have created a service account.
Procedure
Create a YAML file for the
MustGather CR, such as support-log-gather.yaml, that contains the following basic configuration:

Example
support-log-gather.yaml

For more information on the configuration parameters, see "Configuration parameters for MustGather custom resource".
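The contents of the original example file are not reproduced above. As an illustration only, a MustGather CR for this Operator typically references the support case ID, the credentials secret, and the service account listed in the prerequisites. The apiVersion and field names below are assumptions based on the upstream must-gather-operator project and must be verified against "Configuration parameters for MustGather custom resource":

apiVersion: managed.openshift.io/v1alpha1   # assumed API group and version
kind: MustGather
metadata:
  name: example-mg
  namespace: must-gather-operator
spec:
  caseID: "01234567"                        # your Red Hat Support case ID (hypothetical value)
  caseManagementAccountSecretRef:
    name: case-management-creds             # secret with the username and password fields (assumed name)
  serviceAccountRef:
    name: must-gather-admin                 # service account that runs the gather job (assumed name)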
Create the
MustGatherobject by running the following command:oc create -f support-log-gather.yaml
$ oc create -f support-log-gather.yamlCopy to Clipboard Copied! Toggle word wrap Toggle overflow
Verification
Verify that the
MustGatherCR was created by running the following command:oc get mustgather
$ oc get mustgatherCopy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAME AGE example-mg 7s
NAME AGE example-mg 7sCopy to Clipboard Copied! Toggle word wrap Toggle overflow Verify the status of the pods in the Operator namespace by running the following command.
oc get pods
$ oc get podsCopy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAME READY STATUS RESTARTS AGE must-gather-operator-657fc74d64-2gg2w 1/1 Running 0 13m example-mg-gk8m8 2/2 Running 0 13s
NAME READY STATUS RESTARTS AGE must-gather-operator-657fc74d64-2gg2w 1/1 Running 0 13m example-mg-gk8m8 2/2 Running 0 13sCopy to Clipboard Copied! Toggle word wrap Toggle overflow A new pod with a name based on the
MustGatherCR must be created. The status of all the pods must beRunning.To monitor the progress of the file upload, view the logs of the upload container in the job pod by running the following command:
To monitor the progress of the file upload, view the logs of the upload container in the job pod by running the following command:

$ oc logs -f pod/example-mg-gk8m8 -c upload

When successful, the process creates an archive and uploads it to the Red Hat Secure File Transfer Protocol (SFTP) server for the specified case.
6.6. Obtaining your cluster ID Copiar o linkLink copiado para a área de transferência!
When providing information to Red Hat Support, it is helpful to provide the unique identifier for your cluster. You can have your cluster ID autofilled by using the Red Hat OpenShift Service on AWS web console. You can also manually obtain your cluster ID by using the web console or the OpenShift CLI (oc).
Prerequisites
-
You have access to the cluster as a user with the
cluster-adminrole. -
You have access to the web console or the OpenShift CLI (
oc) installed.
Procedure
To manually obtain your cluster ID using the web console:
- Navigate to Home → Overview.
- The value is available in the Cluster ID field of the Details section.
To obtain your cluster ID using the OpenShift CLI (
oc), run the following command:oc get clusterversion -o jsonpath='{.items[].spec.clusterID}{"\n"}'$ oc get clusterversion -o jsonpath='{.items[].spec.clusterID}{"\n"}'Copy to Clipboard Copied! Toggle word wrap Toggle overflow
6.7. Querying cluster node journal logs Copiar o linkLink copiado para a área de transferência!
You can gather journald unit logs and other logs within /var/log on individual cluster nodes.
Prerequisites
-
You have access to the cluster as a user with the
cluster-adminrole. -
You have installed the OpenShift CLI (
oc).
Procedure
Query
kubelet journald unit logs from Red Hat OpenShift Service on AWS cluster nodes. The following example queries worker nodes only:

$ oc adm node-logs --role=worker -u kubelet

kubelet - Replace as appropriate to query other unit logs.
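For example, to query the container runtime unit logs on the same nodes, you might change the unit name to crio, the standard container runtime unit on RHCOS nodes:

$ oc adm node-logs --role=worker -u crio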
6.8. Network trace methods Copiar o linkLink copiado para a área de transferência!
Collecting network traces, in the form of packet capture records, can assist Red Hat Support with troubleshooting network issues.
Red Hat OpenShift Service on AWS supports two ways of performing a network trace. Review the following table and choose the method that meets your needs.
| Method | Benefits and capabilities |
|---|---|
| Collecting a host network trace | You perform a packet capture for a duration that you specify on one or more nodes at the same time. The packet capture files are transferred from nodes to the client machine when the specified duration is met. You can troubleshoot why a specific action triggers network communication issues. Run the packet capture, perform the action that triggers the issue, and use the logs to diagnose the issue. |
| Collecting a network trace from a Red Hat OpenShift Service on AWS node or container |
You perform a packet capture on one node or one container. You run the tcpdump command manually. You can start the packet capture manually, trigger the network communication issue, and then stop the packet capture manually.
This method uses the cat command and shell redirection to copy the packet capture data from the node or container to the client machine. |
6.9. Collecting a host network trace Copiar o linkLink copiado para a área de transferência!
Sometimes, troubleshooting a network-related issue is simplified by tracing network communication and capturing packets on multiple nodes at the same time.
You can use a combination of the oc adm must-gather command and the registry.redhat.io/openshift4/network-tools-rhel8 container image to gather packet captures from nodes. Analyzing packet captures can help you troubleshoot network communication issues.
The oc adm must-gather command is used to run the tcpdump command in pods on specific nodes. The tcpdump command records the packet captures in the pods. When the tcpdump command exits, the oc adm must-gather command transfers the files with the packet captures from the pods to your client machine.
The sample command in the following procedure demonstrates performing a packet capture with the tcpdump command. However, you can run any command in the container image that is specified in the --image argument to gather troubleshooting information from multiple nodes at the same time.
Prerequisites
-
You are logged in to Red Hat OpenShift Service on AWS as a user with the
cluster-adminrole. -
You have installed the OpenShift CLI (
oc).
Procedure
Run a packet capture from the host network on some nodes by running the following command (an assembled example appears after the argument descriptions below):
where:
--dest-dir /tmp/captures-
The
--dest-dirargument specifies thatoc adm must-gatherstores the packet captures in directories that are relative to/tmp/captureson the client machine. You can specify any writable directory. --source-dir '/tmp/tcpdump/'-
When
tcpdumpis run in the debug pod thatoc adm must-gatherstarts, the--source-dirargument specifies that the packet captures are temporarily stored in the/tmp/tcpdumpdirectory on the pod. --image registry.redhat.io/openshift4/network-tools-rhel8:latest-
The
--imageargument specifies a container image that includes thetcpdumpcommand. --node-selector 'node-role.kubernetes.io/worker'-
The
--node-selectorargument and example value specifies to perform the packet captures on the worker nodes. As an alternative, you can specify the--node-nameargument instead to run the packet capture on a single node. If you omit both the--node-selectorand the--node-nameargument, the packet captures are performed on all nodes. --host-network=true-
The
--host-network=trueargument is required so that the packet captures are performed on the network interfaces of the node. --timeout 30s-
The
--timeoutargument and value specify to run the debug pod for 30 seconds. If you do not specify the--timeoutargument and a duration, the debug pod runs for 10 minutes. -i any-
The
-i anyargument for thetcpdumpcommand specifies to capture packets on all network interfaces. As an alternative, you can specify a network interface name.
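Assembled from the arguments described above, the full command is a sketch like the following; the capture file name passed to tcpdump -w is illustrative and must be written under the /tmp/tcpdump/ directory so that the --source-dir argument can collect it:

$ oc adm must-gather \
    --dest-dir /tmp/captures \
    --source-dir '/tmp/tcpdump/' \
    --image registry.redhat.io/openshift4/network-tools-rhel8:latest \
    --node-selector 'node-role.kubernetes.io/worker' \
    --host-network=true \
    --timeout 30s \
    -- tcpdump -i any \
    -w /tmp/tcpdump/capture.pcap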
- Perform the action, such as accessing a web application, that triggers the network communication issue while the network trace captures packets.
Review the packet capture files that
oc adm must-gather transferred from the pods to your client machine:

where:
ip-10-0-192-217-ec2-internal,ip-10-0-201-178-ec2-internal-
The packet captures are stored in directories that identify the hostname, container, and file name. If you did not specify the
--node-selectorargument, then the directory level for the hostname is not present.
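The directory listing itself is not reproduced above. As an illustration only, the layout under the --dest-dir directory typically resembles the following, with one directory per node hostname that contains the capture file; all names here are hypothetical:

/tmp/captures
├── ip-10-0-192-217-ec2-internal
│   └── registry-redhat-io-openshift4-network-tools-rhel8-latest
│       └── capture.pcap
└── ip-10-0-201-178-ec2-internal
    └── registry-redhat-io-openshift4-network-tools-rhel8-latest
        └── capture.pcap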
6.10. Collecting a network trace from a Red Hat OpenShift Service on AWS node or container
When investigating potential network-related Red Hat OpenShift Service on AWS issues, Red Hat Support might request a network packet trace from a specific Red Hat OpenShift Service on AWS cluster node or from a specific container. The recommended method to capture a network trace in Red Hat OpenShift Service on AWS is through a debug pod.
Prerequisites
-
You have access to the cluster as a user with the
cluster-adminrole. -
You have installed the OpenShift CLI (
oc). - You have an existing Red Hat Support case ID.
Procedure
Obtain a list of cluster nodes:
oc get nodes
$ oc get nodesCopy to Clipboard Copied! Toggle word wrap Toggle overflow Enter into a debug session on the target node. This step instantiates a debug pod called
<node_name>-debug:oc debug node/my-cluster-node
$ oc debug node/my-cluster-nodeCopy to Clipboard Copied! Toggle word wrap Toggle overflow Set
/hostas the root directory within the debug shell. The debug pod mounts the host’s root file system in/hostwithin the pod. By changing the root directory to/host, you can run binaries contained in the host’s executable paths:chroot /host
# chroot /hostCopy to Clipboard Copied! Toggle word wrap Toggle overflow From within the
chrootenvironment console, obtain the node’s interface names:ip ad
# ip adCopy to Clipboard Copied! Toggle word wrap Toggle overflow Start a
toolboxcontainer, which includes the required binaries and plugins to runsosreport:toolbox
# toolboxCopy to Clipboard Copied! Toggle word wrap Toggle overflow NoteIf an existing
toolboxpod is already running, thetoolboxcommand outputs'toolbox-' already exists. Trying to start…. To avoidtcpdumpissues, remove the running toolbox container withpodman rm toolbox-and spawn a new toolbox container.Initiate a
tcpdumpsession on the cluster node and redirect output to a capture file. This example usesens5as the interface name:tcpdump -nn -s 0 -i ens5 -w /host/var/tmp/my-cluster-node_$(date +%d_%m_%Y-%H_%M_%S-%Z).pcap
$ tcpdump -nn -s 0 -i ens5 -w /host/var/tmp/my-cluster-node_$(date +%d_%m_%Y-%H_%M_%S-%Z).pcapCopy to Clipboard Copied! Toggle word wrap Toggle overflow where:
/host/var/tmp/my-cluster-node_$(date +%d_%m_%Y-%H_%M_%S-%Z).pcap-
The
tcpdumpcapture file’s path is outside of thechrootenvironment because the toolbox container mounts the host’s root directory at/host.
If a
tcpdumpcapture is required for a specific container on the node, follow these steps.Determine the target container ID. The
chroot hostcommand precedes thecrictlcommand in this step because the toolbox container mounts the host’s root directory at/host:chroot /host crictl ps
# chroot /host crictl psCopy to Clipboard Copied! Toggle word wrap Toggle overflow Determine the container’s process ID. In this example, the container ID is
a7fe32346b120:chroot /host crictl inspect --output yaml a7fe32346b120 | grep 'pid' | awk '{print $2}'# chroot /host crictl inspect --output yaml a7fe32346b120 | grep 'pid' | awk '{print $2}'Copy to Clipboard Copied! Toggle word wrap Toggle overflow Initiate a
tcpdump session on the container and redirect output to a capture file. This example uses 49628 as the container’s process ID and ens5 as the interface name. The nsenter command enters the namespace of a target process and runs a command in its namespace. Because the target process in this example is a container’s process ID, the tcpdump command is run in the container’s namespace from the host:

# nsenter -n -t 49628 -- tcpdump -nn -i ens5 -w /host/var/tmp/my-cluster-node-my-container_$(date +%d_%m_%Y-%H_%M_%S-%Z).pcap

where:
/host/var/tmp/my-cluster-node-my-container_$(date +%d_%m_%Y-%H_%M_%S-%Z).pcap-
The
tcpdumpcapture file’s path is outside of thechrootenvironment because the toolbox container mounts the host’s root directory at/host.
Provide the
tcpdumpcapture file to Red Hat Support for analysis, using one of the following methods.Upload the file to an existing Red Hat support case.
Concatenate the
tcpdump capture file by running the oc debug node/<node_name> command and redirect the output to a file. This command assumes you have exited the previous oc debug session:

$ oc debug node/my-cluster-node -- bash -c 'cat /host/var/tmp/my-tcpdump-capture-file.pcap' > /tmp/my-tcpdump-capture-file.pcap

where:
/host/var/tmp/my-tcpdump-capture-file.pcap-
The debug container mounts the host’s root directory at
/host. Reference the absolute path from the debug container’s root directory, including/host, when specifying target files for concatenation.
- Navigate to an existing support case within the Customer Support page of the Red Hat Customer Portal.
- Select Attach files and follow the prompts to upload the file.
6.11. Providing diagnostic data to Red Hat Support Copiar o linkLink copiado para a área de transferência!
When investigating Red Hat OpenShift Service on AWS issues, Red Hat Support might ask you to upload diagnostic data to a support case. Files can be uploaded to a support case through the Red Hat Customer Portal.
Prerequisites
-
You have access to the cluster as a user with the
cluster-adminrole. -
You have installed the OpenShift CLI (
oc). - You have an existing Red Hat Support case ID.
Procedure
Upload diagnostic data to an existing Red Hat support case through the Red Hat Customer Portal.
Concatenate a diagnostic file contained on a Red Hat OpenShift Service on AWS node by using the
oc debug node/<node_name>command and redirect the output to a file. The following example copies/host/var/tmp/my-diagnostic-data.tar.gzfrom a debug container to/var/tmp/my-diagnostic-data.tar.gz:oc debug node/my-cluster-node -- bash -c 'cat /host/var/tmp/my-diagnostic-data.tar.gz' > /var/tmp/my-diagnostic-data.tar.gz
$ oc debug node/my-cluster-node -- bash -c 'cat /host/var/tmp/my-diagnostic-data.tar.gz' > /var/tmp/my-diagnostic-data.tar.gzCopy to Clipboard Copied! Toggle word wrap Toggle overflow where:
/host/var/tmp/my-diagnostic-data.tar.gz-
The debug container mounts the host’s root directory at
/host. Reference the absolute path from the debug container’s root directory, including/host, when specifying target files for concatenation.
- Navigate to an existing support case within the Customer Support page of the Red Hat Customer Portal.
- Select Attach files and follow the prompts to upload the file.
6.12. About toolbox Copiar o linkLink copiado para a área de transferência!
toolbox is a tool that starts a container on a Red Hat Enterprise Linux CoreOS (RHCOS) system. The tool is primarily used to start a container that includes the required binaries and plugins that are needed to run commands such as sosreport.
The primary purpose for a toolbox container is to gather diagnostic information and to provide it to Red Hat Support. However, if additional diagnostic tools are required, you can add RPM packages or run an image that is an alternative to the standard support tools image.
6.12.1. Installing packages to a toolbox container Copiar o linkLink copiado para a área de transferência!
By default, running the toolbox command starts a container with the registry.redhat.io/rhel9/support-tools:latest image. This image contains the most frequently used support tools. If you need to collect node-specific data that requires a support tool that is not part of the image, you can install additional packages.
Prerequisites
-
You have accessed a node with the
oc debug node/<node_name>command. - You can access your system as a user with root privileges.
Procedure
Set
/hostas the root directory within the debug shell. The debug pod mounts the host’s root file system in/hostwithin the pod. By changing the root directory to/host, you can run binaries contained in the host’s executable paths:chroot /host
# chroot /hostCopy to Clipboard Copied! Toggle word wrap Toggle overflow Start the toolbox container:
toolbox
# toolboxCopy to Clipboard Copied! Toggle word wrap Toggle overflow Install the additional package, such as
wget:

# dnf install -y <package_name>
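For example, to install the wget package mentioned above:

# dnf install -y wget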
6.12.2. Starting an alternative image with toolbox Copiar o linkLink copiado para a área de transferência!
By default, running the toolbox command starts a container with the registry.redhat.io/rhel9/support-tools:latest image.
You can start an alternative image by creating a .toolboxrc file and specifying the image to run. However, running an older version of the support-tools image, such as registry.redhat.io/rhel8/support-tools:latest, is not supported on Red Hat OpenShift Service on AWS 4.
Prerequisites
-
You have accessed a node with the
oc debug node/<node_name>command. - You can access your system as a user with root privileges.
Procedure
Set
/hostas the root directory within the debug shell. The debug pod mounts the host’s root file system in/hostwithin the pod. By changing the root directory to/host, you can run binaries contained in the host’s executable paths:chroot /host
# chroot /hostCopy to Clipboard Copied! Toggle word wrap Toggle overflow Optional: If you need to use an alternative image instead of the default image, create a
.toolboxrcfile in the home directory for the root user ID, and specify the image metadata:REGISTRY=quay.io IMAGE=fedora/fedora:latest TOOLBOX_NAME=toolbox-fedora-latest
REGISTRY=quay.io IMAGE=fedora/fedora:latest TOOLBOX_NAME=toolbox-fedora-latestCopy to Clipboard Copied! Toggle word wrap Toggle overflow where:
REGISTRY=quay.io- Optional: Specify an alternative container registry.
IMAGE=fedora/fedora:latest- Specify an alternative image to start.
TOOLBOX_NAME=toolbox-fedora-latest- Optional: Specify an alternative name for the toolbox container.
Start a toolbox container by entering the following command:
toolbox
# toolboxCopy to Clipboard Copied! Toggle word wrap Toggle overflow NoteIf an existing
toolboxpod is already running, thetoolboxcommand outputs'toolbox-' already exists. Trying to start…. To avoid issues withsosreportplugins, remove the running toolbox container withpodman rm toolbox-and then spawn a new toolbox container.
Chapter 7. Summarizing cluster specifications Copiar o linkLink copiado para a área de transferência!
You can summarize your cluster specifications by querying the clusterversion resource to view cluster version information and component status.
7.1. Summarizing cluster specifications by using a cluster version object Copiar o linkLink copiado para a área de transferência!
You can obtain a summary of Red Hat OpenShift Service on AWS cluster specifications by querying the clusterversion resource.
Prerequisites
-
You have access to the cluster as a user with the
dedicated-adminrole. -
You have installed the OpenShift CLI (
oc).
Procedure
Query cluster version, availability, uptime, and general status:
oc get clusterversion
$ oc get clusterversionCopy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.13.8 True False 8h Cluster version is 4.13.8
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.13.8 True False 8h Cluster version is 4.13.8Copy to Clipboard Copied! Toggle word wrap Toggle overflow Obtain a detailed summary of cluster specifications, update availability, and update history:
oc describe clusterversion
$ oc describe clusterversionCopy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
Chapter 8. Troubleshooting Copiar o linkLink copiado para a área de transferência!
8.1. Review your cluster notifications Copiar o linkLink copiado para a área de transferência!
When you are trying to resolve a problem with your cluster, your cluster notifications are a good source of information.
Cluster notifications are messages about the status, health, or performance of your cluster. They are also the primary way that Red Hat Site Reliability Engineering (SRE) communicates with you about cluster health and resolving problems with your cluster.
8.1.1. Viewing cluster notifications using the Red Hat Hybrid Cloud Console Copiar o linkLink copiado para a área de transferência!
Cluster notifications provide important information about the health of your cluster. You can view notifications that have been sent to your cluster in the Cluster history tab on the Red Hat Hybrid Cloud Console.
Prerequisites
- You are logged in to the Hybrid Cloud Console.
Procedure
- Navigate to the Clusters page of the Hybrid Cloud Console.
- Click the name of your cluster to go to the cluster details page.
Click the Cluster history tab.
Cluster notifications appear under the Cluster history heading.
Optional: Filter for relevant cluster notifications
Use the filter controls to hide cluster notifications that are not relevant to you, so that you can focus on your area of expertise or on resolving a critical issue. You can filter notifications based on text in the notification description, severity level, notification type, when the notification was received, and which system or person triggered the notification.
8.2. Troubleshooting Red Hat OpenShift Service on AWS cluster installations Copiar o linkLink copiado para a área de transferência!
For help with the installation of Red Hat OpenShift Service on AWS clusters, see the following sections.
8.2.1. Installation troubleshooting Copiar o linkLink copiado para a área de transferência!
This procedure describes how to troubleshoot installation issues for Red Hat OpenShift Service on AWS clusters.
Procedure
Inspect install or uninstall logs:
To display install logs, run the following command, replacing
<cluster_name>with the name of your cluster:rosa logs install --cluster=<cluster_name>
$ rosa logs install --cluster=<cluster_name>Copy to Clipboard Copied! Toggle word wrap Toggle overflow To watch the logs, include the
--watchflag:rosa logs install --cluster=<cluster_name> --watch
$ rosa logs install --cluster=<cluster_name> --watchCopy to Clipboard Copied! Toggle word wrap Toggle overflow To display uninstall logs, run the following command, replacing
<cluster_name>with the name of your cluster:rosa logs uninstall --cluster=<cluster_name>
$ rosa logs uninstall --cluster=<cluster_name>Copy to Clipboard Copied! Toggle word wrap Toggle overflow To watch the logs, include the
--watchflag:rosa logs uninstall --cluster=<cluster_name> --watch
$ rosa logs uninstall --cluster=<cluster_name> --watchCopy to Clipboard Copied! Toggle word wrap Toggle overflow
Verify your AWS account and quota:
Run the following command to verify you have the available quota on your AWS account:
rosa verify quota
$ rosa verify quotaCopy to Clipboard Copied! Toggle word wrap Toggle overflow AWS quotas change based on region. Be sure you are verifying your quota for the correct AWS region. If you need to increase your quota, navigate to your AWS console, and request a quota increase for the service that failed.
AWS notification emails:
When creating a cluster, the Red Hat OpenShift Service on AWS service creates small instances in all supported regions. This check ensures the AWS account being used can deploy to each supported region.
For AWS accounts that are not using all supported regions, AWS may send one or more emails confirming that "Your Request For Accessing AWS Resources Has Been Validated". Typically the sender of this email is aws-verification@amazon.com.
This is expected behavior as the Red Hat OpenShift Service on AWS service is validating your AWS account configuration.
8.2.2. Verifying installation of Red Hat OpenShift Service on AWS clusters Copiar o linkLink copiado para a área de transferência!
If the Red Hat OpenShift Service on AWS cluster is in the installing state for over 30 minutes and has not become ready, ensure the AWS account environment is prepared for the required cluster configurations. If the AWS account environment is prepared for the required cluster configurations correctly, try to delete and recreate the cluster. If the problem persists, contact support.
Procedure
- Verify the AWS account environment is prepared for the required cluster configurations.
- If the AWS account environment is prepared correctly, try to delete and recreate the cluster.
- If the problem persists, contact support.
8.2.4. Troubleshooting Red Hat OpenShift Service on AWS installation error codes Copiar o linkLink copiado para a área de transferência!
The following table lists Red Hat OpenShift Service on AWS installation error codes and what you can do to troubleshoot these errors.
| Error code | Description | Resolution |
|---|---|---|
| OCM3999 | Unknown error. | Check the cluster installation logs for more details, or delete this cluster and retry cluster installation. If this issue persists, contact support by logging in to the Customer Support page. |
| OCM5001 | Red Hat OpenShift Service on AWS cluster provision has failed. | Check the cluster installation logs for more details, or delete this cluster and retry cluster installation. If this issue persists, contact support by logging in to the Customer Support page. |
| OCM5002 | The maximum resource tag size of 25 has been exceeded. | Check the cluster information to determine if you can remove any unnecessary tags you have specified and retry cluster installation. |
| OCM5003 | Unable to establish an AWS client to provision the cluster. | You must create several role resources on your AWS account to create and manage a Red Hat OpenShift Service on AWS cluster. Ensure that your provided AWS credentials are correct and retry cluster installation. For more information about Red Hat OpenShift Service on AWS IAM role resources, see ROSA IAM role resources in the Additional resources section. |
| OCM5004 | Unable to establish a cross-account AWS client to provision the cluster. | You must create several role resources on your AWS account to create and manage a Red Hat OpenShift Service on AWS cluster. Ensure that your provided AWS credentials are correct and retry cluster installation. For more information about Red Hat OpenShift Service on AWS IAM role resources, see ROSA IAM role resources in the Additional resources section. |
| OCM5005 | Failed to retrieve AWS subnets defined for the cluster. | Review the provided subnet IDs and retry cluster installation. |
| OCM5006 | You must configure at least one private AWS subnet for the cluster. | Review the provided subnet IDs and retry cluster installation. |
| OCM5007 | Unable to create AWS STS prerequisites for the cluster. | Verify that account and operator roles have been created and are correct. For more information, see AWS STS and ROSA with HCP explained in the Additional resources section. |
| OCM5008 | The provided cluster flavour is incorrect. | Verify that the provided name or ID is correct when you are using the flavour parameter and retry cluster creation. |
| OCM5009 | The cluster version could not be found. | Ensure that the configured version ID matches a valid Red Hat OpenShift Service on AWS version. |
| OCM5010 | Failed to tag subnets for the cluster. | Confirm that the AWS permissions and the subnet configurations are correct. You must tag at least one private subnet and, if applicable, one public subnet. |
| OCM5011 | Cluster installation has failed due to unavailable capacity in the selected region. | Try your cluster installation in another region or retry cluster installation. |
8.2.6. Troubleshooting access to Red Hat Hybrid Cloud Console Copiar o linkLink copiado para a área de transferência!
In Red Hat OpenShift Service on AWS clusters, the Red Hat OpenShift Service on AWS OAuth server is hosted in the Red Hat service’s AWS account while the web console service is published using the cluster’s default ingress controller in the cluster’s AWS account. If you can log in to your cluster using the OpenShift CLI (oc) but cannot access the Red Hat OpenShift Service on AWS web console, verify the following criteria are met:
Procedure
- Verify the console workloads are running.
- Verify the default ingress controller’s load balancer is active.
- Verify you are accessing the console from a machine that has network connectivity to the cluster’s VPC network.
8.2.8. Verifying access to Red Hat OpenShift Service on AWS web console for Red Hat OpenShift Service on AWS cluster in ready state Copiar o linkLink copiado para a área de transferência!
Red Hat OpenShift Service on AWS clusters return a ready status when the control plane hosted in the Red Hat OpenShift Service on AWS service account becomes ready. Cluster console workloads are deployed on the cluster’s worker nodes. The Red Hat OpenShift Service on AWS web console will not be available and accessible until the worker nodes have joined the cluster and console workloads are running.
Procedure
If your Red Hat OpenShift Service on AWS cluster is ready but you are unable to access the Red Hat OpenShift Service on AWS web console for the cluster, wait for the worker nodes to join the cluster and retry accessing the console.
You can either log in to the Red Hat OpenShift Service on AWS cluster or use the
rosa describe machinepool command in the rosa CLI to watch the nodes.
8.2.10. Verifying access to Red Hat Hybrid Cloud Console for private Red Hat OpenShift Service on AWS clusters Copiar o linkLink copiado para a área de transferência!
The console of the private cluster is private by default. During cluster installation, the default Ingress Controller managed by OpenShift’s Ingress Operator is configured with an internal AWS Network Load Balancer (NLB).
Procedure
-
If your private Red Hat OpenShift Service on AWS cluster shows a
readystatus but you cannot access the Red Hat OpenShift Service on AWS web console for the cluster, try accessing the cluster console from either within the cluster VPC or from a network that is connected to the VPC.
8.3. Troubleshooting networking Copiar o linkLink copiado para a área de transferência!
This document describes how to troubleshoot networking errors.
8.3.1. Connectivity issues on clusters with private Network Load Balancers Copiar o linkLink copiado para a área de transferência!
Red Hat OpenShift Service on AWS clusters created with version 4 deploy AWS Network Load Balancers (NLB) by default for the default ingress controller. In the case of a private NLB, the NLB’s client IP address preservation might cause connections to be dropped where the source and destination are the same host. See the AWS documentation about how to Troubleshoot your Network Load Balancer. This IP address preservation means that customer workloads co-located on the same node as the router pods might not be able to send traffic to the private NLB fronting the ingress controller.
Procedure
- To mitigate this impact, reschedule your workloads onto nodes separate from those where the router pods are scheduled. Alternatively, rely on the internal pod and service networks for accessing other workloads co-located within the same cluster.
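For example, to see which nodes host the router pods before rescheduling your workloads, you might list the pods of the default ingress controller with wide output; openshift-ingress is the namespace that holds the default router pods:

$ oc get pods -n openshift-ingress -o wide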
8.4. Verifying node health Copiar o linkLink copiado para a área de transferência!
You can verify and troubleshoot node-related issues by reviewing the status, resource usage, and configuration of a node.
8.4.1. Reviewing node status, resource usage, and configuration Copiar o linkLink copiado para a área de transferência!
Review cluster node health status, resource consumption statistics, and node logs. Additionally, query kubelet status on individual nodes.
Prerequisites
-
You have access to the cluster as a user with the
dedicated-adminrole. -
You have installed the OpenShift CLI (
oc).
Procedure
List the name, status, and role for all nodes in the cluster:
oc get nodes
$ oc get nodesCopy to Clipboard Copied! Toggle word wrap Toggle overflow Summarize CPU and memory usage for each node within the cluster:
oc adm top nodes
$ oc adm top nodesCopy to Clipboard Copied! Toggle word wrap Toggle overflow Summarize CPU and memory usage for a specific node:
$ oc adm top node my-node
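The introduction to this section also mentions querying the kubelet’s status on individual nodes. One way to do that, sketched here using a debug pod, is to open a debug session on the node and query the kubelet systemd unit:

$ oc debug node/my-node
# chroot /host
# systemctl is-active kubelet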
8.5. Troubleshooting Operator issues Copiar o linkLink copiado para a área de transferência!
A cluster administrator can do the following to resolve Operator issues: verify Operator subscription status, check Operator pod health, and gather Operator logs.
Operators are a method of packaging, deploying, and managing a Red Hat OpenShift Service on AWS application. They act like an extension of the software vendor’s engineering team, watching over a Red Hat OpenShift Service on AWS environment and using its current state to make decisions in real time. Operators are designed to handle upgrades seamlessly, react to failures automatically, and not take shortcuts, such as skipping a software backup process to save time.
Red Hat OpenShift Service on AWS 4 includes a default set of Operators that are required for proper functioning of the cluster. These default Operators are managed by the Cluster Version Operator (CVO).
As a cluster administrator, you can install application Operators from the software catalog using the Red Hat OpenShift Service on AWS web console or the CLI. You can then subscribe the Operator to one or more namespaces to make it available for developers on your cluster. Application Operators are managed by Operator Lifecycle Manager (OLM).
If you experience Operator issues, verify Operator subscription status. Check Operator pod health across the cluster and gather Operator logs for diagnosis.
8.5.1. Operator subscription condition types Copiar o linkLink copiado para a área de transferência!
Subscriptions can report the following condition types:
| Condition | Description |
|---|---|
|
| Some or all of the catalog sources to be used in resolution are unhealthy. |
|
| An install plan for a subscription is missing. |
|
| An install plan for a subscription is pending installation. |
|
| An install plan for a subscription has failed. |
|
| The dependency resolution for a subscription has failed. |
Default Red Hat OpenShift Service on AWS cluster Operators are managed by the Cluster Version Operator (CVO) and they do not have a Subscription object. Application Operators are managed by Operator Lifecycle Manager (OLM) and they have a Subscription object.
8.5.2. Viewing Operator subscription status by using the CLI Copiar o linkLink copiado para a área de transferência!
You can view Operator subscription status by using the CLI.
Prerequisites
-
You have access to the cluster as a user with the
dedicated-adminrole. -
You have installed the OpenShift CLI (
oc).
Procedure
List Operator subscriptions:
oc get subs -n <operator_namespace>
$ oc get subs -n <operator_namespace>Copy to Clipboard Copied! Toggle word wrap Toggle overflow Use the
oc describecommand to inspect aSubscriptionresource:oc describe sub <subscription_name> -n <operator_namespace>
$ oc describe sub <subscription_name> -n <operator_namespace>Copy to Clipboard Copied! Toggle word wrap Toggle overflow In the command output, find the
Conditionssection for the status of Operator subscription condition types. In the following example, theCatalogSourcesUnhealthycondition type has a status offalsebecause all available catalog sources are healthy:Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow NoteDefault Red Hat OpenShift Service on AWS cluster Operators are managed by the Cluster Version Operator (CVO) and they do not have a
Subscriptionobject. Application Operators are managed by Operator Lifecycle Manager (OLM) and they have aSubscriptionobject.
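The example output referenced above is not reproduced here. As an illustration only, a healthy CatalogSourcesUnhealthy condition in the Conditions section of the oc describe sub output typically resembles the following; the timestamp is hypothetical:

Conditions:
   Last Transition Time:  2025-01-15T13:42:57Z
   Message:               all available catalogsources are healthy
   Reason:                AllCatalogSourcesHealthy
   Status:                False
   Type:                  CatalogSourcesUnhealthy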
8.5.3. Viewing Operator catalog source status by using the CLI Copiar o linkLink copiado para a área de transferência!
You can view the status of an Operator catalog source by using the CLI.
Prerequisites
-
You have access to the cluster as a user with the
dedicated-adminrole. -
You have installed the OpenShift CLI (
oc).
Procedure
List the catalog sources in a namespace. For example, you can check the
openshift-marketplacenamespace, which is used for cluster-wide catalog sources:oc get catalogsources -n openshift-marketplace
$ oc get catalogsources -n openshift-marketplaceCopy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
NAME DISPLAY TYPE PUBLISHER AGE certified-operators Certified Operators grpc Red Hat 55m community-operators Community Operators grpc Red Hat 55m example-catalog Example Catalog grpc Example Org 2m25s redhat-operators Red Hat Operators grpc Red Hat 55m
NAME DISPLAY TYPE PUBLISHER AGE certified-operators Certified Operators grpc Red Hat 55m community-operators Community Operators grpc Red Hat 55m example-catalog Example Catalog grpc Example Org 2m25s redhat-operators Red Hat Operators grpc Red Hat 55mCopy to Clipboard Copied! Toggle word wrap Toggle overflow Use the
oc describecommand to get more details and status about a catalog source:oc describe catalogsource example-catalog -n openshift-marketplace
$ oc describe catalogsource example-catalog -n openshift-marketplaceCopy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow In the preceding example output, the last observed state is
TRANSIENT_FAILURE. This state indicates that there is a problem establishing a connection for the catalog source.List the pods in the namespace where your catalog source was created:
oc get pods -n openshift-marketplace
$ oc get pods -n openshift-marketplaceCopy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow When a catalog source is created in a namespace, a pod for the catalog source is created in that namespace. In the preceding example output, the status for the
example-catalog-bwt8zpod isImagePullBackOff. This status indicates that there is an issue pulling the catalog source’s index image.Use the
oc describecommand to inspect a pod for more detailed information:oc describe pod example-catalog-bwt8z -n openshift-marketplace
$ oc describe pod example-catalog-bwt8z -n openshift-marketplaceCopy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow In the preceding example output, the error messages indicate that the catalog source’s index image is failing to pull successfully because of an authorization issue. For example, the index image might be stored in a registry that requires login credentials.
8.5.5. Querying Operator pod status Copiar o linkLink copiado para a área de transferência!
You can list Operator pods within a cluster and their status. You can also collect a detailed Operator pod summary.
Prerequisites
-
You have access to the cluster as a user with the
dedicated-adminrole. - Your API service is still functional.
-
You have installed the OpenShift CLI (
oc).
Procedure
List Operators running in the cluster. The output includes Operator version, availability, and up-time information:
oc get clusteroperators
$ oc get clusteroperatorsCopy to Clipboard Copied! Toggle word wrap Toggle overflow List Operator pods running in the Operator’s namespace, plus pod status, restarts, and age:
oc get pod -n <operator_namespace>
$ oc get pod -n <operator_namespace>Copy to Clipboard Copied! Toggle word wrap Toggle overflow Output a detailed Operator pod summary:
oc describe pod <operator_pod_name> -n <operator_namespace>
$ oc describe pod <operator_pod_name> -n <operator_namespace>Copy to Clipboard Copied! Toggle word wrap Toggle overflow
8.6. Investigating pod issues Copiar o linkLink copiado para a área de transferência!
Red Hat OpenShift Service on AWS leverages the Kubernetes concept of a pod, which is one or more containers deployed together on one host. A pod is the smallest compute unit that can be defined, deployed, and managed on Red Hat OpenShift Service on AWS 4.
After a pod is defined, it is assigned to run on a node until its containers exit, or until it is removed. Depending on policy and exit code, pods are either removed after exiting or retained so that their logs can be accessed.
The first thing to check when pod issues arise is the pod’s status. If an explicit pod failure has occurred, observe the pod’s error state to identify specific image, container, or pod network issues. Focus diagnostic data collection according to the error state. Review pod event messages, as well as pod and container log information. Diagnose issues dynamically by accessing running Pods on the command line, or start a debug pod with root access based on a problematic pod’s deployment configuration.
8.6.1. Understanding pod error states Copiar o linkLink copiado para a área de transferência!
Pod failures return explicit error states that can be observed in the status field in the output of oc get pods. Pod error states cover image, container, and container network related failures.
The following table provides a list of pod error states along with their descriptions.
| Pod error state | Description |
|---|---|
|
| Generic image retrieval error. |
|
| Image retrieval failed and is backed off. |
|
| The specified image name was invalid. |
|
| Image inspection did not succeed. |
|
|
|
|
| When attempting to retrieve an image from a registry, an HTTP error was encountered. |
|
| The specified container is either not present or not managed by the kubelet, within the declared pod. |
|
| Container initialization failed. |
|
| None of the pod’s containers started successfully. |
|
| None of the pod’s containers were killed successfully. |
|
| A container has terminated. The kubelet will not attempt to restart it. |
|
| A container or image attempted to run with root privileges. |
|
| Pod sandbox creation did not succeed. |
|
| Pod sandbox configuration was not obtained. |
|
| A pod sandbox did not stop successfully. |
|
| Network initialization failed. |
|
| Network termination failed. |
8.6.2. Reviewing pod status Copiar o linkLink copiado para a área de transferência!
You can query pod status and error states. You can also query a pod’s associated deployment configuration and review base image availability.
Prerequisites
-
You have access to the cluster as a user with the
dedicated-adminrole. -
You have installed the OpenShift CLI (
oc). -
skopeois installed.
Procedure
Switch into a project:
oc project <project_name>
$ oc project <project_name>Copy to Clipboard Copied! Toggle word wrap Toggle overflow List pods running within the namespace, as well as pod status, error states, restarts, and age:
oc get pods
$ oc get podsCopy to Clipboard Copied! Toggle word wrap Toggle overflow Determine whether the namespace is managed by a deployment configuration:
oc status
$ oc statusCopy to Clipboard Copied! Toggle word wrap Toggle overflow If the namespace is managed by a deployment configuration, the output includes the deployment configuration name and a base image reference.
Inspect the base image referenced in the preceding command’s output:
skopeo inspect docker://<image_reference>
$ skopeo inspect docker://<image_reference>Copy to Clipboard Copied! Toggle word wrap Toggle overflow If the base image reference is not correct, update the reference in the deployment configuration:
oc edit deployment/my-deployment
$ oc edit deployment/my-deploymentCopy to Clipboard Copied! Toggle word wrap Toggle overflow When deployment configuration changes on exit, the configuration will automatically redeploy. Watch pod status as the deployment progresses, to determine whether the issue has been resolved:
oc get pods -w
$ oc get pods -wCopy to Clipboard Copied! Toggle word wrap Toggle overflow Review events within the namespace for diagnostic information relating to pod failures:
oc get events
$ oc get eventsCopy to Clipboard Copied! Toggle word wrap Toggle overflow
8.6.3. Inspecting pod and container logs Copiar o linkLink copiado para a área de transferência!
You can inspect pod and container logs for warnings and error messages related to explicit pod failures. Depending on policy and exit code, pod and container logs remain available after pods have been terminated.
Prerequisites
-
You have access to the cluster as a user with the
dedicated-adminrole. - Your API service is still functional.
-
You have installed the OpenShift CLI (
oc).
Procedure
Query logs for a specific pod:
oc logs <pod_name>
$ oc logs <pod_name>Copy to Clipboard Copied! Toggle word wrap Toggle overflow Query logs for a specific container within a pod:
oc logs <pod_name> -c <container_name>
$ oc logs <pod_name> -c <container_name>Copy to Clipboard Copied! Toggle word wrap Toggle overflow Logs retrieved using the preceding
oc logscommands are composed of messages sent to stdout within pods or containers.Inspect logs contained in
/var/log/within a pod.List log files and subdirectories contained in
/var/logwithin a pod:oc exec <pod_name> -- ls -alh /var/log
$ oc exec <pod_name> -- ls -alh /var/logCopy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Query a specific log file contained in
/var/logwithin a pod:oc exec <pod_name> cat /var/log/<path_to_log>
$ oc exec <pod_name> cat /var/log/<path_to_log>Copy to Clipboard Copied! Toggle word wrap Toggle overflow Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow List log files and subdirectories contained in
/var/logwithin a specific container:oc exec <pod_name> -c <container_name> ls /var/log
$ oc exec <pod_name> -c <container_name> ls /var/logCopy to Clipboard Copied! Toggle word wrap Toggle overflow Query a specific log file contained in
/var/logwithin a specific container:oc exec <pod_name> -c <container_name> cat /var/log/<path_to_log>
$ oc exec <pod_name> -c <container_name> cat /var/log/<path_to_log>Copy to Clipboard Copied! Toggle word wrap Toggle overflow
8.6.4. Accessing running pods Copiar o linkLink copiado para a área de transferência!
You can review running pods dynamically by opening a shell inside a pod or by gaining network access through port forwarding.
Prerequisites
-
You have access to the cluster as a user with the
dedicated-adminrole. - Your API service is still functional.
-
You have installed the OpenShift CLI (
oc).
Procedure
Switch into the project that contains the pod you would like to access. This is necessary because the
oc rshcommand does not accept the-nnamespace option:oc project <namespace>
$ oc project <namespace>Copy to Clipboard Copied! Toggle word wrap Toggle overflow Start a remote shell into a pod:
oc rsh <pod_name>
$ oc rsh <pod_name>Copy to Clipboard Copied! Toggle word wrap Toggle overflow where:
<pod_name>-
If a pod has multiple containers,
oc rshdefaults to the first container unless-c <container_name>is specified.
Start a remote shell into a specific container within a pod:
oc rsh -c <container_name> pod/<pod_name>
$ oc rsh -c <container_name> pod/<pod_name>Copy to Clipboard Copied! Toggle word wrap Toggle overflow Create a port forwarding session to a port on a pod:
oc port-forward <pod_name> <host_port>:<pod_port>
$ oc port-forward <pod_name> <host_port>:<pod_port>Copy to Clipboard Copied! Toggle word wrap Toggle overflow where:
<pod_name> <host_port>:<pod_port>-
Specify the pod name, the local host port to listen on, and the pod port to which traffic is forwarded. Enter Ctrl+C to cancel the port forwarding session.
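For example, to forward local port 8080 to port 80 on a pod named my-pod (hypothetical names and ports):

$ oc port-forward my-pod 8080:80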
8.6.5. Starting debug pods with root access Copiar o linkLink copiado para a área de transferência!
You can start a debug pod with root access, based on a problematic pod’s deployment or deployment configuration. Pod users typically run with non-root privileges, but running troubleshooting pods with temporary root privileges can be useful during issue investigation.
Prerequisites
-
You have access to the cluster as a user with the
dedicated-adminrole. - Your API service is still functional.
-
You have installed the OpenShift CLI (
oc).
Procedure
Start a debug pod with root access, based on a deployment.
Obtain a project’s deployment name:
$ oc get deployment -n <project_name>
Start a debug pod with root privileges, based on the deployment:
$ oc debug deployment/my-deployment --as-root -n <project_name>
Start a debug pod with root access, based on a deployment configuration.
Obtain a project’s deployment configuration name:
$ oc get deploymentconfigs -n <project_name>
Start a debug pod with root privileges, based on the deployment configuration:
$ oc debug deploymentconfig/my-deployment-configuration --as-root -n <project_name>
Note
You can append -- <command> to the preceding oc debug commands to run individual commands within a debug pod, instead of running an interactive shell.
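For instance, as a minimal sketch that assumes a deployment named my-deployment, you could confirm that the debug pod runs as root without opening an interactive shell by appending a single command:
$ oc debug deployment/my-deployment --as-root -n <project_name> -- id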
8.6.6. Copying files to and from pods and containers
You can copy files to and from a pod to test configuration changes or gather diagnostic information.
Prerequisites
- You have access to the cluster as a user with the dedicated-admin role.
- Your API service is still functional.
- You have installed the OpenShift CLI (oc).
Procedure
Copy a file to a pod:
$ oc cp <local_path> <pod_name>:/<path> -c <container_name>
-c <container_name> refers to the desired container in a pod. If you do not specify a container with the -c option, then the first container in a pod is selected.
Copy a file from a pod:
$ oc cp <pod_name>:/<path> -c <container_name> <local_path>
-c <container_name> refers to the desired container in a pod. If you do not specify a container with the -c option, then the first container in a pod is selected.
Note
For oc cp to function, the tar binary must be available within the container.
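As a brief example, assuming a hypothetical pod named my-app-1-akdlg with a container named nginx, you might push an updated configuration file into the pod and pull an application log back out for inspection. The file paths are placeholders:
$ oc cp ./nginx.conf my-app-1-akdlg:/etc/nginx/nginx.conf -c nginx
$ oc cp my-app-1-akdlg:/var/log/nginx/error.log -c nginx ./error.log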
8.7. Troubleshooting the Source-to-Image process
A cluster administrator can observe the S2I stages to determine where in the S2I process a failure occurred and gather diagnostic data to resolve Source-to-Image issues.
8.7.1. Strategies for Source-to-Image troubleshooting
Use Source-to-Image (S2I) to build reproducible, Docker-formatted container images. You can create ready-to-run images by injecting application source code into a container image and assembling a new image. The new image incorporates the base image (the builder) and built source.
Procedure
To determine where in the S2I process a failure occurs, you can observe the state of the pods relating to each of the following S2I stages:
- During the build configuration stage, a build pod is used to create an application container image from a base image and application source code.
- During the deployment configuration stage, a deployment pod is used to deploy application pods from the application container image that was built in the build configuration stage. The deployment pod also deploys other resources such as services and routes. The deployment configuration begins after the build configuration succeeds.
- After the deployment pod has started the application pods, application failures can occur within the running application pods. For instance, an application might not behave as expected even though the application pods are in a Running state. In this scenario, you can access running application pods to investigate application failures within a pod.
When troubleshooting S2I issues, follow this strategy:
- Monitor build, deployment, and application pod status.
- Determine the stage of the S2I process where the problem occurred.
- Review logs corresponding to the failed stage.
8.7.2. Gathering Source-to-Image diagnostic data
The S2I tool runs a build pod and a deployment pod in sequence. The deployment pod is responsible for deploying the application pods based on the application container image created in the build stage. Watch build, deployment and application pod status to determine where in the S2I process a failure occurs. Then, focus diagnostic data collection accordingly.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- Your API service is still functional.
- You have installed the OpenShift CLI (oc).
Procedure
Watch the pod status throughout the S2I process to determine at which stage a failure occurs:
$ oc get pods -w
Use the -w flag to monitor pods for changes until you quit the command using Ctrl+C.
Review a failed pod’s logs for errors.
If the build pod fails, review the build pod’s logs:
$ oc logs -f pod/<application_name>-<build_number>-build
Note
Alternatively, you can review the build configuration’s logs using oc logs -f bc/<application_name>. The build configuration’s logs include the logs from the build pod.
If the deployment pod fails, review the deployment pod’s logs:
$ oc logs -f pod/<application_name>-<build_number>-deploy
Note
Alternatively, you can review the deployment configuration’s logs using oc logs -f dc/<application_name>. This outputs logs from the deployment pod until the deployment pod completes successfully. The command outputs logs from the application pods if you run it after the deployment pod has completed. After a deployment pod completes, its logs can still be accessed by running oc logs -f pod/<application_name>-<build_number>-deploy.
If an application pod fails, or if an application is not behaving as expected within a running application pod, review the application pod’s logs:
$ oc logs -f pod/<application_name>-<build_number>-<random_string>
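For example, assuming a hypothetical application named my-app whose second build failed, a minimal investigation might look like the following. The pod name follows the <application_name>-<build_number>-build pattern described above:
$ oc get pods -w
$ oc logs -f pod/my-app-2-build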
8.7.3. Gathering application diagnostic data to investigate application failures
Application failures can occur within running application pods. In these situations, you can retrieve diagnostic information with these strategies:
- Review events relating to the application pods.
- Review the logs from the application pods, including application-specific log files that are not collected by the OpenShift Logging framework.
- Test application functionality interactively and run diagnostic tools in an application container.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- You have installed the OpenShift CLI (oc).
Procedure
List events relating to a specific application pod. The following example retrieves events for an application pod named my-app-1-akdlg:
$ oc describe pod/my-app-1-akdlg
Review logs from an application pod:
$ oc logs -f pod/my-app-1-akdlg
Query specific logs within a running application pod. Logs that are sent to stdout are collected by the OpenShift Logging framework and are included in the output of the preceding command. The following query is only required for logs that are not sent to stdout.
If an application log can be accessed without root privileges within a pod, concatenate the log file as follows:
$ oc exec my-app-1-akdlg -- cat /var/log/my-application.log
If root access is required to view an application log, you can start a debug container with root privileges and then view the log file from within the container. Start the debug container from the project’s DeploymentConfig object. Pod users typically run with non-root privileges, but running troubleshooting pods with temporary root privileges can be useful during issue investigation:
$ oc debug dc/my-deployment-configuration --as-root -- cat /var/log/my-application.log
Note
You can access an interactive shell with root access within the debug pod if you run oc debug dc/<deployment_configuration> --as-root without appending -- <command>.
Test application functionality interactively and run diagnostic tools in an application container with an interactive shell.
Start an interactive shell on the application container:
$ oc exec -it my-app-1-akdlg -- /bin/bash
- Test application functionality interactively from within the shell. For example, you can run the container’s entry point command and observe the results. Then, test changes from the command line directly, before updating the source code and rebuilding the application container through the S2I process.
Run diagnostic binaries available within the container.
Note
Root privileges are required to run some diagnostic binaries. In these situations you can start a debug pod with root access, based on a problematic pod’s DeploymentConfig object, by running oc debug dc/<deployment_configuration> --as-root. Then, you can run diagnostic binaries as root from within the debug pod.
If diagnostic binaries are not available within a container, you can run a host’s diagnostic binaries within a container’s namespace by using nsenter. The following example runs ip ad within a container’s namespace, using the host’s ip binary.
Enter into a debug session on the target node. This step instantiates a debug pod called <node_name>-debug:
$ oc debug node/my-cluster-node
Set /host as the root directory within the debug shell. The debug pod mounts the host’s root file system in /host within the pod. By changing the root directory to /host, you can run binaries contained in the host’s executable paths:
# chroot /host
Note
Red Hat OpenShift Service on AWS 4 cluster nodes running Red Hat Enterprise Linux CoreOS (RHCOS) are immutable and rely on Operators to apply cluster changes. Accessing cluster nodes by using SSH is not recommended. However, if the Red Hat OpenShift Service on AWS API is not available, or the kubelet is not properly functioning on the target node, oc operations will be impacted. In such situations, it is possible to access nodes using ssh core@<node>.<cluster_name>.<base_domain> instead.
Determine the target container ID:
# crictl ps
Determine the container’s process ID. In this example, the target container ID is a7fe32346b120:
# crictl inspect a7fe32346b120 --output yaml | grep 'pid:' | awk '{print $2}'
Run ip ad within the container’s namespace, using the host’s ip binary. This example uses 31150 as the container’s process ID. The nsenter command enters the namespace of a target process and runs a command in its namespace. Because the target process in this example is a container’s process ID, the ip ad command is run in the container’s namespace from the host:
# nsenter -n -t 31150 -- ip ad
Note
Running a host’s diagnostic binaries within a container’s namespace is only possible if you are using a privileged container such as a debug node.
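As a compact variant of the preceding two steps, and assuming the same container ID a7fe32346b120 from the crictl ps output, you could capture the process ID in a shell variable and run the host’s ip binary in one pass from the debug shell:
# PID=$(crictl inspect a7fe32346b120 --output yaml | grep 'pid:' | awk '{print $2}')
# nsenter -n -t "$PID" -- ip ad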
8.8. Troubleshooting storage issues
A multi-attach storage error occurs when a volume cannot be mounted on a new node because the failed node cannot unmount the attached volume. A cluster administrator can resolve multi-attach storage issues by enabling multiple attachments with RWX volumes, or by recovering or deleting the failed node when an RWO volume is used.
8.8.1. Resolving multi-attach errors
When a node crashes or shuts down abruptly, the attached ReadWriteOnce (RWO) volume is expected to be unmounted from the node so that it can be used by a pod scheduled on another node.
However, mounting on a new node is not possible because the failed node is unable to unmount the attached volume.
A multi-attach error is reported:
Example output
Unable to attach or mount volumes: unmounted volumes=[sso-mysql-pvol], unattached volumes=[sso-mysql-pvol default-token-x4rzc]: timed out waiting for the condition
Multi-Attach error for volume "pvc-8837384d-69d7-40b2-b2e6-5df86943eef9" Volume is already used by pod(s) sso-mysql-1-ns6b4
Procedure
To resolve the multi-attach issue, use one of the following solutions:
Enable multiple attachments by using RWX volumes:
For most storage solutions, you can use ReadWriteMany (RWX) volumes to prevent multi-attach errors.
Recover or delete the failed node when using an RWO volume:
For storage that does not support RWX, such as VMware vSphere, RWO volumes must be used instead. However, RWO volumes cannot be mounted on multiple nodes.
If you encounter a multi-attach error message with an RWO volume, force delete the pod on a shutdown or crashed node to avoid data loss in critical workloads, such as when dynamic persistent volumes are attached:
$ oc delete pod <old_pod> --force=true --grace-period=0
This command deletes the volumes stuck on shutdown or crashed nodes after six minutes.
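To confirm which resolution path applies, you can first check the access mode of the affected persistent volume claim. This is a minimal sketch that assumes you know the claim name; the output shows whether the claim is ReadWriteOnce or ReadWriteMany:
$ oc get pvc <pvc_name> -o jsonpath='{.spec.accessModes}'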
8.9. Investigating monitoring issues
Red Hat OpenShift Service on AWS includes a preconfigured, preinstalled, and self-updating monitoring stack that provides monitoring for core platform components. In Red Hat OpenShift Service on AWS 4, cluster administrators can optionally enable monitoring for user-defined projects.
Use these procedures if the following issues occur:
- Your own metrics are unavailable.
- Prometheus is consuming a lot of disk space.
- The KubePersistentVolumeFillingUp alert is firing for Prometheus.
8.9.2. Determining why Prometheus is consuming a lot of disk space
Developers can create labels to define attributes for metrics in the form of key-value pairs. The number of potential key-value pairs corresponds to the number of possible values for an attribute. An attribute that has an unlimited number of potential values is called an unbound attribute. For example, a customer_id attribute is unbound because it has an infinite number of possible values.
Every assigned key-value pair has a unique time series. The use of many unbound attributes in labels can result in an exponential increase in the number of time series created. This can impact Prometheus performance and can consume a lot of disk space.
You can use the following measures when Prometheus consumes a lot of disk space:
- Check the time series database (TSDB) status using the Prometheus HTTP API for more information about which labels are creating the most time series data. Doing so requires cluster administrator privileges.
- Check the number of scrape samples that are being collected.
- Reduce the number of unique time series that are created by reducing the number of unbound attributes that are assigned to user-defined metrics.
Note
Using attributes that are bound to a limited set of possible values reduces the number of potential key-value pair combinations.
- Enforce limits on the number of samples that can be scraped across user-defined projects. This requires cluster administrator privileges.
Prerequisites
- You have access to the cluster as a user with the dedicated-admin role.
- You have installed the OpenShift CLI (oc).
Procedure
- In the Red Hat OpenShift Service on AWS web console, go to Observe → Metrics.
Enter a Prometheus Query Language (PromQL) query in the Expression field. The following example queries help to identify high cardinality metrics that might result in high disk space consumption:
By running the following query, you can identify the ten jobs that have the highest number of scrape samples:
topk(10, max by(namespace, job) (topk by(namespace, job) (1, scrape_samples_post_metric_relabeling)))
By running the following query, you can pinpoint time series churn by identifying the ten jobs that have created the most time series data in the last hour:
topk(10, sum by(namespace, job) (sum_over_time(scrape_series_added[1h])))
Investigate the number of unbound label values assigned to metrics with higher than expected scrape sample counts:
- If the metrics relate to a user-defined project, review the metrics key-value pairs assigned to your workload. These are implemented through Prometheus client libraries at the application level. Try to limit the number of unbound attributes referenced in your labels.
- If the metrics relate to a core Red Hat OpenShift Service on AWS project, create a Red Hat support case on the Red Hat Customer Portal.
Review the TSDB status using the Prometheus HTTP API by following these steps when logged in as a dedicated-admin:
Get the Prometheus API route URL by running the following command:
$ HOST=$(oc -n openshift-monitoring get route prometheus-k8s -ojsonpath='{.status.ingress[].host}')
Extract an authentication token by running the following command:
$ TOKEN=$(oc whoami -t)
Query the TSDB status for Prometheus by running the following command:
$ curl -H "Authorization: Bearer $TOKEN" -k "https://$HOST/api/v1/status/tsdb"
8.10. Diagnosing OpenShift CLI (oc) issues
You can investigate OpenShift CLI (oc) issues by increasing the log level to get more detailed diagnostic information.
8.10.1. Understanding OpenShift CLI (oc) log levels
With the OpenShift CLI (oc), you can create applications and manage Red Hat OpenShift Service on AWS projects from a terminal.
If oc command-specific issues arise, increase the oc log level to output API request, API response, and curl request details generated by the command. This provides a granular view of a particular oc command’s underlying operation, which in turn might provide insight into the nature of a failure.
oc log levels range from 1 to 10. The following table provides a list of oc log levels, along with their descriptions.
| Log level | Description |
|---|---|
| 1 to 5 | No additional logging to stderr. |
| 6 | Log API requests to stderr. |
| 7 | Log API requests and headers to stderr. |
| 8 | Log API requests, headers, and body, plus API response headers and body to stderr. |
| 9 | Log API requests, headers, and body, API response headers and body, plus curl request details, to stderr. |
| 10 | Log API requests, headers, and body, API response headers and body, plus curl request details, to stderr, in verbose detail. |
8.10.2. Specifying OpenShift CLI (oc) log levels
You can investigate OpenShift CLI (oc) issues by increasing the command’s log level.
The Red Hat OpenShift Service on AWS user’s current session token is typically included in logged curl requests where required. You can also obtain the current user’s session token manually, for use when testing aspects of an oc command’s underlying process step-by-step.
Prerequisites
- Install the OpenShift CLI (oc).
Procedure
Specify the oc log level when running an oc command:
$ oc <command> --loglevel <log_level>
where:
<command>
- Specifies the command you are running.
<log_level>
- Specifies the log level to apply to the command.
To obtain the current user’s session token, run the following command:
$ oc whoami -t
Example output
sha256~RCV3Qcn7H-OEfqCGVI0CvnZ6...
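For example, to see the API requests and responses behind a failing list operation, you might raise the log level to 8 and review the details printed to stderr. The command and log level here are illustrative:
$ oc get pods --loglevel 8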
8.11. Troubleshooting expired tokens
You can troubleshoot expired offline access tokens that prevent access to your Red Hat OpenShift Service on AWS cluster.
8.11.1. Troubleshooting expired offline access tokens
If you use the Red Hat OpenShift Service on AWS (ROSA) CLI, rosa, and your api.openshift.com offline access token expires, an error message appears. This happens when sso.redhat.com invalidates the token.
Example output
Can't get tokens ....
Can't get access tokens ....
Procedure
- Generate a new offline access token at the following URL. A new offline access token is generated every time you visit the OpenShift Cluster Manager URL.
8.12. Troubleshooting IAM roles
You can troubleshoot IAM role issues that prevent proper access to your Red Hat OpenShift Service on AWS cluster resources.
8.12.1. Resolving issues with ocm-roles and user-role IAM resources
You may receive an error when trying to create a cluster using the ROSA command-line interface (CLI) (rosa). This error means that the user-role IAM role is not linked to your AWS account. The most likely cause of this error is that another user in your Red Hat organization created the ocm-role IAM role. Your user-role IAM role needs to be created.
After any user sets up an ocm-role IAM resource linked to a Red Hat account, any subsequent users wishing to create a cluster in that Red Hat organization must have a user-role IAM role to provision a cluster.
Procedure
Assess the status of your ocm-role and user-role IAM roles with the following commands:
$ rosa list ocm-role
Example output
I: Fetching ocm roles
ROLE NAME                       ROLE ARN                                                  LINKED  ADMIN
ManagedOpenShift-OCM-Role-1158  arn:aws:iam::2066:role/ManagedOpenShift-OCM-Role-1158    No      No
$ rosa list user-role
Example output
I: Fetching user roles
ROLE NAME                          ROLE ARN                                                     LINKED
ManagedOpenShift-User.osdocs-Role  arn:aws:iam::2066:role/ManagedOpenShift-User.osdocs-Role    Yes
With the results of these commands, you can create and link the missing IAM resources.
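For example, if the user-role listing comes back empty for your account, you might create and link the role before retrying cluster creation, as also shown in the cluster deletion section later in this chapter:
$ rosa create user-role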
8.12.1.1. Creating an ocm-role IAM role
You create your ocm-role IAM roles by using the ROSA command-line interface (CLI) (rosa).
Prerequisites
- You have an AWS account.
- You have Red Hat Organization Administrator privileges in the OpenShift Cluster Manager organization.
- You have the permissions required to install AWS account-wide roles.
- You have installed and configured the latest ROSA CLI, rosa, on your installation host.
Procedure
To create an ocm-role IAM role with basic privileges, run the following command:
$ rosa create ocm-role
To create an ocm-role IAM role with admin privileges, run the following command:
$ rosa create ocm-role --admin
This command allows you to create the role by specifying specific attributes. In the interactive creation flow, selecting "auto mode" lets the ROSA CLI (rosa) create and link your roles and policies. See "Methods of account-wide role creation" for more information. The prompts in the creation flow are described below, where:
Role prefix
- A prefix value for all of the created AWS resources. In this example, ManagedOpenShift prepends all of the AWS resources.
Enable admin capabilities for the OCM role (optional)
- Choose if you want this role to have the additional admin permissions.
Note
You do not see this prompt if you used the --admin option.
Permissions boundary ARN (optional)
- The Amazon Resource Name (ARN) of the policy to set permission boundaries.
Role Path (optional)
- Specify an IAM path for the user name.
Role creation mode
- Choose the method to create your AWS roles. Using auto, the ROSA CLI generates and links the roles and policies. In the auto mode, you receive some different prompts to create the AWS roles.
Create the 'ManagedOpenShift-OCM-Role-182' role?
- The auto method asks if you want to create a specific ocm-role using your prefix.
OCM Role ARN
- Confirm that you want to associate your IAM role with your OpenShift Cluster Manager.
Link the 'arn:aws:iam::<ARN>:role/ManagedOpenShift-OCM-Role-182' role with organization '<AWS ARN>'?
- Links the created role with your AWS organization.
8.13. Troubleshooting Red Hat OpenShift Service on AWS cluster deployments
This document describes how to troubleshoot cluster deployment errors.
8.13.1. Obtaining information about a failed cluster
If a cluster deployment fails, the cluster is put into an "error" state.
Procedure
Run the following command to get more information:
$ rosa describe cluster -c <my_cluster_name> --debug
8.13.2. Troubleshooting cluster creation with an osdCcsAdmin error
If a cluster creation action fails, you might receive the following error message.
Example output
Failed to create cluster: Unable to create cluster spec: Failed to get access keys for user 'osdCcsAdmin': NoSuchEntity: The user with name osdCcsAdmin cannot be found.
Procedure
Delete the stack:
$ rosa init --delete
Reinitialize your account:
$ rosa init
8.13.3. Creating the Elastic Load Balancing (ELB) service-linked role
If you have not created a load balancer in your AWS account, the service-linked role for Elastic Load Balancing (ELB) might not exist yet. You might receive the following error:
Example output
Error: Error creating network Load Balancer: AccessDenied: User: arn:aws:sts::xxxxxxxxxxxx:assumed-role/ManagedOpenShift-Installer-Role/xxxxxxxxxxxxxxxxxxx is not authorized to perform: iam:CreateServiceLinkedRole on resource: arn:aws:iam::xxxxxxxxxxxx:role/aws-service-role/elasticloadbalancing.amazonaws.com/AWSServiceRoleForElasticLoadBalancing"
Procedure
To resolve this issue, ensure that the role exists on your AWS account. If not, create this role with the following command:
$ aws iam get-role --role-name "AWSServiceRoleForElasticLoadBalancing" || aws iam create-service-linked-role --aws-service-name "elasticloadbalancing.amazonaws.com"
Note
This command only needs to be executed once per account.
8.13.4. Repairing a cluster that cannot be deleted
In specific cases, the following error appears in OpenShift Cluster Manager if you attempt to delete your cluster.
Example output
Error deleting cluster
CLUSTERS-MGMT-400: Failed to delete cluster <hash>: sts_user_role is not linked to your account. sts_ocm_role is linked to your organization <org number> which requires sts_user_role to be linked to your Red Hat account <account ID>.Please create a user role and link it to the account: User Account <account ID> is not authorized to perform STS cluster operations
Operation ID: b0572d6e-fe54-499b-8c97-46bf6890011c
If you try to delete your cluster from the CLI, the following error appears.
Example output
E: Failed to delete cluster <hash>: sts_user_role is not linked to your account. sts_ocm_role is linked to your organization <org_number> which requires sts_user_role to be linked to your Red Hat account <account_id>.Please create a user role and link it to the account: User Account <account ID> is not authorized to perform STS cluster operations
This error occurs when the user-role is unlinked or deleted.
Procedure
Run the following command to create the user-role IAM resource:
$ rosa create user-role
After you see that the role has been created, you can delete the cluster. The following confirms that the role was created and linked:
I: Successfully linked role ARN <user role ARN> with account <account ID>
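Once the role is linked, you can retry the deletion, for example with the following command. The cluster name is a placeholder:
$ rosa delete cluster -c <cluster_name>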
Legal Notice
Copyright © 2025 Red Hat
OpenShift documentation is licensed under the Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0).
Modified versions must remove all Red Hat trademarks.
Portions adapted from https://github.com/kubernetes-incubator/service-catalog/ with modifications by Red Hat.
Red Hat, Red Hat Enterprise Linux, the Red Hat logo, the Shadowman logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
Java® is a registered trademark of Oracle and/or its affiliates.
XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.
Node.js® is an official trademark of Joyent. Red Hat Software Collections is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.
The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation’s permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
All other trademarks are the property of their respective owners.