Chapter 8. Troubleshooting
8.1. Review your cluster notifications
When you are trying to resolve a problem with your cluster, your cluster notifications are a good source of information.
Cluster notifications are messages about the status, health, or performance of your cluster. They are also the primary way that Red Hat Site Reliability Engineering (SRE) communicates with you about cluster health and resolving problems with your cluster.
8.1.1. Viewing cluster notifications using the Red Hat Hybrid Cloud Console
Cluster notifications provide important information about the health of your cluster. You can view notifications that have been sent to your cluster in the Cluster history tab on the Red Hat Hybrid Cloud Console.
Prerequisites
- You are logged in to the Hybrid Cloud Console.
Procedure
- Navigate to the Clusters page of the Hybrid Cloud Console.
- Click the name of your cluster to go to the cluster details page.
- Click the Cluster history tab. Cluster notifications appear under the Cluster history heading.
- Optional: Filter for relevant cluster notifications. Use the filter controls to hide cluster notifications that are not relevant to you, so that you can focus on your area of expertise or on resolving a critical issue. You can filter notifications based on text in the notification description, severity level, notification type, when the notification was received, and which system or person triggered the notification.
8.2. Troubleshooting Red Hat OpenShift Service on AWS cluster installations
For help with the installation of Red Hat OpenShift Service on AWS clusters, see the following sections.
8.2.1. Installation troubleshooting
8.2.1.1. Inspect install or uninstall logs
To display install logs:
- Run the following command, replacing <cluster_name> with the name of your cluster:

  $ rosa logs install --cluster=<cluster_name>

- To watch the logs, include the --watch flag:

  $ rosa logs install --cluster=<cluster_name> --watch

To display uninstall logs:

- Run the following command, replacing <cluster_name> with the name of your cluster:

  $ rosa logs uninstall --cluster=<cluster_name>

- To watch the logs, include the --watch flag:

  $ rosa logs uninstall --cluster=<cluster_name> --watch
8.2.1.2. Verify your AWS account and quota
Run the following command to verify you have the available quota on your AWS account:
$ rosa verify quota
AWS quotas change based on region. Be sure you are verifying your quota for the correct AWS region. If you need to increase your quota, navigate to your AWS console, and request a quota increase for the service that failed.
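If your cluster targets a specific region, you can verify quota against that region explicitly. This is a minimal sketch: the --region flag is a standard rosa option, but the region value shown is only an example.

$ rosa verify quota --region=us-east-1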
8.2.1.3. AWS notification emails
When creating a cluster, the Red Hat OpenShift Service on AWS service creates small instances in all supported regions. This check ensures the AWS account being used can deploy to each supported region.
For AWS accounts that are not using all supported regions, AWS may send one or more emails confirming that "Your Request For Accessing AWS Resources Has Been Validated". Typically the sender of this email is aws-verification@amazon.com.
This is expected behavior as the Red Hat OpenShift Service on AWS service is validating your AWS account configuration.
8.2.2. Verifying installation of Red Hat OpenShift Service on AWS clusters
If the ROSA with HCP cluster is in the installing state for over 30 minutes and has not become ready, verify that the AWS account environment is prepared for the required cluster configurations. If the AWS account environment is configured correctly but the installation still does not complete, delete the cluster and retry the installation. If the problem persists, contact support.
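Before deleting the cluster, you can check its current state and provisioning details from the ROSA CLI. These are standard rosa subcommands, although the exact output fields can vary by CLI version:

$ rosa describe cluster --cluster=<cluster_name>

$ rosa logs install --cluster=<cluster_name> --watch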
8.2.3. Troubleshooting Red Hat OpenShift Service on AWS installation error codes
The following table lists Red Hat OpenShift Service on AWS installation error codes and what you can do to troubleshoot these errors.
Error code | Description | Resolution |
---|---|---|
OCM3999 | Unknown error. | Check the cluster installation logs for more details, or delete this cluster and retry cluster installation. If this issue persists, contact support by logging in to the Customer Support page. |
OCM5001 | Red Hat OpenShift Service on AWS cluster provision has failed. | Check the cluster installation logs for more details, or delete this cluster and retry cluster installation. If this issue persists, contact support by logging in to the Customer Support page. |
OCM5002 | The maximum resource tag size of 25 has been exceeded. | Check the cluster information to determine if you can remove any unnecessary tags you have specified and retry cluster installation. |
OCM5003 | Unable to establish an AWS client to provision the cluster. | You must create several role resources on your AWS account to create and manage a Red Hat OpenShift Service on AWS cluster. Ensure that your provided AWS credentials are correct and retry cluster installation. For more information about Red Hat OpenShift Service on AWS IAM role resources, see ROSA IAM role resources in the Additional resources section. |
OCM5004 | Unable to establish a cross-account AWS client to provision the cluster. | You must create several role resources on your AWS account to create and manage a Red Hat OpenShift Service on AWS cluster. Ensure that your provided AWS credentials are correct and retry cluster installation. For more information about Red Hat OpenShift Service on AWS IAM role resources, see ROSA IAM role resources in the Additional resources section. |
OCM5005 | Failed to retrieve AWS subnets defined for the cluster. | Review the provided subnet IDs and retry cluster installation. |
OCM5006 | You must configure at least one private AWS subnet for the cluster. | Review the provided subnet IDs and retry cluster installation. |
OCM5007 | Unable to create AWS STS prerequisites for the cluster. | Verify that account and operator roles have been created and are correct. For more information, see AWS STS and ROSA with HCP explained in the Additional resources section. |
OCM5008 | The provided cluster flavour is incorrect. | Verify that the provided name or ID is correct when you are using the flavour parameter and retry cluster creation. |
OCM5009 | The cluster version could not be found. | Ensure that the configured version ID matches a valid Red Hat OpenShift Service on AWS version. |
OCM5010 | Failed to tag subnets for the cluster. | Confirm that the AWS permissions and the subnet configurations are correct. You must tag at least one private subnet and, if applicable, one public subnet. |
OCM5011 | Cluster installation has failed due to unavailable capacity in the selected region. | Try your cluster installation in another region or retry cluster installation. |
8.2.4. Troubleshooting access to Red Hat Hybrid Cloud Console
In Red Hat OpenShift Service on AWS clusters, the Red Hat OpenShift Service on AWS OAuth server is hosted in the Red Hat service’s AWS account while the web console service is published using the cluster’s default ingress controller in the cluster’s AWS account. If you can log in to your cluster using the OpenShift CLI (oc) but cannot access the Red Hat OpenShift Service on AWS web console, verify the following criteria are met:
- The console workloads are running.
- The default ingress controller’s load balancer is active.
- You are accessing the console from a machine that has network connectivity to the cluster’s VPC network.
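As a rough check of the first two criteria, you can inspect the console workloads and the default ingress controller’s load balancer service from the OpenShift CLI. The namespaces and service name shown here are the usual defaults (openshift-console, openshift-ingress, and router-default), but verify them on your cluster:

$ oc get pods -n openshift-console

$ oc -n openshift-ingress get svc router-default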
8.2.5. Verifying access to Red Hat OpenShift Service on AWS web console for Red Hat OpenShift Service on AWS cluster in ready state
Red Hat OpenShift Service on AWS clusters return a ready status when the control plane hosted in the Red Hat OpenShift Service on AWS service account becomes ready. Cluster console workloads are deployed on the cluster’s worker nodes. The Red Hat OpenShift Service on AWS web console will not be available and accessible until the worker nodes have joined the cluster and console workloads are running.

If your Red Hat OpenShift Service on AWS cluster is ready but you are unable to access the Red Hat OpenShift Service on AWS web console for the cluster, wait for the worker nodes to join the cluster and retry accessing the console.
You can either log in to the Red Hat OpenShift Service on AWS cluster or use the rosa describe machinepool command in the rosa CLI to watch the nodes.
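For example, a minimal way to watch the worker nodes join, assuming you are either logged in to the cluster with oc or have the rosa CLI configured for it:

$ oc get nodes --watch

$ rosa list machinepools --cluster=<cluster_name>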
8.2.6. Verifying access to Red Hat Hybrid Cloud Console for private Red Hat OpenShift Service on AWS clusters
The console of the private cluster is private by default. During cluster installation, the default Ingress Controller managed by OpenShift’s Ingress Operator is configured with an internal AWS Network Load Balancer (NLB).
If your private Red Hat OpenShift Service on AWS cluster shows a ready status but you cannot access the Red Hat OpenShift Service on AWS web console for the cluster, try accessing the cluster console from either within the cluster VPC or from a network that is connected to the VPC.
8.3. Troubleshooting networking
This document describes how to troubleshoot networking errors.
8.3.1. Connectivity issues on clusters with private Network Load Balancers
Red Hat OpenShift Service on AWS clusters created with version 4 deploy AWS Network Load Balancers (NLBs) by default for the default ingress controller. In the case of a private NLB, the NLB’s client IP address preservation might cause connections to be dropped where the source and destination are the same host. See the AWS documentation about how to troubleshoot your Network Load Balancer. Because of this IP address preservation, customer workloads that are located on the same node as the router pods might not be able to send traffic to the private NLB that fronts the ingress controller.

To mitigate this impact, reschedule your workloads onto nodes separate from those where the router pods are scheduled. Alternatively, rely on the internal pod and service networks to access other workloads co-located within the same cluster.
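To identify which nodes currently host the router pods, so that you can place affected workloads elsewhere, you can list the pods in the default ingress namespace with node information. The openshift-ingress namespace is the usual default, but confirm it on your cluster:

$ oc -n openshift-ingress get pods -o wide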
8.4. Verifying node health
8.4.1. Reviewing node status, resource usage, and configuration
Review cluster node health status, resource consumption statistics, and node logs. Additionally, query kubelet status on individual nodes.
Prerequisites
- You have access to the cluster as a user with the dedicated-admin role.
- You have installed the OpenShift CLI (oc).
Procedure
- List the name, status, and role for all nodes in the cluster:

  $ oc get nodes

- Summarize CPU and memory usage for each node within the cluster:

  $ oc adm top nodes

- Summarize CPU and memory usage for a specific node:

  $ oc adm top node my-node
8.5. Troubleshooting Operator issues
Operators are a method of packaging, deploying, and managing a Red Hat OpenShift Service on AWS application. They act like an extension of the software vendor’s engineering team, watching over a Red Hat OpenShift Service on AWS environment and using its current state to make decisions in real time. Operators are designed to handle upgrades seamlessly, react to failures automatically, and not take shortcuts, such as skipping a software backup process to save time.
Red Hat OpenShift Service on AWS 4 includes a default set of Operators that are required for proper functioning of the cluster. These default Operators are managed by the Cluster Version Operator (CVO).
As a cluster administrator, you can install application Operators from the OperatorHub using the Red Hat OpenShift Service on AWS web console or the CLI. You can then subscribe the Operator to one or more namespaces to make it available for developers on your cluster. Application Operators are managed by Operator Lifecycle Manager (OLM).
If you experience Operator issues, verify Operator subscription status. Check Operator pod health across the cluster and gather Operator logs for diagnosis.
8.5.1. Operator subscription condition types
Subscriptions can report the following condition types:
Condition | Description |
---|---|
CatalogSourcesUnhealthy | Some or all of the catalog sources to be used in resolution are unhealthy. |
InstallPlanMissing | An install plan for a subscription is missing. |
InstallPlanPending | An install plan for a subscription is pending installation. |
InstallPlanFailed | An install plan for a subscription has failed. |
ResolutionFailed | The dependency resolution for a subscription has failed. |
Default Red Hat OpenShift Service on AWS cluster Operators are managed by the Cluster Version Operator (CVO) and they do not have a Subscription object. Application Operators are managed by Operator Lifecycle Manager (OLM) and they have a Subscription object.
8.5.2. Viewing Operator subscription status by using the CLI
You can view Operator subscription status by using the CLI.
Prerequisites
- You have access to the cluster as a user with the dedicated-admin role.
- You have installed the OpenShift CLI (oc).
Procedure
- List Operator subscriptions:

  $ oc get subs -n <operator_namespace>

- Use the oc describe command to inspect a Subscription resource:

  $ oc describe sub <subscription_name> -n <operator_namespace>

  In the command output, find the Conditions section for the status of Operator subscription condition types. In the following example, the CatalogSourcesUnhealthy condition type has a status of false because all available catalog sources are healthy.

Default Red Hat OpenShift Service on AWS cluster Operators are managed by the Cluster Version Operator (CVO) and they do not have a Subscription object. Application Operators are managed by Operator Lifecycle Manager (OLM) and they have a Subscription object.
8.5.3. Viewing Operator catalog source status by using the CLI
You can view the status of an Operator catalog source by using the CLI.
Prerequisites
- You have access to the cluster as a user with the dedicated-admin role.
- You have installed the OpenShift CLI (oc).
Procedure
- List the catalog sources in a namespace. For example, you can check the openshift-marketplace namespace, which is used for cluster-wide catalog sources:

  $ oc get catalogsources -n openshift-marketplace

  Example output

  NAME                  DISPLAY               TYPE   PUBLISHER     AGE
  certified-operators   Certified Operators   grpc   Red Hat       55m
  community-operators   Community Operators   grpc   Red Hat       55m
  example-catalog       Example Catalog       grpc   Example Org   2m25s
  redhat-operators      Red Hat Operators     grpc   Red Hat       55m

- Use the oc describe command to get more details and status about a catalog source:

  $ oc describe catalogsource example-catalog -n openshift-marketplace

  In the command output, check the last observed state of the catalog source. In this example, the last observed state is TRANSIENT_FAILURE. This state indicates that there is a problem establishing a connection for the catalog source.

- List the pods in the namespace where your catalog source was created:

  $ oc get pods -n openshift-marketplace

  When a catalog source is created in a namespace, a pod for the catalog source is created in that namespace. In this example, the status for the example-catalog-bwt8z pod is ImagePullBackOff. This status indicates that there is an issue pulling the catalog source’s index image.

- Use the oc describe command to inspect a pod for more detailed information:

  $ oc describe pod example-catalog-bwt8z -n openshift-marketplace

  In the command output, error messages indicate that the catalog source’s index image is failing to pull successfully because of an authorization issue. For example, the index image might be stored in a registry that requires login credentials.
8.5.4. Querying Operator pod status
You can list Operator pods within a cluster and their status. You can also collect a detailed Operator pod summary.
Prerequisites
- You have access to the cluster as a user with the dedicated-admin role.
- Your API service is still functional.
- You have installed the OpenShift CLI (oc).
Procedure
- List Operators running in the cluster. The output includes Operator version, availability, and up-time information:

  $ oc get clusteroperators

- List Operator pods running in the Operator’s namespace, plus pod status, restarts, and age:

  $ oc get pod -n <operator_namespace>

- Output a detailed Operator pod summary:

  $ oc describe pod <operator_pod_name> -n <operator_namespace>
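To gather Operator logs for diagnosis, as mentioned in the introduction to this section, you can read the logs of an Operator pod identified in the previous steps. Add -c <container_name> if the pod runs more than one container:

$ oc logs pod/<operator_pod_name> -n <operator_namespace>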
8.6. Investigating pod issues
Red Hat OpenShift Service on AWS leverages the Kubernetes concept of a pod, which is one or more containers deployed together on one host. A pod is the smallest compute unit that can be defined, deployed, and managed on Red Hat OpenShift Service on AWS 4.
After a pod is defined, it is assigned to run on a node until its containers exit, or until it is removed. Depending on policy and exit code, pods are either removed after exiting or retained so that their logs can be accessed.
The first thing to check when pod issues arise is the pod’s status. If an explicit pod failure has occurred, observe the pod’s error state to identify specific image, container, or pod network issues. Focus diagnostic data collection according to the error state. Review pod event messages, as well as pod and container log information. Diagnose issues dynamically by accessing running pods on the command line, or start a debug pod with root access based on a problematic pod’s deployment configuration.
8.6.1. Understanding pod error states
Pod failures return explicit error states that can be observed in the status field in the output of oc get pods. Pod error states cover image, container, and container network related failures.
The following table provides a list of pod error states along with their descriptions.
Pod error state | Description |
---|---|
ErrImagePull | Generic image retrieval error. |
ErrImagePullBackOff | Image retrieval failed and is backed off. |
ErrInvalidImageName | The specified image name was invalid. |
ErrImageInspect | Image inspection did not succeed. |
ErrImageNeverPull | The image pull policy is set to Never and the target image is not present locally on the host. |
ErrRegistryUnavailable | When attempting to retrieve an image from a registry, an HTTP error was encountered. |
ErrContainerNotFound | The specified container is either not present or not managed by the kubelet, within the declared pod. |
ErrRunInitContainer | Container initialization failed. |
ErrRunContainer | None of the pod’s containers started successfully. |
ErrKillContainer | None of the pod’s containers were killed successfully. |
ErrCrashLoopBackOff | A container has terminated. The kubelet will not attempt to restart it. |
ErrVerifyNonRoot | A container or image attempted to run with root privileges. |
ErrCreatePodSandbox | Pod sandbox creation did not succeed. |
ErrConfigPodSandbox | Pod sandbox configuration was not obtained. |
ErrKillPodSandbox | A pod sandbox did not stop successfully. |
ErrSetupNetwork | Network initialization failed. |
ErrTeardownNetwork | Network termination failed. |
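A quick way to surface pods that are currently in one of these error states is to filter the pod list for anything that is not Running or Completed. This is only a convenience sketch and assumes that you are permitted to list pods in all namespaces:

$ oc get pods -A | grep -Ev 'Running|Completed'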
8.6.2. Reviewing pod status
You can query pod status and error states. You can also query a pod’s associated deployment configuration and review base image availability.
Prerequisites
- You have access to the cluster as a user with the dedicated-admin role.
- You have installed the OpenShift CLI (oc).
- skopeo is installed.
Procedure
- Switch into a project:

  $ oc project <project_name>

- List pods running within the namespace, as well as pod status, error states, restarts, and age:

  $ oc get pods

- Determine whether the namespace is managed by a deployment configuration:

  $ oc status

  If the namespace is managed by a deployment configuration, the output includes the deployment configuration name and a base image reference.

- Inspect the base image referenced in the preceding command’s output:

  $ skopeo inspect docker://<image_reference>

- If the base image reference is not correct, update the reference in the deployment configuration:

  $ oc edit deployment/my-deployment

- When the deployment configuration changes are saved on exit, the configuration automatically redeploys. Watch pod status as the deployment progresses to determine whether the issue has been resolved:

  $ oc get pods -w

- Review events within the namespace for diagnostic information relating to pod failures:

  $ oc get events
8.6.3. Inspecting pod and container logs
You can inspect pod and container logs for warnings and error messages related to explicit pod failures. Depending on policy and exit code, pod and container logs remain available after pods have been terminated.
Prerequisites
- You have access to the cluster as a user with the dedicated-admin role.
- Your API service is still functional.
- You have installed the OpenShift CLI (oc).
Procedure
- Query logs for a specific pod:

  $ oc logs <pod_name>

- Query logs for a specific container within a pod:

  $ oc logs <pod_name> -c <container_name>

  Logs retrieved using the preceding oc logs commands are composed of messages sent to stdout within pods or containers.

- Inspect logs contained in /var/log/ within a pod.

  - List log files and subdirectories contained in /var/log within a pod:

    $ oc exec <pod_name> -- ls -alh /var/log

  - Query a specific log file contained in /var/log within a pod:

    $ oc exec <pod_name> cat /var/log/<path_to_log>

  - List log files and subdirectories contained in /var/log within a specific container:

    $ oc exec <pod_name> -c <container_name> ls /var/log

  - Query a specific log file contained in /var/log within a specific container:

    $ oc exec <pod_name> -c <container_name> cat /var/log/<path_to_log>
8.6.4. Accessing running pods
You can review running pods dynamically by opening a shell inside a pod or by gaining network access through port forwarding.
Prerequisites
- You have access to the cluster as a user with the dedicated-admin role.
- Your API service is still functional.
- You have installed the OpenShift CLI (oc).
Procedure
- Switch into the project that contains the pod you would like to access. This is necessary because the oc rsh command does not accept the -n namespace option:

  $ oc project <namespace>

- Start a remote shell into a pod:

  $ oc rsh <pod_name>

  If a pod has multiple containers, oc rsh defaults to the first container unless -c <container_name> is specified.

- Start a remote shell into a specific container within a pod:

  $ oc rsh -c <container_name> pod/<pod_name>

- Create a port forwarding session to a port on a pod:

  $ oc port-forward <pod_name> <host_port>:<pod_port>

  Enter Ctrl+C to cancel the port forwarding session.
8.6.5. Starting debug pods with root access
You can start a debug pod with root access, based on a problematic pod’s deployment or deployment configuration. Pod users typically run with non-root privileges, but running troubleshooting pods with temporary root privileges can be useful during issue investigation.
Prerequisites
- You have access to the cluster as a user with the dedicated-admin role.
- Your API service is still functional.
- You have installed the OpenShift CLI (oc).
Procedure
Start a debug pod with root access, based on a deployment.

- Obtain a project’s deployment name:

  $ oc get deployment -n <project_name>

- Start a debug pod with root privileges, based on the deployment:

  $ oc debug deployment/my-deployment --as-root -n <project_name>

Start a debug pod with root access, based on a deployment configuration.

- Obtain a project’s deployment configuration name:

  $ oc get deploymentconfigs -n <project_name>

- Start a debug pod with root privileges, based on the deployment configuration:

  $ oc debug deploymentconfig/my-deployment-configuration --as-root -n <project_name>

You can append -- <command> to the preceding oc debug commands to run individual commands within a debug pod, instead of running an interactive shell.
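For example, to confirm that the debug pod runs with root privileges without opening an interactive shell, you could append a single command such as id. The deployment name here is only illustrative:

$ oc debug deployment/my-deployment --as-root -n <project_name> -- id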
8.6.6. Copying files to and from pods and containers
You can copy files to and from a pod to test configuration changes or gather diagnostic information.
Prerequisites
- You have access to the cluster as a user with the dedicated-admin role.
- Your API service is still functional.
- You have installed the OpenShift CLI (oc).
Procedure
- Copy a file to a pod:

  $ oc cp <local_path> <pod_name>:/<path> -c <container_name>

  The first container in a pod is selected if the -c option is not specified.

- Copy a file from a pod:

  $ oc cp <pod_name>:/<path> -c <container_name> <local_path>

  The first container in a pod is selected if the -c option is not specified.

Note: For oc cp to function, the tar binary must be available within the container.
8.7. Troubleshooting the Source-to-Image process
8.7.1. Strategies for Source-to-Image troubleshooting
Use Source-to-Image (S2I) to build reproducible, Docker-formatted container images. You can create ready-to-run images by injecting application source code into a container image and assembling a new image. The new image incorporates the base image (the builder) and built source.
To determine where in the S2I process a failure occurs, you can observe the state of the pods relating to each of the following S2I stages:
- During the build configuration stage, a build pod is used to create an application container image from a base image and application source code.
- During the deployment configuration stage, a deployment pod is used to deploy application pods from the application container image that was built in the build configuration stage. The deployment pod also deploys other resources such as services and routes. The deployment configuration begins after the build configuration succeeds.
- After the deployment pod has started the application pods, application failures can occur within the running application pods. For instance, an application might not behave as expected even though the application pods are in a Running state. In this scenario, you can access running application pods to investigate application failures within a pod.
When troubleshooting S2I issues, follow this strategy:
- Monitor build, deployment, and application pod status
- Determine the stage of the S2I process where the problem occurred
- Review logs corresponding to the failed stage
8.7.2. Gathering Source-to-Image diagnostic data
The S2I tool runs a build pod and a deployment pod in sequence. The deployment pod is responsible for deploying the application pods based on the application container image created in the build stage. Watch build, deployment and application pod status to determine where in the S2I process a failure occurs. Then, focus diagnostic data collection accordingly.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- Your API service is still functional.
- You have installed the OpenShift CLI (oc).
Procedure
- Watch the pod status throughout the S2I process to determine at which stage a failure occurs:

  $ oc get pods -w

  Use -w to monitor pods for changes until you quit the command using Ctrl+C.

- Review a failed pod’s logs for errors.

  - If the build pod fails, review the build pod’s logs:

    $ oc logs -f pod/<application_name>-<build_number>-build

    Note: Alternatively, you can review the build configuration’s logs using oc logs -f bc/<application_name>. The build configuration’s logs include the logs from the build pod.

  - If the deployment pod fails, review the deployment pod’s logs:

    $ oc logs -f pod/<application_name>-<build_number>-deploy

    Note: Alternatively, you can review the deployment configuration’s logs using oc logs -f dc/<application_name>. This outputs logs from the deployment pod until the deployment pod completes successfully. The command outputs logs from the application pods if you run it after the deployment pod has completed. After a deployment pod completes, its logs can still be accessed by running oc logs -f pod/<application_name>-<build_number>-deploy.

  - If an application pod fails, or if an application is not behaving as expected within a running application pod, review the application pod’s logs:

    $ oc logs -f pod/<application_name>-<build_number>-<random_string>
8.7.3. Gathering application diagnostic data to investigate application failures
Application failures can occur within running application pods. In these situations, you can retrieve diagnostic information with these strategies:
- Review events relating to the application pods.
- Review the logs from the application pods, including application-specific log files that are not collected by the OpenShift Logging framework.
- Test application functionality interactively and run diagnostic tools in an application container.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- You have installed the OpenShift CLI (oc).
Procedure
- List events relating to a specific application pod. The following example retrieves events for an application pod named my-app-1-akdlg:

  $ oc describe pod/my-app-1-akdlg

- Review logs from an application pod:

  $ oc logs -f pod/my-app-1-akdlg

- Query specific logs within a running application pod. Logs that are sent to stdout are collected by the OpenShift Logging framework and are included in the output of the preceding command. The following query is only required for logs that are not sent to stdout.

  - If an application log can be accessed without root privileges within a pod, concatenate the log file as follows:

    $ oc exec my-app-1-akdlg -- cat /var/log/my-application.log

  - If root access is required to view an application log, you can start a debug container with root privileges and then view the log file from within the container. Start the debug container from the project’s DeploymentConfig object. Pod users typically run with non-root privileges, but running troubleshooting pods with temporary root privileges can be useful during issue investigation:

    $ oc debug dc/my-deployment-configuration --as-root -- cat /var/log/my-application.log

    Note: You can access an interactive shell with root access within the debug pod if you run oc debug dc/<deployment_configuration> --as-root without appending -- <command>.

- Test application functionality interactively and run diagnostic tools in an application container with an interactive shell.

  - Start an interactive shell on the application container:

    $ oc exec -it my-app-1-akdlg /bin/bash

  - Test application functionality interactively from within the shell. For example, you can run the container’s entry point command and observe the results. Then, test changes from the command line directly, before updating the source code and rebuilding the application container through the S2I process.

  - Run diagnostic binaries available within the container.

    Note: Root privileges are required to run some diagnostic binaries. In these situations you can start a debug pod with root access, based on a problematic pod’s DeploymentConfig object, by running oc debug dc/<deployment_configuration> --as-root. Then, you can run diagnostic binaries as root from within the debug pod.

- If diagnostic binaries are not available within a container, you can run a host’s diagnostic binaries within a container’s namespace by using nsenter. The following example runs ip ad within a container’s namespace, using the host’s ip binary.

  - Enter into a debug session on the target node. This step instantiates a debug pod called <node_name>-debug:

    $ oc debug node/my-cluster-node

  - Set /host as the root directory within the debug shell. The debug pod mounts the host’s root file system in /host within the pod. By changing the root directory to /host, you can run binaries contained in the host’s executable paths:

    # chroot /host

    Note: Red Hat OpenShift Service on AWS 4 cluster nodes running Red Hat Enterprise Linux CoreOS (RHCOS) are immutable and rely on Operators to apply cluster changes. Accessing cluster nodes by using SSH is not recommended. However, if the Red Hat OpenShift Service on AWS API is not available, or the kubelet is not properly functioning on the target node, oc operations will be impacted. In such situations, it is possible to access nodes using ssh core@<node>.<cluster_name>.<base_domain> instead.

  - Determine the target container ID:

    # crictl ps

  - Determine the container’s process ID. In this example, the target container ID is a7fe32346b120:

    # crictl inspect a7fe32346b120 --output yaml | grep 'pid:' | awk '{print $2}'

  - Run ip ad within the container’s namespace, using the host’s ip binary. This example uses 31150 as the container’s process ID. The nsenter command enters the namespace of a target process and runs a command in its namespace. Because the target process in this example is a container’s process ID, the ip ad command is run in the container’s namespace from the host:

    # nsenter -n -t 31150 -- ip ad

    Note: Running a host’s diagnostic binaries within a container’s namespace is only possible if you are using a privileged container such as a debug node.
8.8. Troubleshooting storage issues
8.8.1. Resolving multi-attach errors
When a node crashes or shuts down abruptly, the attached ReadWriteOnce (RWO) volume is expected to be unmounted from the node so that it can be used by a pod scheduled on another node.
However, mounting on a new node is not possible because the failed node is unable to unmount the attached volume.
A multi-attach error is reported:
Example output
Unable to attach or mount volumes: unmounted volumes=[sso-mysql-pvol], unattached volumes=[sso-mysql-pvol default-token-x4rzc]: timed out waiting for the condition
Multi-Attach error for volume "pvc-8837384d-69d7-40b2-b2e6-5df86943eef9" Volume is already used by pod(s) sso-mysql-1-ns6b4
Procedure
To resolve the multi-attach issue, use one of the following solutions:

- Enable multiple attachments by using RWX volumes.

  For most storage solutions, you can use ReadWriteMany (RWX) volumes to prevent multi-attach errors.

- Recover or delete the failed node when using an RWO volume.

  For storage that does not support RWX, such as VMware vSphere, RWO volumes must be used instead. However, RWO volumes cannot be mounted on multiple nodes.

  If you encounter a multi-attach error message with an RWO volume, force delete the pod on a shutdown or crashed node to avoid data loss in critical workloads, such as when dynamic persistent volumes are attached:

  $ oc delete pod <old_pod> --force=true --grace-period=0

  This command deletes the volumes stuck on shutdown or crashed nodes after six minutes.
8.9. Investigating monitoring issues
Red Hat OpenShift Service on AWS includes a preconfigured, preinstalled, and self-updating monitoring stack that provides monitoring for core platform components. In Red Hat OpenShift Service on AWS 4, cluster administrators can optionally enable monitoring for user-defined projects.
Use these procedures if the following issues occur:
- Your own metrics are unavailable.
- Prometheus is consuming a lot of disk space.
- The KubePersistentVolumeFillingUp alert is firing for Prometheus.
8.9.2. Determining why Prometheus is consuming a lot of disk space
Developers can create labels to define attributes for metrics in the form of key-value pairs. The number of potential key-value pairs corresponds to the number of possible values for an attribute. An attribute that has an unlimited number of potential values is called an unbound attribute. For example, a customer_id attribute is unbound because it has an infinite number of possible values.
Every assigned key-value pair has a unique time series. The use of many unbound attributes in labels can result in an exponential increase in the number of time series created. This can impact Prometheus performance and can consume a lot of disk space.
You can use the following measures when Prometheus consumes a lot of disk space:
- Check the time series database (TSDB) status using the Prometheus HTTP API for more information about which labels are creating the most time series data. Doing so requires cluster administrator privileges.
- Check the number of scrape samples that are being collected.
- Reduce the number of unique time series that are created by reducing the number of unbound attributes that are assigned to user-defined metrics.

  Note: Using attributes that are bound to a limited set of possible values reduces the number of potential key-value pair combinations.
- Enforce limits on the number of samples that can be scraped across user-defined projects. This requires cluster administrator privileges.
Prerequisites
- You have access to the cluster as a user with the dedicated-admin role.
- You have installed the OpenShift CLI (oc).
Procedure
- In the Red Hat OpenShift Service on AWS web console, go to Observe → Metrics.
- Enter a Prometheus Query Language (PromQL) query in the Expression field. The following example queries help to identify high cardinality metrics that might result in high disk space consumption:
  - By running the following query, you can identify the ten jobs that have the highest number of scrape samples:

    topk(10, max by(namespace, job) (topk by(namespace, job) (1, scrape_samples_post_metric_relabeling)))

  - By running the following query, you can pinpoint time series churn by identifying the ten jobs that have created the most time series data in the last hour:

    topk(10, sum by(namespace, job) (sum_over_time(scrape_series_added[1h])))
Investigate the number of unbound label values assigned to metrics with higher than expected scrape sample counts:
- If the metrics relate to a user-defined project, review the metrics key-value pairs assigned to your workload. These are implemented through Prometheus client libraries at the application level. Try to limit the number of unbound attributes referenced in your labels.
- If the metrics relate to a core Red Hat OpenShift Service on AWS project, create a Red Hat support case on the Red Hat Customer Portal.
Review the TSDB status using the Prometheus HTTP API by following these steps when logged in as a dedicated-admin:

- Get the Prometheus API route URL by running the following command:

  $ HOST=$(oc -n openshift-monitoring get route prometheus-k8s -ojsonpath='{.status.ingress[].host}')

- Extract an authentication token by running the following command:

  $ TOKEN=$(oc whoami -t)

- Query the TSDB status for Prometheus by running the following command:

  $ curl -H "Authorization: Bearer $TOKEN" -k "https://$HOST/api/v1/status/tsdb"
8.10. Diagnosing OpenShift CLI (oc) issues
8.10.1. Understanding OpenShift CLI (oc) log levels
With the OpenShift CLI (oc), you can create applications and manage Red Hat OpenShift Service on AWS projects from a terminal.
If oc command-specific issues arise, increase the oc log level to output API request, API response, and curl request details generated by the command. This provides a granular view of a particular oc command’s underlying operation, which in turn might provide insight into the nature of a failure.
oc log levels range from 1 to 10. The following table provides a list of oc log levels, along with their descriptions.
Log level | Description |
---|---|
1 to 5 | No additional logging to stderr. |
6 | Log API requests to stderr. |
7 | Log API requests and headers to stderr. |
8 | Log API requests, headers, and body, plus API response headers and body to stderr. |
9 | Log API requests, headers, and body, API response headers and body, plus curl requests to stderr. |
10 | Log API requests, headers, and body, API response headers and body, plus curl requests to stderr, in verbose detail. |
8.10.2. Specifying OpenShift CLI (oc) log levels
You can investigate OpenShift CLI (oc) issues by increasing the command’s log level.

The Red Hat OpenShift Service on AWS user’s current session token is typically included in logged curl requests where required. You can also obtain the current user’s session token manually, for use when testing aspects of an oc command’s underlying process step-by-step.
Prerequisites
- Install the OpenShift CLI (oc).
Procedure
- Specify the oc log level when running an oc command:

  $ oc <command> --loglevel <log_level>

  where:

  - <command> specifies the command you are running.
  - <log_level> specifies the log level to apply to the command.
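For example, to print the API requests and responses generated by an ordinary command, you could run it at log level 8. The command shown here is only an illustration, as any oc command accepts the flag:

$ oc get pods --loglevel 8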
- To obtain the current user’s session token, run the following command:

  $ oc whoami -t

  Example output

  sha256~RCV3Qcn7H-OEfqCGVI0CvnZ6...
8.11. Troubleshooting expired tokens
8.11.1. Troubleshooting expired offline access tokens
If you use the Red Hat OpenShift Service on AWS (ROSA) CLI, rosa, and your api.openshift.com offline access token expires, an error message appears. This happens when sso.redhat.com invalidates the token.
Example output
Can't get tokens ....
Can't get access tokens ....
Procedure
Generate a new offline access token at the following URL. A new offline access token is generated every time you visit the URL.
- Red Hat OpenShift Service on AWS (ROSA): https://console.redhat.com/openshift/token/rosa
8.12. Troubleshooting IAM roles
8.12.1. Resolving issues with ocm-roles and user-role IAM resources
You may receive an error when trying to create a cluster using the Red Hat OpenShift Service on AWS (ROSA) CLI, rosa.
Example output
E: Failed to create cluster: The sts_user_role is not linked to account '1oNl'. Please create a user role and link it to the account.
This error means that the user-role IAM role is not linked to your AWS account. The most likely cause of this error is that another user in your Red Hat organization created the ocm-role IAM role. Your user-role IAM role needs to be created.
After any user sets up an ocm-role IAM resource linked to a Red Hat account, any subsequent users wishing to create a cluster in that Red Hat organization must have a user-role IAM role to provision a cluster.
Procedure
- Assess the status of your ocm-role and user-role IAM roles with the following commands:

  $ rosa list ocm-role

  Example output

  I: Fetching ocm roles
  ROLE NAME                        ROLE ARN                                                 LINKED  ADMIN
  ManagedOpenShift-OCM-Role-1158   arn:aws:iam::2066:role/ManagedOpenShift-OCM-Role-1158   No      No

  $ rosa list user-role

  Example output

  I: Fetching user roles
  ROLE NAME                           ROLE ARN                                                    LINKED
  ManagedOpenShift-User.osdocs-Role   arn:aws:iam::2066:role/ManagedOpenShift-User.osdocs-Role   Yes
With the results of these commands, you can create and link the missing IAM resources.
8.12.1.1. Creating an ocm-role IAM role
You create your ocm-role IAM roles by using the command-line interface (CLI).

Prerequisites
- You have an AWS account.
- You have Red Hat Organization Administrator privileges in the OpenShift Cluster Manager organization.
- You have the permissions required to install AWS account-wide roles.
- You have installed and configured the latest ROSA CLI, rosa, on your installation host.
Procedure
- To create an ocm-role IAM role with basic privileges, run the following command:

  $ rosa create ocm-role

- To create an ocm-role IAM role with admin privileges, run the following command:

  $ rosa create ocm-role --admin

This command allows you to create the role by specifying specific attributes. The following example output shows the "auto mode" selected, which lets the ROSA CLI (rosa) create your Operator roles and policies. See "Methods of account-wide role creation" for more information.

The interactive prompts cover the following:

1. A prefix value for all of the created AWS resources. In this example, ManagedOpenShift prepends all of the AWS resources.
2. Choose if you want this role to have the additional admin permissions. Note: You do not see this prompt if you used the --admin option.
3. The Amazon Resource Name (ARN) of the policy to set permission boundaries.
4. Specify an IAM path for the user name.
5. Choose the method to create your AWS roles. Using auto, the ROSA CLI generates and links the roles and policies. In the auto mode, you receive some different prompts to create the AWS roles.
6. The auto method asks if you want to create a specific ocm-role using your prefix.
7. Confirm that you want to associate your IAM role with your OpenShift Cluster Manager.
8. Links the created role with your AWS organization.
8.12.1.2. Creating a user-role IAM role
You can create your user-role IAM roles by using the command-line interface (CLI).

Prerequisites
- You have an AWS account.
- You have installed and configured the latest ROSA CLI, rosa, on your installation host.
Procedure
- To create a user-role IAM role with basic privileges, run the following command:

  $ rosa create user-role

This command allows you to create the role by specifying specific attributes. The following example output shows the "auto mode" selected, which lets the ROSA CLI (rosa) create your Operator roles and policies. See "Understanding the auto and manual deployment modes" for more information.

The interactive prompts cover the following:

1. A prefix value for all of the created AWS resources. In this example, ManagedOpenShift prepends all of the AWS resources.
2. The Amazon Resource Name (ARN) of the policy to set permission boundaries.
3. Specify an IAM path for the user name.
4. Choose the method to create your AWS roles. Using auto, the ROSA CLI generates and links the roles and policies. In the auto mode, you receive some different prompts to create the AWS roles.
5. The auto method asks if you want to create a specific user-role using your prefix.
6. Links the created role with your AWS organization.
8.12.1.3. Associating your AWS account with IAM roles
You can associate or link your AWS account with existing IAM roles by using the ROSA CLI, rosa.

Prerequisites
- You have an AWS account.
- You have the permissions required to install AWS account-wide roles. See the "Additional resources" of this section for more information.
- You have installed and configured the latest AWS (aws) and ROSA (rosa) CLIs on your installation host.
- You have created the ocm-role and user-role IAM roles, but have not yet linked them to your AWS account. You can check whether your IAM roles are already linked by running the following commands:

  $ rosa list ocm-role

  $ rosa list user-role

  If Yes is displayed in the Linked column for both roles, you have already linked the roles to an AWS account.
Procedure
In the ROSA CLI, link your ocm-role resource to your Red Hat organization by using your Amazon Resource Name (ARN):

Note: You must have Red Hat Organization Administrator privileges to run the rosa link command. After you link the ocm-role resource with your AWS account, it takes effect and is visible to all users in the organization.

$ rosa link ocm-role --role-arn <arn>

Example output

I: Linking OCM role
? Link the '<AWS ACCOUNT ID>' role with organization '<ORG ID>'? Yes
I: Successfully linked role-arn '<AWS ACCOUNT ID>' with organization account '<ORG ID>'

In the ROSA CLI, link your user-role resource to your Red Hat user account by using your Amazon Resource Name (ARN):

$ rosa link user-role --role-arn <arn>

Example output

I: Linking User role
? Link the 'arn:aws:iam::<ARN>:role/ManagedOpenShift-User-Role-125' role with organization '<AWS ID>'? Yes
I: Successfully linked role-arn 'arn:aws:iam::<ARN>:role/ManagedOpenShift-User-Role-125' with organization account '<AWS ID>'
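If you do not have the role ARNs at hand, you can look them up before running the rosa link commands. The following is a minimal sketch using the AWS CLI; it assumes the roles were created with the default ManagedOpenShift naming, so adjust the name filters if you used a custom prefix:

# Hedged sketch: list candidate role ARNs, assuming the default role naming.
$ aws iam list-roles --query "Roles[?contains(RoleName, 'OCM-Role')].Arn" --output text
$ aws iam list-roles --query "Roles[?contains(RoleName, 'User-Role')].Arn" --output text

The rosa list ocm-role and rosa list user-role commands shown in the prerequisites also display the ARNs of existing roles.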
8.12.1.4. Associating multiple AWS accounts with your Red Hat organization
You can associate multiple AWS accounts with your Red Hat organization. Associating multiple accounts lets you create Red Hat OpenShift Service on AWS clusters on any of the associated AWS accounts from your Red Hat organization.
With this capability, you can create clusters on different AWS profiles according to characteristics that make sense for your business, for example, by using one AWS profile for each region to create region-bound environments.
Prerequisites
- You have an AWS account.
- You are using OpenShift Cluster Manager to create clusters.
- You have the permissions required to install AWS account-wide roles.
- You have installed and configured the latest AWS (aws) and ROSA (rosa) CLIs on your installation host.
- You have created the ocm-role and user-role IAM roles for Red Hat OpenShift Service on AWS.
Procedure
To associate an additional AWS account, first create a profile in your local AWS configuration. Then, associate the account with your Red Hat organization by creating the ocm-role, user-role, and account roles in the additional AWS account.
To create the roles in an additional region, specify the --profile <aws_profile> parameter when running the rosa create commands, and replace <aws_profile> with the additional account profile name:
To specify an AWS account profile when creating an OpenShift Cluster Manager role:

$ rosa create --profile <aws_profile> ocm-role

To specify an AWS account profile when creating a user role:

$ rosa create --profile <aws_profile> user-role

To specify an AWS account profile when creating the account roles:

$ rosa create --profile <aws_profile> account-roles
If you do not specify a profile, the default AWS profile and its associated AWS region are used.
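You define a local AWS profile with the AWS CLI before running the rosa create commands above. The following is a minimal sketch; the profile name <aws_profile> is a placeholder, and the interactive prompts store the credentials and default region for the additional account in your AWS configuration files:

# Hedged sketch: define a named profile for the additional AWS account.
$ aws configure --profile <aws_profile>
# Confirm which account the profile resolves to before creating roles with it.
$ aws sts get-caller-identity --profile <aws_profile>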
8.13. Troubleshooting Red Hat OpenShift Service on AWS cluster deployments
This document describes how to troubleshoot cluster deployment errors.
8.13.1. Obtaining information about a failed cluster
If a cluster deployment fails, the cluster is put into an "error" state.
Procedure
Run the following command to get more information:
$ rosa describe cluster -c <my_cluster_name> --debug
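If you are not sure which cluster failed, you can first list your clusters and look for one whose state indicates an error. A minimal sketch:

# List clusters and check the cluster state before describing the failed one.
$ rosa list clusters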
8.13.2. Troubleshooting cluster creation with an osdCcsAdmin error
If a cluster creation action fails, you might receive the following error message.
Example output
Failed to create cluster: Unable to create cluster spec: Failed to get access keys for user 'osdCcsAdmin': NoSuchEntity: The user with name osdCcsAdmin cannot be found.
Procedure
To fix this issue:
Delete the stack:
$ rosa init --delete

Reinitialize your account:

$ rosa init
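After reinitializing, you can optionally confirm that the osdCcsAdmin IAM user exists again before retrying the cluster creation. A minimal check with the AWS CLI, assuming your default profile points at the affected account:

# Verify that the osdCcsAdmin IAM user was recreated by rosa init.
$ aws iam get-user --user-name osdCcsAdmin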
8.13.3. Creating the Elastic Load Balancing (ELB) service-linked role
If you have not created a load balancer in your AWS account, the service-linked role for Elastic Load Balancing (ELB) might not exist yet. You may receive the following error:
Error: Error creating network Load Balancer: AccessDenied: User: arn:aws:sts::xxxxxxxxxxxx:assumed-role/ManagedOpenShift-Installer-Role/xxxxxxxxxxxxxxxxxxx is not authorized to perform: iam:CreateServiceLinkedRole on resource: arn:aws:iam::xxxxxxxxxxxx:role/aws-service-role/elasticloadbalancing.amazonaws.com/AWSServiceRoleForElasticLoadBalancing
Procedure
To resolve this issue, ensure that the role exists on your AWS account. If not, create this role with the following command:
$ aws iam get-role --role-name "AWSServiceRoleForElasticLoadBalancing" || aws iam create-service-linked-role --aws-service-name "elasticloadbalancing.amazonaws.com"

Note: This command only needs to be executed once per account.
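To confirm that the role is now present, for example in another account or profile, you can query just its ARN. A minimal check:

# Print the ARN of the ELB service-linked role; an error here means the role still does not exist.
$ aws iam get-role --role-name "AWSServiceRoleForElasticLoadBalancing" --query 'Role.Arn' --output text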
8.13.4. Repairing a cluster that cannot be deleted
In specific cases, the following error appears in OpenShift Cluster Manager if you attempt to delete your cluster.
Error deleting cluster
CLUSTERS-MGMT-400: Failed to delete cluster <hash>: sts_user_role is not linked to your account. sts_ocm_role is linked to your organization <org number> which requires sts_user_role to be linked to your Red Hat account <account ID>.Please create a user role and link it to the account: User Account <account ID> is not authorized to perform STS cluster operations
Operation ID: b0572d6e-fe54-499b-8c97-46bf6890011c
If you try to delete your cluster from the CLI, the following error appears.
E: Failed to delete cluster <hash>: sts_user_role is not linked to your account. sts_ocm_role is linked to your organization <org_number> which requires sts_user_role to be linked to your Red Hat account <account_id>.Please create a user role and link it to the account: User Account <account ID> is not authorized to perform STS cluster operations
This error occurs when the user-role is unlinked or deleted.
Procedure
Run the following command to create the user-role IAM resource:

$ rosa create user-role

After you see that the role has been created, you can delete the cluster. The following output confirms that the role was created and linked:

I: Successfully linked role ARN <user role ARN> with account <account ID>
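Once the role is created and linked, you can verify the link and retry the deletion. A minimal sketch, where <cluster_name> is a placeholder for your cluster:

# Confirm that the user-role now shows Yes in the Linked column, then retry the delete.
$ rosa list user-role
$ rosa delete cluster --cluster=<cluster_name>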