Chapter 6. Troubleshooting DSPA component errors
The following table lists common errors found in DataSciencePipelinesApplication (DSPA) components, along with the associated status, error message, and proposed solution. The Ready condition type aggregates errors from the various DSPA components, providing a single status view of the DSPA deployment.
Type | Status | Error message and solution |
---|---|---|
Ready | False | Error message: Solution: This issue occurs in clusters that use self-signed certificates with OpenShift AI version 2.9 or later. The data science pipelines manager cannot connect to the object storage because it does not trust the object storage SSL certificate. Therefore, the pipeline server cannot be created. Contact your IT operations administrator to add the relevant Certificate Authority bundle. For more information, see Working with certificates. |
Ready | False | Error message: Solution: In clusters running OpenShift AI 2.8.x, the data science pipelines manager might fail to connect to the object storage, and the pipeline server might not be created. Ensure that your object store credentials and connection information are accurate, and verify that the object store is accessible from within the data science project’s associated OpenShift namespace. One common issue is that the object storage SSL certificate is not trusted, particularly if self-signed certificates are used. Verify and update your object storage credentials, then retry the operation. |
Ready | False | Error message: Solution: Provide the correct credentials for your object storage and retry the operation. |
Ready | False | Error message: Solution: If the issue persists beyond startup, check for network issues or misconfigurations in the database connection settings. |
Ready | False | Error message: Solution: This issue can occur when you use any external database, such as Amazon RDS. The data science pipelines manager cannot connect to the database because it does not trust the database SSL certificate, preventing the creation of the pipeline server. Contact your IT operations administrator to add the relevant certificates. For more information, see Working with certificates. |
Ready | False | Error message: Solution: This issue might occur when using an external database, such as Amazon RDS. Initially, the pipeline server is created successfully. However, after some time, the OpenShift AI dashboard displays an "Error displaying pipelines" message, and the DSPA conditions indicate that the host is blocked due to multiple connection errors. For more information on how to resolve this issue for an external Amazon RDS database, see Resolving "Host is blocked because of many connection errors" error in Amazon RDS for MySQL. Note: Clicking this link opens an external website. |
Ready | False | Error message: Solution: Ensure that the project name in OpenShift is fewer than 40 characters long. |
Ready | False | Error message: Solution: If the failure persists for more than 25 seconds during DSPA startup, recreate the missing service account. |
Ready | False | Error message: Solution: If the failure persists for more than 25 seconds during DSPA startup, recreate the missing service account. |
Ready | False | Error message: Solution: If the failure persists for more than 25 seconds during DSPA startup, recreate the missing service account. |
Ready | False | Error message: Solution: Wait for DSPA startup to complete. If deployment fails after 25 seconds, check the logs for further information. |
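
Two of the thresholds in the table above (the 40-character project name limit and the 25-second startup window) are easy to check in a script before you escalate an issue. The following is a minimal Python sketch of those two checks only. The function and constant names are hypothetical, invented for this illustration, and are not part of OpenShift AI or the DSPA operator.

```python
# Thresholds taken from the table above. The names are hypothetical helpers.
MAX_PROJECT_NAME_LENGTH = 40  # project name must be shorter than this
STARTUP_GRACE_SECONDS = 25    # failures inside this window may be transient


def project_name_is_short_enough(name: str) -> bool:
    """Return True if the project name has fewer than 40 characters."""
    return len(name) < MAX_PROJECT_NAME_LENGTH


def failure_needs_action(first_seen: float, now: float) -> bool:
    """Return True once a failure has persisted for more than 25 seconds.

    Both arguments are timestamps in seconds, for example from time.time().
    """
    return (now - first_seen) > STARTUP_GRACE_SECONDS


if __name__ == "__main__":
    print(project_name_is_short_enough("my-data-science-project"))
    print(failure_needs_action(first_seen=0.0, now=30.0))
```

A failure observed inside the startup window is treated as possibly transient, which matches the guidance to act only if the failure persists.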
6.1. Common errors across DSPA components
The following table lists errors that might occur across multiple DSPA components:
Deployment condition and condition type | Status | Error message and solution |
---|---|---|
Condition: Component Deployment Not Found; Condition type: | False | Error message: Solution: The deployment for the component does not exist. Typically, this issue occurs due to missing deployments or issues that occurred during creation. |
Condition: Deployment Scaled Down; Condition type: | False | Error message: Solution: The component is unavailable because the deployment replica count is set to zero. |
Condition: Component Failing to Progress; Condition type: | False | Error message: Solution: The deployment has stalled due to |
Condition: Replica Creation Failure; Condition type: | False | Error message: Solution: Replica creation has failed, typically due to an error in the replica set or with the service accounts. |
Condition: Pod-Level Failures; Condition type: | False | Error message: Solution: Deployment pods are in a failed state. Check the pod logs for further information. |
Condition: Pod in CrashLoopBackOff; Condition type: | False | Error message: Solution: Pod containers are failing repeatedly, often due to incorrect environment variables or missing service accounts. |
Condition: Component Deploying (No Errors); Condition type: | False | Error message: Solution: The component deployment process is ongoing with no errors detected. |
Condition: Component Minimally Available; Condition type: | True | Error message: Solution: The component is available, but with only the minimum number of replicas running. |
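
When you compare a deployment against the tables in this chapter, it can help to list only the conditions that are not reporting True. The following Python sketch assumes that the DSPA resource follows the common Kubernetes convention of a `status.conditions` list whose entries carry `type`, `status`, `reason`, and `message` fields. That layout is an assumption here, not something this chapter confirms, and the sample data is entirely hypothetical: the type `ExampleComponentReady` and all reason and message strings are placeholders, not real DSPA values.

```python
from __future__ import annotations

import json

# Hypothetical sample. A real object would come from your cluster,
# for example from the JSON output of the OpenShift CLI.
SAMPLE_DSPA_JSON = """
{
  "status": {
    "conditions": [
      {"type": "ExampleComponentReady", "status": "True",
       "reason": "ExampleOk", "message": ""},
      {"type": "Ready", "status": "False",
       "reason": "ExampleReason", "message": "example failure text"}
    ]
  }
}
"""


def failing_conditions(resource: dict) -> list[dict]:
    """Return every condition whose status is not the string "True"."""
    conditions = resource.get("status", {}).get("conditions", [])
    return [c for c in conditions if c.get("status") != "True"]


def summarize(resource: dict) -> list[str]:
    """Format one line per failing condition for quick comparison with the tables."""
    return [
        f'{c.get("type", "?")}={c.get("status", "?")} '
        f'reason={c.get("reason", "")} message={c.get("message", "")}'
        for c in failing_conditions(resource)
    ]


if __name__ == "__main__":
    for line in summarize(json.loads(SAMPLE_DSPA_JSON)):
        print(line)
```

Filtering on `status != "True"` rather than `status == "False"` also surfaces conditions that report an unknown status, which this sketch treats as worth inspecting.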