Chapter 4. Known issues

This section describes known issues in Red Hat OpenShift Data Science and any known methods of working around these issues.

RHODS-11791 - Usage data collection is enabled after upgrade

If you previously had the Allow collection of usage data option deselected (that is, disabled), this option becomes selected (that is, enabled) when you upgrade OpenShift Data Science.

Workaround

Manually reset the Allow collection of usage data option. To do this, perform the following actions:

  1. In the OpenShift Data Science dashboard, in the left menu, click Settings → Cluster settings.

    The Cluster Settings page opens.

  2. In the Usage data collection section, deselect Allow collection of usage data.
  3. Click Save changes.
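
After you save the change, you can optionally confirm from the command line that the dashboard configuration was updated by inspecting the OdhDashboardConfig resource. This is a hedged verification step: the exact field that stores the usage data setting can vary between releases, so review the returned YAML rather than relying on a specific key name.

    $ oc get OdhDashboardConfig odh-dashboard-config -n redhat-ods-applications -o yaml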

DATA-SCIENCE-PIPELINES-OPERATOR-294 - Scheduled pipeline run that uses data-passing might fail to pass data between steps, or fail the step entirely

A scheduled pipeline run that uses an S3 object store to store the pipeline artifacts might fail with an error such as the following:

Bad value for --endpoint-url "cp": scheme is missing. Must be of the form http://<hostname>/ or https://<hostname>/

This issue occurs because the S3 object store endpoint is not successfully passed to the pods for the scheduled pipeline run.

Workaround

Depending on the size of the pipeline artifacts being passed, you can either partially or completely work around this issue by applying a custom artifact-passing script and then restarting the pipeline server. Specifically, this workaround results in the following behavior:

  • For pipeline artifacts smaller than 3 kilobytes, the pipeline run now successfully passes the artifacts into your S3 object store.
  • For pipeline artifacts larger than 3 kilobytes, the pipeline run still does not pass the artifacts into your S3 object store. However, the workaround ensures that the run continues to completion. Any smaller artifacts in the remainder of the pipeline run are successfully stored.

To apply this workaround, perform the following actions:

  1. In a text editor, paste the following YAML-based artifact-passing script. The script defines a ConfigMap object.

    apiVersion: v1
    data:
      artifact_script: |-
        #!/usr/bin/env sh
        push_artifact() {
            workspace_dir=$(echo $(context.taskRun.name) | sed -e "s/$(context.pipeline.name)-//g")
            workspace_dest=/workspace/${workspace_dir}/artifacts/$(context.pipelineRun.name)/$(context.taskRun.name)
            artifact_name=$(basename $2)
            if [ -f "$workspace_dest/$artifact_name" ]; then
                echo sending to: ${workspace_dest}/${artifact_name}
                tar -cvzf $1.tgz -C ${workspace_dest} ${artifact_name}
                aws s3 --endpoint <Endpoint> cp $1.tgz s3://<Bucket>/artifacts/$PIPELINERUN/$PIPELINETASK/$1.tgz
            elif [ -f "$2" ]; then
                tar -cvzf $1.tgz -C $(dirname $2) ${artifact_name}
                aws s3 --endpoint <Endpoint> cp $1.tgz s3://<Bucket>/artifacts/$PIPELINERUN/$PIPELINETASK/$1.tgz
            else
                echo "$2 file does not exist. Skip artifact tracking for $1"
            fi
        }
        push_log() {
            cat /var/log/containers/$PODNAME*$NAMESPACE*step-main*.log > step-main.log
            push_artifact main-log step-main.log
        }
        strip_eof() {
            if [ -f "$2" ]; then
                awk 'NF' $2 | head -c -1 > $1_temp_save && cp $1_temp_save $2
            fi
        }
    kind: ConfigMap
    metadata:
      name: custom-script
  2. In the script, replace any occurrences of <Endpoint> with your S3 endpoint (for example, https://s3.amazonaws.com), and any occurrences of <Bucket> with your S3 bucket name. (A scripted alternative to steps 2 through 5 follows this procedure.)
  3. Save the YAML file for the ConfigMap object.
  4. Apply the YAML file.

    $ oc apply -f <configmap_file_name>.yaml
  5. Restart the pipeline server.

    $ oc project <data_science_project_name>
    $ oc delete pod $(oc get pods -l app=ds-pipeline-pipelines-definition --no-headers | awk '{print $1}')
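
As a scripted alternative to steps 2 through 5, the following sketch substitutes the placeholders, applies the ConfigMap, and restarts the pipeline server in one pass. The endpoint, bucket name, file name, and project name shown are assumptions; replace them with your own values.

    $ sed -i 's|<Endpoint>|https://s3.amazonaws.com|g; s|<Bucket>|my-pipeline-bucket|g' custom-script.yaml
    $ oc apply -f custom-script.yaml
    $ oc project my-data-science-project
    $ oc delete pod $(oc get pods -l app=ds-pipeline-pipelines-definition --no-headers | awk '{print $1}')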

ODH-DASHBOARD-1741 - Cannot create a workbench whose name begins with a number

If you try to create a workbench whose name begins with a number, the workbench does not start.

Workaround
Delete the workbench and create a new one with a name that begins with a letter.

ODH-DASHBOARD-1699 - Workbench does not restart after configuration update

If you update a workbench configuration (for example, by changing an environment variable value), the workbench does not automatically restart and you do not see the changes.

Workaround
Manually restart the workbench.

KUBEFLOW-157 - Logging out of JupyterLab does not work if you are already logged out of the OpenShift Data Science dashboard

If you log out of the OpenShift Data Science dashboard before you log out of JupyterLab, then logging out of JupyterLab is not successful. For example, if you know the URL of a Jupyter notebook, you can still open it in your browser.

Workaround
Log out of JupyterLab before you log out of the OpenShift Data Science dashboard.

RHODS-9789 - Pipeline servers fail to start if they contain a custom database that includes a dash in its database name or username field

When you create a pipeline server that uses a custom database, if the value that you set for the dbname field or username field includes a dash, the pipeline server fails to start.

Workaround
Edit the pipeline server to omit the dash from the affected fields.

RHODS-9764 - Data connection details get reset when editing a workbench

When you edit a workbench that has an existing data connection and then select the Create new data connection option, the edit page might revert to the Use existing data connection option before you have finished specifying the new connection details.

Workaround

To work around this issue, perform the following actions:

  1. Select the Create new data connection option again.
  2. Specify the new connection details and click Update workbench before the page reverts to the Use existing data connection option.

RHODS-9583 - Data Science dashboard does not detect an existing OpenShift Pipelines installation

When the OpenShift Pipelines operator is installed as a global operator on your cluster, the Data Science dashboard does not properly detect it.

An alert icon appears next to the Data Science Pipelines option in the left navigation bar. When you open Data Science Pipelines, you see the message: “To use pipelines, first install the Red Hat OpenShift Pipelines Operator.” However, when you view the list of installed operators in the openshift-operators project, you see that OpenShift Pipelines is installed as a global operator on your cluster.
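
To confirm from the command line that OpenShift Pipelines is installed as a global operator, you can list the ClusterServiceVersions in the openshift-operators project (this check assumes cluster-admin access):

    $ oc get csv -n openshift-operators | grep -i pipelines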

Workaround

Follow these steps as a user with cluster-admin permissions:

  1. Log in to your cluster using the oc client.
  2. Enter the following command to update OdhDashboardConfig in the redhat-ods-applications application namespace:

    $ oc patch OdhDashboardConfig odh-dashboard-config -n redhat-ods-applications --type=merge -p '{"spec": {"dashboardConfig": {"disablePipelines": false}}}'

RHODS-9412 - Elyra pipeline fails to run if workbench is created by a user with edit permissions

If a user who has been granted edit permissions for a project creates a project workbench, that user sees the following behavior:

  • During the workbench creation process, the user sees an Error creating workbench message related to the creation of Kubernetes role bindings.
  • Despite the preceding error message, OpenShift Data Science still creates the workbench. However, the error message means that the user will not be able to use the workbench to run Elyra data science pipelines.
  • If the user tries to use the workbench to run an Elyra pipeline, Jupyter shows an Error making request message that describes failed initialization.

Workaround
A user with administrator permissions (for example, the project owner) must create the workbench on behalf of the user with edit permissions. That user can then use the workbench to run Elyra pipelines.
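
To see which role bindings were actually created for the project, and whether the binding mentioned in the error message is missing, an administrator can list them from the command line; the project name below is a placeholder:

    $ oc get rolebindings -n <project_name>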

RHODS-9030 - Uninstall process for OpenShift Data Science might become stuck when removing kfdefs resources

The steps for uninstalling the OpenShift Data Science managed service are described in Uninstalling OpenShift Data Science.

However, even when you follow this guide, you might see that the uninstall process does not finish successfully. Instead, the process stays on the step of deleting kfdefs resources that are used by the Kubeflow Operator. As shown in the following example, kfdefs resources might exist in the redhat-ods-applications, redhat-ods-monitoring, and rhods-notebooks namespaces:

$ oc get kfdefs.kfdef.apps.kubeflow.org -A

NAMESPACE                  NAME                                   AGE
redhat-ods-applications    rhods-anaconda                         3h6m
redhat-ods-applications    rhods-dashboard                        3h6m
redhat-ods-applications    rhods-data-science-pipelines-operator  3h6m
redhat-ods-applications    rhods-model-mesh                       3h6m
redhat-ods-applications    rhods-nbc                              3h6m
redhat-ods-applications    rhods-osd-config                       3h6m
redhat-ods-monitoring      modelmesh-monitoring                   3h6m
redhat-ods-monitoring      monitoring                             3h6m
rhods-notebooks            rhods-notebooks                        3h6m
rhods-notebooks            rhods-osd-config                       3h5m

Failed removal of the kfdefs resources might also prevent later installation of a newer version of OpenShift Data Science.

Workaround
To manually delete the kfdefs resources so that you can complete the uninstall process, see the "Force individual object removal when it has finalizers" section of the following Red Hat solution article: https://access.redhat.com/solutions/4165791.
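
The general approach described in the solution article is to remove the finalizers from each stuck resource so that deletion can complete. A minimal sketch for a single kfdef, using names from the example output above (repeat for each remaining resource):

    $ oc patch kfdefs.kfdef.apps.kubeflow.org rhods-dashboard -n redhat-ods-applications --type=merge -p '{"metadata":{"finalizers":[]}}'
    $ oc delete kfdefs.kfdef.apps.kubeflow.org rhods-dashboard -n redhat-ods-applications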

RHODS-8939 - For a Jupyter notebook created in a previous release, default shared memory might cause a runtime error

For a Jupyter notebook created in a release earlier than the current release, the default shared memory is set to 64 MB and you cannot change this default value in the notebook configuration.

For example, PyTorch relies on shared memory, and the default size of 64 MB is not enough for large use cases, such as training a model or performing heavy data manipulations. Jupyter reports a “no space left on device” message and /dev/shm is full.

Starting with release 1.31, this issue is fixed and any new notebook’s shared memory is set to the size of the node.

Workaround

For a Jupyter notebook created in a release earlier than 1.31, either recreate the Jupyter notebook or follow these steps:

  1. In your data science project, create a workbench as described in Creating a project workbench.
  2. In the data science project page, in the Workbenches section, click the Status toggle for the workbench to change it from Running to Stopped.
  3. Open your OpenShift Console and then select Administrator.
  4. Select Home → API Explorer.
  5. In the Filter by kind field, type notebook.
  6. Select the kubeflow v1 notebook.
  7. Select the Instances tab and then select the instance for the workbench that you created in Step 1.
  8. Click the YAML tab and then select Actions → Edit Notebook.
  9. Edit the YAML file to add the following information to the configuration:

    • For the container that has the name of your Workbench notebook, add the following lines to the volumeMounts section:

      - mountPath: /dev/shm
        name: shm

      For example, if your workbench name is myworkbench, update the YAML file as follows:

      spec:
        containers:
          - env:
              ...
            name: myworkbench
            ...
            volumeMounts:
              - mountPath: /dev/shm
                name: shm
    • In the volumes section, add the lines shown in the following example:

      volumes:
        - name: shm
          emptyDir:
            medium: Memory

      Note: Optionally, you can specify a limit on the amount of memory to use for the emptyDir (see the sizeLimit example after this procedure).

  10. Click Save.
  11. In the data science dashboard, in the Workbenches section of the data science project, click the Status toggle for the workbench. The status changes from Stopped to Starting and then Running.
  12. Restart the notebook.
Warning

If you later edit the notebook’s configuration through the Data Science dashboard UI, your workaround edit to the notebook configuration will be erased.
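
For the optional memory limit mentioned in the note above, the emptyDir volume accepts a sizeLimit field. A minimal sketch, with 2Gi chosen only as an example value:

      volumes:
        - name: shm
          emptyDir:
            medium: Memory
            sizeLimit: 2Gi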

RHODS-8921 - You cannot create a pipeline server when cumulative character limit is exceeded

When the cumulative character limit of a data science project name and a pipeline server name exceeds 62 characters, you are unable to successfully create a pipeline server.

Workaround
Rename your data science project so that it does not exceed 30 characters.

RHODS-8865 - A pipeline server fails to start unless you specify an Amazon Web Services (AWS) Simple Storage Service (S3) bucket resource

When you create a data connection for a data science project, the AWS_S3_BUCKET field is not designated as a mandatory field. However, if you do not specify a value for this field, and you attempt to configure a pipeline server, the pipeline server fails to start successfully.

RHODS-7718 - User without dashboard permissions is able to continue using their running notebooks and workbenches indefinitely

When a Red Hat OpenShift Data Science administrator revokes a user’s permissions, the user can continue to use their running notebooks and workbenches indefinitely.

Workaround
When the OpenShift Data Science administrator revokes a user’s permissions, the administrator should also stop any running notebooks and workbenches for that user.
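
There is no single dashboard action for this, but an administrator can find the user's running workbenches from the command line by listing the Notebook resources across namespaces. The annotation used to stop a notebook in the sketch below is an assumption based on the Kubeflow notebook controller; verify it for your release before relying on it.

    $ oc get notebooks --all-namespaces
    $ oc annotate notebook <notebook_name> -n <project_name> kubeflow-resource-stopped="$(date -u +%Y-%m-%dT%H:%M:%SZ)" --overwrite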

RHODS-6907 - Attempting to increase the size of a Persistent Volume (PV) fails when it is not connected to a workbench

Attempting to increase the size of a Persistent Volume (PV) that is not connected to a workbench fails. When changing a data science project’s storage, users can still edit the size of the PV in the user interface, but this action does not have any effect.

RHODS-6950 - Unable to scale down a workbench’s GPUs when all GPUs in the cluster are being used

It is not possible to scale down a workbench’s GPUs if all GPUs in the cluster are being used. This issue applies to GPUs being used by one workbench, and GPUs being used by multiple workbenches.

Workaround

To work around this issue, perform the following steps:

  1. Stop all active workbenches that are using GPUs.
  2. Wait until the relevant GPUs are available again.
  3. Edit the workbench and scale down the GPU instances.

RHODS-6539 - Anaconda Professional Edition cannot be validated and enabled in OpenShift Data Science

Anaconda Professional Edition cannot be enabled because the dashboard’s key validation for Anaconda Professional Edition is inoperable.

RHODS-6346 - Unclear error message displays when using invalid characters to create a data science project

When you create a data science project’s data connection, workbench, or storage connection using invalid special characters, the following error message is displayed:

the object provided is unrecognized (must be of type Secret): couldn't get version/kind; json parse error: unexpected end of JSON input ({"apiVersion":"v1","kind":"Sec ...)

The error message fails to clearly indicate the problem.

RHODS-6955 - An error can occur when trying to edit a workbench

When editing a workbench, an error similar to the following can occur:

Error creating workbench
Operation cannot be fulfilled on notebooks.kubeflow.org "workbench-name": the object has been modified; please apply your changes to the latest version and try again

RHODS-6913 - When editing the configuration settings of a workbench, a misleading error message appears

When you edit the configuration settings of a workbench, a warning message appears stating that the workbench will restart if you make any changes to its configuration settings. This warning is misleading because, if you change the values of its environment variables, the workbench does not automatically restart.

RHODS-6383 - An ImagePullBackOff error message is not displayed when required during the workbench creation process

Pods can experience issues pulling container images from the container registry. If an error occurs, the relevant pod enters into an ImagePullBackOff state. During the workbench creation process, if an ImagePullBackOff error occurs, an appropriate message is not displayed.

Workaround
Check the event log for more information about the ImagePullBackOff error. To do this, click the workbench status while it is starting.
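
If the dashboard does not surface enough detail, you can also inspect the workbench pod directly. A hedged CLI alternative, with placeholder names:

    $ oc get pods -n <project_name>
    $ oc describe pod <workbench_pod_name> -n <project_name>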

RHODS-6373 - Workbenches fail to start when cumulative character limit is exceeded

When the cumulative character limit of a data science project’s title and workbench title exceeds 62 characters, workbenches fail to start.

RHODS-6356 - The notebook creation process fails for users who have never logged in to the dashboard

The dashboard’s notebook Administration page displays users belonging to the user group and admin group in OpenShift. However, if an administrator attempts to start a notebook server on behalf of a user who has never logged in to the dashboard, the server creation process fails and displays the following error message:

Request invalid against a username that does not exist.
Workaround
Request that the relevant user logs into the dashboard.

RHODS-6216 - The ModelMesh oauth-proxy container is intermittently unstable

ModelMesh pods do not deploy correctly due to a failure of the ModelMesh oauth-proxy container. This issue occurs intermittently and only if authentication is enabled in the ModelMesh runtime environment. It is more likely to occur when additional ModelMesh instances are deployed in different namespaces.

RHODS-5906 - The NVIDIA GPU Operator is incompatible with OpenShift 4.11.12

Provisioning a GPU node on an OpenShift 4.11.12 cluster results in the nvidia-driver-daemonset pod getting stuck in a CrashLoopBackOff state. The NVIDIA GPU Operator is compatible with OpenShift 4.11.9 and 4.11.13.

RHODS-5763 - Incorrect package version displayed during notebook selection

The Start a notebook server page displays an incorrect version number for the Anaconda notebook image.

RHODS-5543 - When using the NVIDIA GPU Operator, more nodes than needed are created by the Node Autoscaler

When a pod cannot be scheduled due to insufficient available resources, the Node Autoscaler creates a new node. There is a delay until the newly created node receives the relevant GPU workload. Consequently, the pod cannot be scheduled and the Node Autoscaler continuously creates additional new nodes until one of the nodes is ready to receive the GPU workload. For more information about this issue, see When using the NVIDIA GPU Operator, more nodes than needed are created by the Node Autoscaler.

Workaround
Apply the cluster-api/accelerator label in machineset.spec.template.spec.metadata. This causes the autoscaler to consider those nodes as unready until the GPU driver has been deployed.
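
A minimal sketch of where the label goes in the MachineSet definition follows; the label value is a placeholder, and it typically must match the GPU type configured for your cluster autoscaler, so verify it against your environment:

    spec:
      template:
        spec:
          metadata:
            labels:
              cluster-api/accelerator: "nvidia-gpu"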

RHODS-5251 - Notebook server administration page shows users who have lost permission access

If a user who previously started a notebook server in Jupyter loses their permissions to do so (for example, if an OpenShift Data Science administrator changes the user’s group settings or removes the user from a permitted group), administrators continue to see the user’s notebook servers on the server Administration page. As a consequence, an administrator is able to restart notebook servers that belong to the user whose permissions were revoked.

RHODS-4769 - GPUs on nodes with unsupported taints cannot be allocated to notebook servers

GPUs on nodes marked with any taint other than the supported nvidia.com/gpu taint cannot be selected when creating a notebook server. To avoid this issue, use only the nvidia.com/gpu taint on GPU nodes used with OpenShift Data Science.
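
For reference, a hedged example of applying the supported taint to a GPU node; the node name and the taint value are placeholders:

    $ oc adm taint nodes <node_name> nvidia.com/gpu=true:NoSchedule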

RHODS-4799 - Tensorboard requires manual steps to view

When a user has TensorFlow or PyTorch notebook images and wants to use TensorBoard to display data, manual steps are necessary to include environment variables in the notebook environment, and to import those variables for use in your code.

Workaround
When you start your notebook server, use the following code to set the value for the TENSORBOARD_PROXY_URL environment variable to use your OpenShift Data Science user ID.
import os
os.environ["TENSORBOARD_PROXY_URL"] = os.environ["NB_PREFIX"] + "/proxy/6006/"

RHODS-4718 - The Intel® oneAPI AI Analytics Toolkits quick start references nonexistent sample notebooks

The Intel® oneAPI AI Analytics Toolkits quick start, located on the Resources page on the dashboard, requires the user to load sample notebooks as part of the instruction steps, but refers to notebooks that do not exist in the associated repository.

RHODS-4627 - The CronJob responsible for validating Anaconda Professional Edition’s license is suspended and does not run daily

The CronJob responsible for validating Anaconda Professional Edition’s license is automatically suspended by the OpenShift Data Science operator. As a result, the CronJob does not run daily as scheduled. In addition, when Anaconda Professional Edition’s license expires, Anaconda Professional Edition is not indicated as disabled on the OpenShift Data Science dashboard.

RHODS-4502 - The NVIDIA GPU Operator card on the dashboard displays button unnecessarily

GPUs are automatically available in Jupyter after the NVIDIA GPU Operator is installed. The Enable button, located on the NVIDIA GPU Operator card on the Explore page, is therefore redundant. In addition, clicking the Enable button moves the NVIDIA GPU Operator card to the Enabled page, even if the Operator is not installed.

RHODS-3985 - Dashboard does not display Enabled page content after ISV operator uninstall

After an ISV operator is uninstalled, no content is displayed on the Enabled page on the dashboard. Instead, the following error is displayed:

Error loading components
HTTP request failed
Workaround
Wait 30-40 seconds and then refresh the page in your browser.

RHODS-3984 - Incorrect package versions displayed during notebook selection

In the OpenShift Data Science interface, the Start a notebook server page displays incorrect version numbers for the JupyterLab and Notebook packages included in the oneAPI AI Analytics Toolkit notebook image. The page might also show an incorrect value for the Python version used by this image.

Workaround
When you start your oneAPI AI Analytics Toolkit notebook server, you can check which Python packages are installed on your notebook server, and which versions of those packages, by running the !pip list command in a notebook cell.

RHODS-2956 - Error can occur when creating a notebook instance

When creating a notebook instance in Jupyter, a Directory not found error appears intermittently. You can ignore this error message by clicking Dismiss.

RHODS-2881 - Actions on dashboard not clearly visible

The dashboard actions to re-validate a disabled application’s license, and to remove a disabled application’s card are not clearly visible to the user. These actions only appear when the user clicks on the application card’s Disabled label. As a result, the intended workflows may not be clear to the user.

RHODS-2879 - License re-validation action appears unnecessarily

The dashboard action to re-validate a disabled application’s license appears unnecessarily for applications that do not have a license validation or activation system. In addition, when a user attempts to re-validate a license that cannot be re-validated, feedback is not displayed to state why the action cannot be completed.

RHODS-2650 - Error can occur during Pachyderm deployment

When creating an instance of the Pachyderm operator, a webhook error appears intermittently, preventing the creation process from starting successfully. The webhook error indicates that either the Pachyderm operator failed a health check, causing it to restart, or the operator process exceeded its container’s allocated memory limit, triggering an out-of-memory (OOM) kill.

Workaround
Repeat the Pachyderm instance creation process until the error no longer appears.

RHODS-2096 - IBM Watson Studio not available in OpenShift Data Science

IBM Watson Studio is not available when OpenShift Data Science is installed on OpenShift Dedicated 4.9 or higher, because it is not compatible with these versions of OpenShift Dedicated. Contact Marketplace support for assistance manually configuring Watson Studio on OpenShift Dedicated 4.9 and higher.

RHODS-1888 - OpenShift Data Science hyperlink still visible after uninstall

When the OpenShift Data Science Add-on is uninstalled from an OpenShift Dedicated cluster, the link to the OpenShift Data Science interface remains visible in the application launcher menu. Clicking this link results in a "Page Not Found" error because OpenShift Data Science is no longer available.
