Chapter 6. Managing basic workbenches
6.1. Accessing the administration interface for basic workbenches
You can use the administration interface to control basic workbenches in your Red Hat OpenShift AI environment.
Prerequisite
- You have logged in to OpenShift AI as a user with OpenShift AI administrator privileges.
Procedure
To access the administration interface for basic workbenches from OpenShift AI, perform the following actions:
- In OpenShift AI, in the Applications section of the left menu, click Enabled.
- Locate the Start basic workbench tile and click Launch application.
- On the page that opens when you launch a basic workbench, click the Administration tab.
The Administration page opens.
To access the administration interface for basic workbenches from JupyterLab, perform the following actions:
- Click File → Hub Control Panel.
- On the page that opens in OpenShift AI, click the Administration tab.
The Administration page opens.
Verification
- You can see the administration interface for basic workbenches.
6.2. Starting basic workbenches owned by other users
OpenShift AI administrators can start a basic workbench for another existing user from the administration interface for basic workbenches.
Prerequisites
- You have logged in to OpenShift AI as a user with OpenShift AI administrator privileges.
- You have launched the Start basic workbench application, as described in Starting a basic workbench.
Procedure
- On the page that opens when you launch a basic workbench, click the Administration tab.
On the Administration tab, perform the following actions:
- In the Users section, locate the user whose workbench you want to start.
- Click Start workbench beside the relevant user.
- Complete the Start a basic workbench page.
- Optional: Select the Start workbench in current tab checkbox if necessary.
- Click Start workbench.
After the server starts, you see one of the following behaviors:
- If you previously selected the Start workbench in current tab checkbox, the JupyterLab interface opens in the current tab of your web browser.
- If you did not previously select the Start workbench in current tab checkbox, the Workbench status dialog box prompts you to open the server in a new browser tab or in the current tab.
The JupyterLab interface opens according to your selection.
Verification
- The JupyterLab interface opens.
6.3. Accessing basic workbenches owned by other users
OpenShift AI administrators can access basic workbenches that are owned by other users to correct configuration errors or to help them troubleshoot problems with their environment.
Prerequisites
- You have logged in to OpenShift AI as a user with OpenShift AI administrator privileges.
- You have launched the Start basic workbench application, as described in Starting a basic workbench.
- The workbench that you want to access is running.
Procedure
- On the page that opens when you launch a basic workbench, click the Administration tab.
On the Administration page, perform the following actions:
- In the Users section, locate the user that the workbench belongs to.
- Click View server beside the relevant user.
- On the Workbench control panel page, click Access workbench.
Verification
- The JupyterLab interface opens in the user’s workbench.
6.4. Stopping basic workbenches owned by other users
OpenShift AI administrators can stop basic workbenches that are owned by other users to reduce resource consumption on the cluster, or as part of removing a user and their resources from the cluster.
Prerequisites
- You have logged in to OpenShift AI as a user with OpenShift AI administrator privileges.
- You have launched the Start basic workbench application, as described in Starting a basic workbench.
- The workbench that you want to stop is running.
Procedure
- On the page that opens when you launch a basic workbench, click the Administration tab.
Stop one or more servers.
If you want to stop one or more specific servers, perform the following actions:
- In the Users section, locate the user that the workbench belongs to.
To stop the workbench, perform one of the following actions:
- Click the action menu (⋮) beside the relevant user and select Stop server.
- Click View server beside the relevant user and then click Stop workbench.
The Stop server dialog box appears.
- Click Stop server.
If you want to stop all workbenches, perform the following actions:
- Click the Stop all workbenches button.
- Click OK to confirm stopping all servers.
Verification
- The Stop server link beside each server changes to a Start workbench link when the workbench has stopped.
6.5. Stopping idle workbenches
You can reduce resource usage in your OpenShift AI deployment by stopping workbenches that have been idle (without logged-in users) for a period of time. This is useful when resource demand in the cluster is high. By default, idle workbenches are not stopped after a specific time limit.
If you have configured your cluster settings to disconnect all users from a cluster after a specified time limit, then this setting takes precedence over the idle workbench time limit. Users are logged out of the cluster when their session duration reaches the cluster-wide time limit.
Prerequisites
- You have logged in to OpenShift AI as a user with OpenShift AI administrator privileges.
Procedure
- From the OpenShift AI dashboard, click Settings → Cluster settings.
- Under Stop idle workbenches, select Stop idle workbenches after.
- Enter a time limit, in hours and minutes, for when idle workbenches are stopped.
- Click Save changes.
Verification
- The notebook-controller-culler-config ConfigMap, located in the redhat-ods-applications project on the Workloads → ConfigMaps page, contains the following culling configuration settings:
  - ENABLE_CULLING: Specifies if the culling feature is enabled or disabled (this is false by default).
  - IDLENESS_CHECK_PERIOD: The polling frequency to check for a notebook’s last known activity (in minutes).
  - CULL_IDLE_TIME: The maximum allotted time to scale an inactive notebook to zero (in minutes).
- Idle workbenches stop at the time limit that you set.
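As a rough illustration of how a time limit entered in hours and minutes relates to the minute-based settings listed above, the following Python sketch converts the limit to minutes and builds the kind of key-value data that the ConfigMap holds. The helper name culler_settings, the check period of 1 minute, and the exact string formatting of the values are assumptions made for illustration; OpenShift AI writes the real values when you click Save changes.

```python
def culler_settings(hours: int, minutes: int, check_period_minutes: int = 1) -> dict[str, str]:
    """Build illustrative ConfigMap data for an idle time limit.

    Hypothetical helper, not part of OpenShift AI. ConfigMap data values
    are strings, so every value is converted with str().
    """
    if hours < 0 or not 0 <= minutes < 60:
        raise ValueError("hours must be >= 0 and minutes must be in 0..59")
    total_minutes = hours * 60 + minutes
    if total_minutes == 0:
        raise ValueError("the idle time limit must be greater than zero")
    return {
        "ENABLE_CULLING": "true",
        "IDLENESS_CHECK_PERIOD": str(check_period_minutes),  # minutes (assumed value)
        "CULL_IDLE_TIME": str(total_minutes),  # minutes
    }


settings = culler_settings(hours=4, minutes=30)
print(settings["CULL_IDLE_TIME"])  # prints 270
```

Comparing the CULL_IDLE_TIME value in the ConfigMap with the converted time limit is one way to check that the setting was saved as expected.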
6.6. Adding workbench pod tolerations
If you want to dedicate certain machine pools to only running workbench pods, you can allow workbench pods to be scheduled on specific nodes by adding a toleration. Taints and tolerations allow a node to control which pods should (or should not) be scheduled on it. For more information, see Understanding taints and tolerations.
This capability is useful if you want to make sure that workbenches are placed on nodes that can handle their needs. By preventing other workloads from running on these specific nodes, you can ensure that the necessary resources are available to users who need to work with large workbench sizes.
Prerequisites
- You have logged in to OpenShift AI as a user with OpenShift AI administrator privileges.
- You are familiar with OpenShift taints and tolerations, as described in Understanding taints and tolerations.
Procedure
- From the OpenShift AI dashboard, click Settings → Cluster settings.
- Under Workbench pod tolerations, select Add a toleration to workbench pods to allow them to be scheduled to tainted nodes.
- In the Toleration key for workbench pods field, enter a toleration key. The key is any string, up to 253 characters. The key must begin with a letter or number, and can contain letters, numbers, hyphens, dots, and underscores. For example, workbenches-only.
- Click Save changes. The toleration key is applied to new workbench pods when they are created.
For existing workbench pods, the toleration key is applied when the workbench pods are restarted.
If you are using a basic workbench, see Updating workbench settings by restarting your workbench. If you are using a workbench in a data science project, see Starting a workbench.
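The key rules stated in this procedure can be checked before you enter a value in the dashboard. The following Python sketch encodes only those rules: at most 253 characters, a leading letter or number, and letters, numbers, hyphens, dots, and underscores afterwards. The function name is hypothetical, and treating "letters" as ASCII letters is an assumption; the dashboard's own validation might differ in detail.

```python
import re

# Pattern built from the rules in this procedure: the first character is a
# letter or number; the rest are letters, numbers, hyphens, dots, or underscores.
# Assumption: "letters" means ASCII letters.
_KEY_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")
MAX_KEY_LENGTH = 253


def is_valid_toleration_key(key: str) -> bool:
    """Return True if key satisfies the rules described in this procedure."""
    return len(key) <= MAX_KEY_LENGTH and _KEY_PATTERN.fullmatch(key) is not None


print(is_valid_toleration_key("workbenches-only"))  # True
print(is_valid_toleration_key("-workbenches"))      # False: starts with a hyphen
```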
Next step
In OpenShift, add a matching taint key (with any value) to the machine pools that you want to dedicate to workbenches. For more information, see Controlling pod placement using node taints or Adding taints to a machine pool.
Verification
- In the OpenShift console, for a pod that is running, click Workloads → Pods. Otherwise, for a pod that is stopped, click Workloads → StatefulSet.
- Search for your workbench pod name and then click the name to open the pod details page.
- Confirm that the assigned Node and Tolerations are correct.
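When you confirm the tolerations on the pod details page, it can help to reason about whether a given toleration would match the taint on a dedicated node. The following Python sketch applies the general Kubernetes matching rules between one toleration and one taint, using the field names found in pod and node specifications. The key workbenches-only comes from the example in the procedure; the Exists operator, the NoSchedule effect, and the taint value are assumptions about how a cluster might be configured.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    # An empty toleration effect matches every taint effect.
    effect = toleration.get("effect", "")
    if effect and effect != taint.get("effect", ""):
        return False

    # An empty toleration key matches every taint key.
    key = toleration.get("key", "")
    if key and key != taint.get("key", ""):
        return False

    operator = toleration.get("operator", "")
    if operator in ("", "Equal"):  # Equal is the default operator
        return toleration.get("value", "") == taint.get("value", "")
    if operator == "Exists":  # the taint value is ignored
        return True
    return False


# Hypothetical taint on a dedicated machine pool and toleration on a workbench pod.
taint = {"key": "workbenches-only", "value": "true", "effect": "NoSchedule"}
toleration = {"key": "workbenches-only", "operator": "Exists", "effect": "NoSchedule"}
print(tolerates(toleration, taint))  # True
```

With the Exists operator, the taint value does not affect the result, which is consistent with the guidance to add a matching taint key with any value.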
6.7. Troubleshooting common problems in workbenches for administrators
If your users are experiencing errors in Red Hat OpenShift AI relating to Jupyter, their Jupyter notebooks, or their workbench, read this section to understand what could be causing the problem, and how to resolve the problem.
If you cannot see the problem here or in the release notes, contact Red Hat Support.
6.7.1. A user receives a 404: Page not found error when logging in to Jupyter
Problem
If you have configured OpenShift AI user groups, the user name might not be added to the default user group for OpenShift AI.
Diagnosis
Check whether the user is part of the default user group.
Find the names of groups allowed access to Jupyter.
- Log in to the OpenShift web console.
- Click User Management → Groups.
- Click the name of your user group, for example, rhoai-users. The Group details page for that group appears.
- Click the Details tab for the group and confirm that the Users section for the relevant group contains the users who have permission to access Jupyter.
Resolution
- If the user is not added to any of the groups with permission to access Jupyter, follow Adding users to OpenShift AI user groups to add them.
- If the user is already added to a group with permission to access Jupyter, contact Red Hat Support.
6.7.2. A user’s workbench does not start
Problem
The OpenShift cluster that hosts the user’s workbench might not have access to enough resources, or the workbench pod may have failed.
Diagnosis
- Log in to the OpenShift web console.
Delete and restart the workbench pod for this user.
- Click Workloads → Pods and set the Project to rhods-notebooks.
- Search for the workbench pod that belongs to this user, for example, jupyter-nb-<username>-*.
If the workbench pod exists, an intermittent failure may have occurred in the workbench pod.
If the workbench pod for the user does not exist, continue with diagnosis.
- Check the resources currently available in the OpenShift cluster against the resources required by the selected workbench image.
If worker nodes with sufficient CPU and RAM are available for scheduling in the cluster, continue with diagnosis.
- Check the state of the workbench pod.
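If you have a list of pod names from the rhods-notebooks project, the search for a user's workbench pod can be sketched as a simple pattern match on the jupyter-nb-<username>-* form shown in the diagnosis steps. The function name and the sample pod names below are hypothetical, and the sketch assumes that the user name appears unchanged in the pod name, which might not hold for user names that contain special characters.

```python
from fnmatch import fnmatchcase
from glob import escape


def find_workbench_pods(pod_names: list[str], username: str) -> list[str]:
    """Return the pod names that match jupyter-nb-<username>-*."""
    # escape() keeps characters such as * or [ in the user name literal.
    pattern = f"jupyter-nb-{escape(username)}-*"
    return [name for name in pod_names if fnmatchcase(name, pattern)]


# Hypothetical pod names for illustration only.
pods = ["jupyter-nb-alice-0", "jupyter-nb-bob-0", "rhods-dashboard-7c9f"]
print(find_workbench_pods(pods, "alice"))  # ['jupyter-nb-alice-0']
```

An empty result corresponds to the case where the workbench pod for the user does not exist.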
Resolution
If there was an intermittent failure of the workbench pod:
- Delete the workbench pod that belongs to the user.
- Ask the user to start their workbench again.
- If the workbench does not have sufficient resources to run the selected workbench image, either add more resources to the OpenShift cluster, or choose a smaller image size.
If the workbench pod is in a FAILED state:
- Retrieve the logs for the jupyter-nb-* pod and send them to Red Hat Support for further evaluation.
- Delete the jupyter-nb-* pod.
- If none of the previous resolutions apply, contact Red Hat Support.
6.7.3. The user receives a database or disk is full error or a no space left on device error when they run notebook cells
Problem
The user might have run out of storage space on their workbench.
Diagnosis
Log in to Jupyter and start the workbench that belongs to the user having problems. If the workbench does not start, follow these steps to check whether the user has run out of storage space:
- Log in to the OpenShift web console.
- Click Workloads → Pods and set the Project to rhods-notebooks.
- Click the workbench pod that belongs to this user, for example, jupyter-nb-<idp>-<username>-*.
- Click Logs. The user has exceeded their available capacity if you see lines similar to the following:
Unexpected error while saving file: XXXX database or disk is full
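If you save the pod logs to a file or copy them as text, scanning for the two error messages named in this section can be sketched as follows. The function name and the first sample log line are hypothetical; only the message fragments come from this section, and the match is case-insensitive as a precaution.

```python
# Message fragments named in this section, in lowercase for matching.
FULL_STORAGE_MARKERS = ("database or disk is full", "no space left on device")


def storage_full_lines(log_text: str) -> list[str]:
    """Return the log lines that suggest the workbench storage is full."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line.lower() for marker in FULL_STORAGE_MARKERS)
    ]


# The first line is invented sample data; the second is the message shown above.
sample = (
    "Serving notebooks from local directory\n"
    "Unexpected error while saving file: XXXX database or disk is full\n"
)
print(len(storage_full_lines(sample)))  # prints 1
```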
Resolution
- Increase the user’s available storage by expanding their persistent volume. For more information, see Expanding persistent volumes.
- Work with the user to identify files that can be deleted from the /opt/app-root/src directory on their workbench to free up their existing storage space.
When you delete files using the JupyterLab file explorer, the files move to the hidden /opt/app-root/src/.local/share/Trash/files folder in the persistent storage for the workbench. To free up storage space for workbenches, you must permanently delete these files.
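One way to permanently delete the trashed files is to run a short script from a terminal or notebook cell inside the workbench. The following Python sketch removes everything in the Trash folder named above and reports roughly how many bytes were freed. The function name is hypothetical, the byte count is an approximation, and the deletion cannot be undone, so confirm with the user before running anything like it.

```python
import shutil
from pathlib import Path

# Path taken from this section; adjust it if the workbench uses a different location.
TRASH_DIR = Path("/opt/app-root/src/.local/share/Trash/files")


def empty_trash(trash_dir: Path = TRASH_DIR) -> int:
    """Permanently delete everything in trash_dir and return the bytes freed."""
    if not trash_dir.is_dir():
        return 0  # nothing to clean up
    freed = 0
    for entry in trash_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            # Sum regular file sizes before removing the directory tree.
            freed += sum(
                f.lstat().st_size
                for f in entry.rglob("*")
                if f.is_file() and not f.is_symlink()
            )
            shutil.rmtree(entry)
        else:
            freed += entry.lstat().st_size
            entry.unlink()
    return freed


if __name__ == "__main__":
    print(f"Freed approximately {empty_trash()} bytes")
```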