Chapter 5. Managing Jupyter notebook servers
5.1. Accessing the Jupyter administration interface
You can use the Jupyter administration interface to control notebook servers in your Red Hat OpenShift AI environment.
Prerequisite
- You have logged in to OpenShift AI as a user with OpenShift AI administrator privileges.
Procedure
To access the Jupyter administration interface from OpenShift AI, perform the following actions:
- In OpenShift AI, in the Applications section of the left menu, click Enabled.
- Locate the Jupyter tile and click Launch application.
- On the page that opens when you launch Jupyter, click the Administration tab.
The Administration page opens.
To access the Jupyter administration interface from JupyterLab, perform the following actions:
- Click File → Hub Control Panel.
- On the page that opens in OpenShift AI, click the Administration tab.
The Administration page opens.
Verification
- You can see the Jupyter administration interface.
5.2. Starting notebook servers owned by other users
OpenShift AI administrators can start a notebook server for another existing user from the Jupyter administration interface.
Prerequisites
- You have logged in to OpenShift AI as a user with OpenShift AI administrator privileges.
- You have launched the Jupyter application, as described in Starting a Jupyter notebook server.
Procedure
- On the page that opens when you launch Jupyter, click the Administration tab.
On the Administration tab, perform the following actions:
- In the Users section, locate the user whose notebook server you want to start.
- Click Start server beside the relevant user.
- Complete the Start a notebook server page.
- Optional: Select the Start server in current tab checkbox if necessary.
- Click Start server.
After the server starts, you see one of the following behaviors:
- If you previously selected the Start server in current tab checkbox, the JupyterLab interface opens in the current tab of your web browser.
- If you did not previously select the Start server in current tab checkbox, the Starting server dialog box prompts you to open the server in a new browser tab or in the current tab.
The JupyterLab interface opens according to your selection.
Verification
- The JupyterLab interface opens.
5.3. Accessing notebook servers owned by other users
OpenShift AI administrators can access notebook servers that are owned by other users to correct configuration errors or to help them troubleshoot problems with their environment.
Prerequisites
- You have logged in to OpenShift AI as a user with OpenShift AI administrator privileges.
- You have launched the Jupyter application, as described in Starting a Jupyter notebook server.
- The notebook server that you want to access is running.
Procedure
- On the page that opens when you launch Jupyter, click the Administration tab.
On the Administration page, perform the following actions:
- In the Users section, locate the user that the notebook server belongs to.
- Click View server beside the relevant user.
- On the Notebook server control panel page, click Access notebook server.
Verification
- The user’s notebook server opens in JupyterLab.
5.4. Stopping notebook servers owned by other users
OpenShift AI administrators can stop notebook servers that are owned by other users to reduce resource consumption on the cluster, or as part of removing a user and their resources from the cluster.
Prerequisites
- You have logged in to OpenShift AI as a user with OpenShift AI administrator privileges.
- You have launched the Jupyter application, as described in Starting a Jupyter notebook server.
- The notebook server that you want to stop is running.
Procedure
- On the page that opens when you launch Jupyter, click the Administration tab.
Stop one or more servers.
If you want to stop one or more specific servers, perform the following actions:
- In the Users section, locate the user that the notebook server belongs to.
To stop the notebook server, perform one of the following actions:
- Click the action menu (⋮) beside the relevant user and select Stop server.
- Click View server beside the relevant user and then click Stop notebook server.
The Stop server dialog box appears.
- Click Stop server.
If you want to stop all servers, perform the following actions:
- Click the Stop all servers button.
- Click OK to confirm stopping all servers.
Verification
- The Stop server link beside each server changes to a Start server link when the notebook server has stopped.
5.5. Stopping idle notebooks
You can reduce resource usage in your OpenShift AI deployment by stopping notebook servers that have been idle (without logged-in users) for a period of time. This is useful when resource demand in the cluster is high. By default, idle notebooks are not stopped after a specific time limit.
If you have configured your cluster settings to disconnect all users from a cluster after a specified time limit, then this setting takes precedence over the idle notebook time limit. Users are logged out of the cluster when their session duration reaches the cluster-wide time limit.
Prerequisites
- You have logged in to OpenShift AI as a user with OpenShift AI administrator privileges.
Procedure
- From the OpenShift AI dashboard, click Settings → Cluster settings.
- Under Stop idle notebooks, select Stop idle notebooks after.
- Enter a time limit, in hours and minutes, for when idle notebooks are stopped.
- Click Save changes.
Verification
- The notebook-controller-culler-config ConfigMap, located in the redhat-ods-applications project on the Workloads → ConfigMaps page, contains the following culling configuration settings:
  - ENABLE_CULLING: Specifies if the culling feature is enabled or disabled (this is false by default).
  - IDLENESS_CHECK_PERIOD: The polling frequency to check for a notebook’s last known activity (in minutes).
  - CULL_IDLE_TIME: The maximum allotted time to scale an inactive notebook to zero (in minutes).
- Idle notebooks stop at the time limit that you set.
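To make the units concrete, the following Python sketch converts a time limit entered in hours and minutes into the minute-based settings listed above. It is an illustration only: the function name culler_settings and the default check period of 1 minute are assumptions, and the values that the dashboard actually writes are not stated here. The one firm detail is that ConfigMap data values are stored as strings.

```python
def culler_settings(hours: int, minutes: int,
                    check_period_minutes: int = 1) -> dict[str, str]:
    """Return illustrative culler settings for an idle limit in hours and minutes."""
    if hours < 0 or not 0 <= minutes < 60:
        raise ValueError("hours must be >= 0 and minutes must be in 0..59")
    idle_minutes = hours * 60 + minutes
    if idle_minutes == 0:
        raise ValueError("the idle time limit must be greater than zero")
    # ConfigMap data values are strings, so every number is stored as text.
    return {
        "ENABLE_CULLING": "true",
        # Assumed value: the polling period used here is hypothetical.
        "IDLENESS_CHECK_PERIOD": str(check_period_minutes),
        "CULL_IDLE_TIME": str(idle_minutes),
    }


# A limit of 1 hour and 30 minutes corresponds to 90 minutes of idle time.
print(culler_settings(1, 30))
```

Comparing the CULL_IDLE_TIME value in the ConfigMap with the limit that you entered, converted to minutes, is a quick way to confirm that the setting was saved.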
5.6. Adding notebook pod tolerations
If you want to dedicate certain machine pools to only running notebook pods, you can allow notebook pods to be scheduled on specific nodes by adding a toleration. Taints and tolerations allow a node to control which pods should (or should not) be scheduled on it. For more information, see Understanding taints and tolerations.
This capability is useful if you want to make sure that notebook servers are placed on nodes that can handle their needs. By preventing other workloads from running on these specific nodes, you can ensure that the necessary resources are available to users who need to work with large notebook sizes.
Prerequisites
- You have logged in to OpenShift AI as a user with OpenShift AI administrator privileges.
- You are familiar with OpenShift taints and tolerations, as described in Understanding taints and tolerations.
Procedure
- From the OpenShift AI dashboard, click Settings → Cluster settings.
- Under Notebook pod tolerations, select Add a toleration to notebook pods to allow them to be scheduled to tainted nodes.
- In the Toleration key for notebook pods field, enter a toleration key. The key is any string, up to 253 characters. The key must begin with a letter or number, and can contain letters, numbers, hyphens, dots, and underscores. For example, notebooks-only.
- Click Save changes. The toleration key is applied to new notebook pods when they are created.
For existing notebook pods, the toleration key is applied when the notebook pods are restarted.
If you are using Jupyter, see Updating notebook server settings by restarting your server. If you are using a workbench in a data science project, see Starting a workbench.
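The key rules in the procedure above can be checked before you enter a value. The following Python sketch encodes them as stated: at most 253 characters, a leading letter or number, and only letters, numbers, hyphens, dots, and underscores afterward. The function name is_valid_toleration_key is hypothetical, and treating "letters" as ASCII letters only is an assumption; the dashboard performs its own validation, which might differ.

```python
import re

MAX_KEY_LENGTH = 253

# First character: letter or number. Remaining characters: letters, numbers,
# hyphens, dots, or underscores. ASCII-only letters are an assumption.
_KEY_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def is_valid_toleration_key(key: str) -> bool:
    """Return True if key satisfies the documented toleration key rules."""
    return len(key) <= MAX_KEY_LENGTH and _KEY_PATTERN.fullmatch(key) is not None


for candidate in ("notebooks-only", "-starts-with-hyphen", "has space"):
    print(candidate, is_valid_toleration_key(candidate))
```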
Next step
In OpenShift, add a matching taint key (with any value) to the machine pools that you want to dedicate to notebooks. For more information, see Controlling pod placement using node taints and Adding taints to a machine pool.
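The following Python sketch shows why a taint with the same key and any value is sufficient. It follows the general Kubernetes matching rules: with the Exists operator only the key is compared, with the Equal operator the value must match too, and an empty toleration effect matches every taint effect. That the toleration added by OpenShift AI uses the Exists operator is an assumption inferred from the "with any value" wording above; the dictionaries are simplified stand-ins for the real pod and node fields.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a Kubernetes-style toleration matches a taint."""
    # An empty toleration effect matches every taint effect.
    effect = toleration.get("effect", "")
    if effect and effect != taint.get("effect", ""):
        return False

    operator = toleration.get("operator", "Equal")  # Equal is the default
    key = toleration.get("key", "")
    if operator == "Exists":
        # Only the key is compared; an empty key tolerates every taint.
        return key == "" or key == taint.get("key")
    if operator == "Equal":
        return (key == taint.get("key")
                and toleration.get("value", "") == taint.get("value", ""))
    return False


# Assumed shape of the toleration that is added to notebook pods.
toleration = {"key": "notebooks-only", "operator": "Exists"}
# Example taint on a dedicated machine pool; the value is arbitrary.
taint = {"key": "notebooks-only", "value": "true", "effect": "NoSchedule"}
print(tolerates(toleration, taint))
```

Pods without a matching toleration are kept off the tainted nodes, which is what reserves those nodes for notebooks.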
Verification
- In the OpenShift console, for a pod that is running, click Workloads → Pods. Otherwise, for a pod that is stopped, click Workloads → StatefulSet.
- Search for your workbench pod name and then click the name to open the pod details page.
- Confirm that the assigned Node and Tolerations are correct.
5.7. Troubleshooting common problems in Jupyter for administrators
If your users are experiencing errors in Red Hat OpenShift AI relating to Jupyter, their notebooks, or their notebook server, read this section to understand what could be causing the problem and how to resolve it.
If you cannot see the problem here or in the release notes, contact Red Hat Support.
5.7.1. A user receives a 404: Page not found error when logging in to Jupyter
Problem
If you have configured OpenShift AI user groups, the user name might not be added to the default user group for OpenShift AI.
Diagnosis
Check whether the user is part of the default user group.
Find the names of groups allowed access to Jupyter.
- Log in to the OpenShift web console.
- Click User Management → Groups.
- Click the name of your user group, for example, rhoai-users.
The Group details page for that group appears.
- Click the Details tab for the group and confirm that the Users section for the relevant group contains the users who have permission to access Jupyter.
Resolution
- If the user is not added to any of the groups with permission to access Jupyter, follow Adding users to OpenShift AI user groups to add them.
- If the user is already added to a group with permission to access Jupyter, contact Red Hat Support.
5.7.2. A user’s notebook server does not start
Problem
The OpenShift cluster that hosts the user’s notebook server might not have access to enough resources, or the Jupyter pod may have failed.
Diagnosis
- Log in to the OpenShift web console.
Delete and restart the notebook server pod for this user.
- Click Workloads → Pods and set the Project to rhods-notebooks.
- Search for the notebook server pod that belongs to this user, for example, jupyter-nb-<username>-*.
If the notebook server pod exists, an intermittent failure may have occurred in the notebook server pod.
If the notebook server pod for the user does not exist, continue with diagnosis.
- Check the resources currently available in the OpenShift cluster against the resources required by the selected notebook server image.
If worker nodes with sufficient CPU and RAM are available for scheduling in the cluster, continue with diagnosis.
- Check the state of the Jupyter pod.
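The resource check in the diagnosis above comes down to whether a single worker node has enough free CPU and memory at the same time; spare capacity spread across several nodes does not help, because a pod runs on one node. The following Python sketch illustrates that comparison. The NodeCapacity type, the node names, and all figures are hypothetical values that you would read from the OpenShift console.

```python
from dataclasses import dataclass


@dataclass
class NodeCapacity:
    name: str
    free_cpu_cores: float
    free_memory_gib: float


def nodes_that_fit(nodes: list[NodeCapacity], cpu_cores: float,
                   memory_gib: float) -> list[str]:
    """Return the names of nodes that can hold the whole request on their own."""
    return [
        node.name
        for node in nodes
        if node.free_cpu_cores >= cpu_cores and node.free_memory_gib >= memory_gib
    ]


# Hypothetical cluster state: worker-a has spare CPU but not enough memory.
workers = [
    NodeCapacity("worker-a", free_cpu_cores=6.0, free_memory_gib=4.0),
    NodeCapacity("worker-b", free_cpu_cores=2.5, free_memory_gib=16.0),
]
print(nodes_that_fit(workers, cpu_cores=2.0, memory_gib=8.0))
```

An empty result suggests that the cluster needs more resources or that the user needs a smaller notebook server size.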
Resolution
If there was an intermittent failure of the notebook server pod:
- Delete the notebook server pod that belongs to the user.
- Ask the user to start their notebook server again.
- If the notebook server does not have sufficient resources to run the selected notebook server image, either add more resources to the OpenShift cluster, or choose a smaller image size.
If the Jupyter pod is in a FAILED state:
- Retrieve the logs for the jupyter-nb-* pod and send them to Red Hat Support for further evaluation.
- Delete the jupyter-nb-* pod.
- If none of the previous resolutions apply, contact Red Hat Support.
5.7.3. The user receives a database or disk is full error or a no space left on device error when they run notebook cells
Problem
The user might have run out of storage space on their notebook server.
Diagnosis
Log in to Jupyter and start the notebook server that belongs to the user having problems. If the notebook server does not start, follow these steps to check whether the user has run out of storage space:
- Log in to the OpenShift web console.
- Click Workloads → Pods and set the Project to rhods-notebooks.
- Click the notebook server pod that belongs to this user, for example, jupyter-nb-<idp>-<username>-*.
- Click Logs. The user has exceeded their available capacity if you see lines similar to the following:
Unexpected error while saving file: XXXX database or disk is full
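If the pod log is long, you can save it to a file and scan it rather than read it line by line. The following Python sketch searches log text for the two messages named in this section, ignoring letter case. The function name find_storage_errors and the sample log lines are made up for illustration; real logs can word these errors differently.

```python
STORAGE_ERROR_MARKERS = (
    "database or disk is full",
    "no space left on device",
)


def find_storage_errors(log_text: str) -> list[str]:
    """Return the log lines that contain a known out-of-storage message."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line.lower() for marker in STORAGE_ERROR_MARKERS)
    ]


# Hypothetical log excerpt for demonstration only.
sample_log = """\
Kernel started
Unexpected error while saving file: XXXX database or disk is full
OSError: [Errno 28] No space left on device
Kernel restarted
"""
for line in find_storage_errors(sample_log):
    print(line)
```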
Resolution
- Increase the user’s available storage by expanding their persistent volume. For more information, see Expanding persistent volumes.
- Work with the user to identify files that can be deleted from the /opt/app-root/src directory on their notebook server to free up their existing storage space.
When you delete files using the JupyterLab file explorer, the files move to the hidden /opt/app-root/src/.local/share/Trash/files folder in the persistent storage for the notebook. To free up storage space for notebooks, you must permanently delete these files.
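As a sketch of how a user could check and then empty that folder from a notebook cell, the following Python code sums the size of the trashed files and permanently deletes them on request. The helper names trash_size_bytes and empty_trash are hypothetical, and the code assumes that it runs inside the user’s notebook server, where the path above exists. Deletion cannot be undone, so confirm with the user before calling empty_trash.

```python
import shutil
from pathlib import Path

# Trash location as given in this section.
TRASH_DIR = Path("/opt/app-root/src/.local/share/Trash/files")


def trash_size_bytes(trash_dir: Path = TRASH_DIR) -> int:
    """Return the total size of regular files under trash_dir, or 0 if it is absent."""
    if not trash_dir.is_dir():
        return 0
    return sum(p.stat().st_size for p in trash_dir.rglob("*") if p.is_file())


def empty_trash(trash_dir: Path = TRASH_DIR) -> None:
    """Permanently delete everything inside trash_dir, keeping the folder itself."""
    if not trash_dir.is_dir():
        return
    for entry in trash_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()


# Report only; call empty_trash() explicitly once the user agrees.
print(f"Trash currently uses {trash_size_bytes()} bytes")
```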