Chapter 3. Running Ray-based distributed workloads
In OpenShift AI, you can run a Ray-based distributed workload from a Jupyter notebook or from a pipeline.
You can run Ray-based distributed workloads in a disconnected environment if you can access all of the required software from that environment. For example, you must be able to access a Ray cluster image, and the data sets and Python dependencies used by the workload, from the disconnected environment.
3.1. Running distributed data science workloads from Jupyter notebooks
To run a distributed workload from a Jupyter notebook, you must configure a Ray cluster. You must also provide environment-specific information such as cluster authentication details.
The examples in this section refer to the JupyterLab integrated development environment (IDE).
3.1.1. Downloading the demo Jupyter notebooks from the CodeFlare SDK
The demo Jupyter notebooks from the CodeFlare SDK provide guidelines on how to use the CodeFlare stack in your own Jupyter notebooks. Download the demo Jupyter notebooks so that you can learn how to run Jupyter notebooks locally.
Prerequisites
- You can access a data science cluster that is configured to run distributed workloads as described in Managing distributed workloads.
- You can access a data science project that contains a workbench, and the workbench is running a default workbench image that contains the CodeFlare SDK, for example, the Standard Data Science notebook. For information about projects and workbenches, see Working on data science projects.
- You have administrator access for the data science project.
- If you created the project, you automatically have administrator access.
- If you did not create the project, your cluster administrator must give you administrator access.
- You have logged in to Red Hat OpenShift AI, started your workbench, and logged in to JupyterLab.
Procedure
In the JupyterLab interface, click File > New > Notebook. Specify your preferred Python version, and then click Select.
A new Jupyter notebook file is created with the .ipynb file name extension.
Add the following code to a cell in the new notebook:
Code to download the demo Jupyter notebooks
from codeflare_sdk import copy_demo_nbs
copy_demo_nbs()
Select the cell, and click Run > Run selected cell.
After a few seconds, the copy_demo_nbs() function copies the demo Jupyter notebooks that are packaged with the currently installed version of the CodeFlare SDK, and clones them into the demo-notebooks folder.
- In the left navigation pane, right-click the new notebook and click Delete.
- Click Delete to confirm.
Verification
Locate the downloaded demo Jupyter notebooks in the JupyterLab interface, as follows:
- In the left navigation pane, double-click demo-notebooks.
- Double-click additional-demos and verify that the folder contains several demo Jupyter notebooks.
- Click demo-notebooks.
- Double-click guided-demos and verify that the folder contains several demo Jupyter notebooks.
You can run these demo Jupyter notebooks as described in Running the demo Jupyter notebooks from the CodeFlare SDK.
3.1.2. Running the demo Jupyter notebooks from the CodeFlare SDK
To run the demo Jupyter notebooks from the CodeFlare SDK, you must provide environment-specific information.
In the examples in this procedure, you edit the demo Jupyter notebooks in JupyterLab to provide the required information, and then run the Jupyter notebooks.
Prerequisites
- You can access a data science cluster that is configured to run distributed workloads as described in Managing distributed workloads.
- You can access the following software from your data science cluster:
- A Ray cluster image that is compatible with your hardware architecture
- The data sets and models to be used by the workload
- The Python dependencies for the workload, either in a Ray image or in your own Python Package Index (PyPI) server
- You can access a data science project that contains a workbench, and the workbench is running a default workbench image that contains the CodeFlare SDK, for example, the Standard Data Science workbench. For information about projects and workbenches, see Working on data science projects.
- You have administrator access for the data science project.
- If you created the project, you automatically have administrator access.
- If you did not create the project, your cluster administrator must give you administrator access.
- You have logged in to Red Hat OpenShift AI, started your workbench, and logged in to JupyterLab.
- You have downloaded the demo Jupyter notebooks provided by the CodeFlare SDK, as described in Downloading the demo Jupyter notebooks from the CodeFlare SDK.
Procedure
Check whether your cluster administrator has defined a default local queue for the Ray cluster.
You can use the codeflare_sdk.list_local_queues() function to view all local queues in your current namespace, and the resource flavors associated with each local queue.
Alternatively, you can use the OpenShift web console as follows:
- In the OpenShift web console, select your project from the Project list.
- Click Search, and from the Resources list, select LocalQueue to show the list of local queues for your project.
If no local queue is listed, contact your cluster administrator.
Review the details of each local queue:
- Click the local queue name.
- Click the YAML tab, and review the metadata.annotations section.
If the kueue.x-k8s.io/default-queue annotation is set to 'true', the queue is configured as the default local queue.
Note: If your cluster administrator does not define a default local queue, you must specify a local queue in each Jupyter notebook.
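The same check can be run from a notebook cell by using the list_local_queues() function mentioned above. The following is a minimal sketch; it assumes the CodeFlare SDK is installed in the workbench and that you are authenticated to the cluster:

```python
from codeflare_sdk import list_local_queues

# List the local queues, and their associated resource flavors,
# in the current namespace. A queue whose metadata includes the
# kueue.x-k8s.io/default-queue: 'true' annotation is the default.
queues = list_local_queues()
print(queues)
```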
- In the JupyterLab interface, open the demo-notebooks > guided-demos folder.
Open all of the Jupyter notebooks by double-clicking each Jupyter notebook file.
Jupyter notebook files have the .ipynb file name extension.
In each Jupyter notebook, ensure that the import section imports the required components from the CodeFlare SDK, as follows:
Example import section
from codeflare_sdk import Cluster, ClusterConfiguration, TokenAuthentication
In each Jupyter notebook, update the TokenAuthentication section to provide the token and server details to authenticate to the OpenShift cluster by using the CodeFlare SDK.
For information about how to find the server and token details, see Using the cluster server and token to authenticate.
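Such an authentication section might look like the following sketch. The token and server values are placeholders that you must replace with your own cluster details, and the optional custom-certificate line is shown commented out:

```python
from codeflare_sdk import TokenAuthentication

auth = TokenAuthentication(
    token="XXXXX",  # placeholder: your OpenShift API token
    server="https://api.cluster.example.com:6443",  # placeholder: your API server URL
    skip_tls=False,
    # ca_cert_path="/path/to/cert",  # optional: location of custom certificates
)
auth.login()
```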
Optional: If you want to use custom certificates, update the TokenAuthentication section to add the ca_cert_path parameter to specify the location of the custom certificates.
Alternatively, you can set the CF_SDK_CA_CERT_PATH environment variable to specify the location of the custom certificates.
In each Jupyter notebook, update the cluster configuration section as follows:
- If the namespace value is specified, replace the example value with the name of your project. If you omit this line, the Ray cluster is created in the current project.
- If the image value is specified, replace the example value with a link to a suitable Ray cluster image. The Python version in the Ray cluster image must be the same as the Python version in the workbench. If you omit this line, one of the following Ray cluster images is used by default, based on the Python version detected in the workbench:
- Python 3.9: quay.io/modh/ray:2.35.0-py39-cu121
- Python 3.11: quay.io/modh/ray:2.47.1-py311-cu121
The default Ray images are compatible with NVIDIA GPUs that are supported by the specified CUDA version. The default images are AMD64 images, which might not work on other architectures.
Additional ROCm-compatible Ray cluster images are available, which are compatible with AMD accelerators that are supported by the specified ROCm version. These images are AMD64 images, which might not work on other architectures.
For information about the latest available training images and their preinstalled packages, including the CUDA and ROCm versions, see Red Hat OpenShift AI: Supported Configurations.
If your cluster administrator has not configured a default local queue, specify the local queue for the Ray cluster, as shown in the following example:
Example local queue assignment
local_queue="your_local_queue_name"
Optional: Assign a dictionary of labels to the Ray cluster for identification and management purposes, as shown in the following example:
Example labels assignment
labels = {"exampleLabel1": "exampleLabel1Value", "exampleLabel2": "exampleLabel2Value"}
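Putting these settings together, a cluster configuration section might look like the following sketch. The name, namespace, image, queue, and label values are placeholders, and the worker resource parameter names assume a recent CodeFlare SDK:

```python
from codeflare_sdk import Cluster, ClusterConfiguration

cluster = Cluster(ClusterConfiguration(
    name="raytest",                               # placeholder cluster name
    namespace="my-project",                       # optional: omit to use the current project
    num_workers=2,
    worker_cpu_requests=1,
    worker_cpu_limits=1,
    worker_memory_requests=4,
    worker_memory_limits=4,
    image="quay.io/modh/ray:2.47.1-py311-cu121",  # optional: must match the workbench Python version
    local_queue="your_local_queue_name",          # omit if a default local queue is configured
    labels={"exampleLabel1": "exampleLabel1Value"},
))
```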
In the 2_basic_interactive.ipynb Jupyter notebook, ensure that the following Ray cluster authentication code is included after the Ray cluster creation section:
Ray cluster authentication code
from codeflare_sdk import generate_cert
generate_cert.generate_tls_cert(cluster.config.name, cluster.config.namespace)
generate_cert.export_env(cluster.config.name, cluster.config.namespace)
Note: Mutual Transport Layer Security (mTLS) is enabled by default in the CodeFlare component in OpenShift AI. You must include the Ray cluster authentication code to enable the Ray client that runs within a Jupyter notebook to connect to a secure Ray cluster that has mTLS enabled.
Run the Jupyter notebooks in the order indicated by the file-name prefix (0_, 1_, and so on).
- In each Jupyter notebook, run each cell in turn, and review the cell output.
- If an error is shown, review the output to find information about the problem and the required corrective action. For example, replace any deprecated parameters as instructed. See also Troubleshooting common problems with distributed workloads for users.
- For more information about the interactive browser controls that you can use to simplify Ray cluster tasks when working within a Jupyter notebook, see Managing Ray clusters from within a Jupyter notebook.
Verification
- The Jupyter notebooks run to completion without errors.
- In the Jupyter notebooks, the output from the cluster.status() function or the cluster.details() function indicates that the Ray cluster is Active.
3.1.3. Managing Ray clusters from within a Jupyter notebook
You can use interactive browser controls to simplify Ray cluster tasks when working within a Jupyter notebook.
The interactive browser controls provide an alternative to the equivalent commands, but do not replace them. You can continue to manage the Ray clusters by running commands within the Jupyter notebook, for ease of use in scripts and pipelines.
Several different interactive browser controls are available:
- When you run a cell that provides the cluster configuration, the Jupyter notebook automatically shows the controls for starting or deleting the cluster.
You can run the view_clusters() command to add controls that provide the following functionality:
- View a list of the Ray clusters that you can access.
- View cluster information, such as cluster status and allocated resources, for the selected Ray cluster. You can view this information from within the Jupyter notebook, without switching to the OpenShift console or the Ray dashboard.
- Open the Ray dashboard directly from the Jupyter notebook, to view the submitted jobs.
- Refresh the Ray cluster list and the cluster information for the selected cluster.
You can add these controls to existing Jupyter notebooks, or manage the Ray clusters from a separate Jupyter notebook.
The 3_widget_example.ipynb demo Jupyter notebook shows all of the available interactive browser controls. In the example in this procedure, you create a new Jupyter notebook to manage the Ray clusters, similar to the example provided in the 3_widget_example.ipynb demo Jupyter notebook.
Prerequisites
- You can access a data science cluster that is configured to run distributed workloads as described in Managing distributed workloads.
- You can access the following software from your data science cluster:
- A Ray cluster image that is compatible with your hardware architecture
- The data sets and models to be used by the workload
- The Python dependencies for the workload, either in a Ray image or in your own Python Package Index (PyPI) server
- You can access a data science project that contains a workbench, and the workbench is running a default workbench image that contains the CodeFlare SDK, for example, the Standard Data Science workbench. For information about projects and workbenches, see Working on data science projects.
- You have administrator access for the data science project.
- If you created the project, you automatically have administrator access.
- If you did not create the project, your cluster administrator must give you administrator access.
- You have logged in to Red Hat OpenShift AI, started your workbench, and logged in to JupyterLab.
- You have downloaded the demo Jupyter notebooks provided by the CodeFlare SDK, as described in Downloading the demo Jupyter notebooks from the CodeFlare SDK.
Procedure
- Run all of the demo Jupyter notebooks in the order indicated by the file-name prefix (0_, 1_, and so on), as described in Running the demo Jupyter notebooks from the CodeFlare SDK.
In each demo Jupyter notebook, when you run the cluster configuration step, the following interactive controls are automatically shown in the Jupyter notebook:
- Cluster Up: You can click this button to start the Ray cluster. This button is equivalent to the cluster.up() command. When you click this button, a message indicates whether the cluster was successfully created.
- Cluster Down: You can click this button to delete the Ray cluster. This button is equivalent to the cluster.down() command. The cluster is deleted immediately; you are not prompted to confirm the deletion. When you click this button, a message indicates whether the cluster was successfully deleted.
- Wait for Cluster: You can select this option to specify that the notebook cell should wait for the Ray cluster dashboard to be ready before proceeding to the next step. This option is equivalent to the cluster.wait_ready() command.
In the JupyterLab interface, create a new Jupyter notebook to manage the Ray clusters, as follows:
Click File > New > Notebook. Specify your preferred Python version, and then click Select.
A new Jupyter notebook file is created with the .ipynb file name extension.
Add the following code to a cell in the new Jupyter notebook:
Code to import the required packages
from codeflare_sdk import TokenAuthentication, view_clusters
The view_clusters package provides the interactive browser controls for listing the clusters, showing the cluster details, opening the Ray dashboard, and refreshing the cluster data.
Add a new notebook cell, and add the following code to the new cell:
Code to authenticate
For information about how to find the token and server values, see Running the demo Jupyter notebooks from the CodeFlare SDK.
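The authentication cell might contain code similar to the following sketch; the token and server values are placeholders that you must replace with your own cluster details:

```python
from codeflare_sdk import TokenAuthentication

# Authenticate to the OpenShift cluster so that the widget
# controls can query and manage Ray clusters on your behalf.
auth = TokenAuthentication(
    token="XXXXX",  # placeholder: your OpenShift API token
    server="https://api.cluster.example.com:6443",  # placeholder: your API server URL
    skip_tls=False,
)
auth.login()
```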
Add a new notebook cell, and add the following code to the new cell:
Code to view clusters in the current project
view_clusters()
When you run the view_clusters() command with no arguments specified, you generate a list of all of the Ray clusters in the current project, and display information similar to the cluster.details() function.
If you have access to another project, you can list the Ray clusters in that project by specifying the project name, as shown in the following example:
Code to view clusters in another project
view_clusters("my_second_project")
- Click File > Save Notebook As, enter demo-notebooks/guided-demos/manage_ray_clusters.ipynb, and click Save.
- In the demo-notebooks/guided-demos/manage_ray_clusters.ipynb Jupyter notebook, select each cell in turn, and click Run > Run selected cell.
When you run the cell with the view_clusters() function, the output depends on whether any Ray clusters exist.
If no Ray clusters exist, the following text is shown, where [project-name] is the name of the target project:
No clusters found in the [project-name] namespace.
Otherwise, the Jupyter notebook shows the following information about the existing Ray clusters:
Select an existing cluster
Under this heading, a toggle button is shown for each existing cluster. Click a cluster name to select the cluster. The cluster details section is updated to show details about the selected cluster; for example, cluster name, OpenShift AI project name, cluster resource information, and cluster status.
Delete cluster
Click this button to delete the selected cluster. This button is equivalent to the Cluster Down button. The cluster is deleted immediately; you are not prompted to confirm the deletion. A message indicates whether the cluster was successfully deleted, and the corresponding button is no longer shown under the Select an existing cluster heading.
View Jobs
Click this button to open the Jobs tab in the Ray dashboard for the selected cluster, and view details of the submitted jobs. The corresponding URL is shown in the Jupyter notebook.
Open Ray Dashboard
Click this button to open the Overview tab in the Ray dashboard for the selected cluster. The corresponding URL is shown in the Jupyter notebook.
Refresh Data
Click this button to refresh the list of Ray clusters, and the cluster details for the selected cluster, on demand. The cluster details are automatically refreshed when you select a cluster and when you delete the selected cluster.
Verification
- The demo Jupyter notebooks run to completion without errors.
- In the manage_ray_clusters.ipynb Jupyter notebook, the output from the view_clusters() function is correct.
3.2. Running distributed data science workloads from data science pipelines
To run a distributed workload from a pipeline, you must first update the pipeline to include a link to your Ray cluster image.
Prerequisites
- You can access a data science cluster that is configured to run distributed workloads as described in Managing distributed workloads.
- You can access the following software from your data science cluster:
- A Ray cluster image that is compatible with your hardware architecture
- The data sets and models to be used by the workload
- The Python dependencies for the workload, either in a Ray image or in your own Python Package Index (PyPI) server
- You can access a data science project that contains a workbench, and the workbench is running a default workbench image that contains the CodeFlare SDK, for example, the Standard Data Science workbench. For information about projects and workbenches, see Working on data science projects.
- You have administrator access for the data science project.
- If you created the project, you automatically have administrator access.
- If you did not create the project, your cluster administrator must give you administrator access.
- You have access to S3-compatible object storage.
- You have logged in to Red Hat OpenShift AI.
Procedure
- Create a connection to connect the object storage to your data science project, as described in Adding a connection to your data science project.
- Configure a pipeline server to use the connection, as described in Configuring a pipeline server.
Create the data science pipeline as follows:
Install the kfp Python package, which is required for all pipelines:
$ pip install kfp
- Install any other dependencies that are required for your pipeline.
Build your data science pipeline in Python code.
For example, create a file named compile_example.py with the content outlined by the following numbered annotations:
1. Imports Ray.
2. Imports packages from the CodeFlare SDK to define the cluster functions.
3. Specifies the Ray cluster configuration: replace these example values with the values for your Ray cluster.
4. Optional: Specifies the project where the Ray cluster is created. Replace the example value with the name of your project. If you omit this line, the Ray cluster is created in the current project.
5. Optional: Specifies the requested accelerators for the Ray cluster (in this example, 1 NVIDIA GPU). If you do not use NVIDIA GPUs, replace nvidia.com/gpu with the correct value for your accelerator; for example, specify amd.com/gpu for AMD GPUs. If no accelerators are required, set the value to 0 or omit the line.
6. Specifies the location of the Ray cluster image. The Python version in the Ray cluster image must be the same as the Python version in the workbench. If you omit this line, one of the default CUDA-compatible Ray cluster images is used, based on the Python version detected in the workbench. The default Ray images are AMD64 images, which might not work on other architectures. If you are running this code in a disconnected environment, replace the default value with the location for your environment. For information about the latest available training images and their preinstalled packages, see Red Hat OpenShift AI: Supported Configurations.
7. Specifies the local queue to which the Ray cluster will be submitted. If a default local queue is configured, you can omit this line.
8. Creates a Ray cluster by using the specified image and configuration.
9. Waits until the Ray cluster is ready before proceeding.
10. Enables the Ray client to connect to a secure Ray cluster that has mutual Transport Layer Security (mTLS) enabled. mTLS is enabled by default in the CodeFlare component in OpenShift AI.
11. Replace the example details in this section with the details for your workload.
12. Removes the Ray cluster when your workload is finished.
13. Replace the example name and description with the values for your workload.
14. Compiles the Python code and saves the output in a YAML file.
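Based on the annotations above, the compile_example.py file can be sketched as follows. This is a hedged reconstruction, not the original file: the base image, project name, queue name, image URL, and example workload are placeholders, and the code assumes kfp v2 and a recent CodeFlare SDK:

```python
from kfp import dsl


@dsl.component(
    base_image="registry.redhat.io/ubi9/python-311:latest",  # placeholder base image
    packages_to_install=["codeflare-sdk"],
)
def ray_fn():
    import ray  # (1)
    from codeflare_sdk import Cluster, ClusterConfiguration, generate_cert  # (2)

    cluster = Cluster(  # (3)
        ClusterConfiguration(
            name="raytest",
            namespace="pipeline-example",  # (4) placeholder project name
            num_workers=1,
            head_extended_resource_requests={"nvidia.com/gpu": 1},  # (5)
            worker_extended_resource_requests={"nvidia.com/gpu": 1},  # (5)
            image="quay.io/modh/ray:2.47.1-py311-cu121",  # (6)
            local_queue="local_queue_name",  # (7) placeholder queue name
        )
    )

    cluster.up()  # (8)
    cluster.wait_ready()  # (9)

    # (10) Required when mTLS is enabled (the default), so that the Ray
    # client can connect to the secure Ray cluster.
    generate_cert.generate_tls_cert(cluster.config.name, cluster.config.namespace)
    generate_cert.export_env(cluster.config.name, cluster.config.namespace)

    # (11) Example workload: replace with your own training code.
    ray.init(address=cluster.cluster_uri())

    @ray.remote
    def train_fn():
        return 100

    result = ray.get(train_fn.remote())
    assert result == 100
    ray.shutdown()

    cluster.down()  # (12)


@dsl.pipeline(  # (13) placeholder name and description
    name="Ray Simple Example",
    description="Demo pipeline for running a Ray cluster",
)
def ray_integration():
    ray_fn()


if __name__ == "__main__":  # (14)
    from kfp import compiler

    compiler.Compiler().compile(ray_integration, "compiled-example.yaml")
```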
Compile the Python file (in this example, the compile_example.py file):
$ python compile_example.py
This command creates a YAML file (in this example, compiled-example.yaml), which you can import in the next step.
- Import your data science pipeline, as described in Importing a data science pipeline.
- Schedule the pipeline run, as described in Scheduling a pipeline run.
- When the pipeline run is complete, confirm that it is included in the list of triggered pipeline runs, as described in Viewing the details of a pipeline run.
Verification
The YAML file is created and the pipeline run completes without errors.
You can view the run details, as described in Viewing the details of a pipeline run.
3.3. Running distributed data science workloads in a disconnected environment
To run a distributed data science workload in a disconnected environment, you must be able to access a Ray cluster image, and the data sets and Python dependencies used by the workload, from the disconnected environment.
Prerequisites
- You have logged in to OpenShift with the cluster-admin role.
- You have access to the disconnected data science cluster.
- You have installed Red Hat OpenShift AI and created a mirror image as described in Installing and uninstalling OpenShift AI Self-Managed in a disconnected environment.
- You can access the following software from the disconnected cluster:
- A Ray cluster image
- The data sets and models to be used by the workload
- The Python dependencies for the workload, either in a Ray image or in your own Python Package Index (PyPI) server that is available from the disconnected cluster
- You have logged in to Red Hat OpenShift AI.
- You have created a data science project that contains a workbench, and the workbench is running a default workbench image that contains the CodeFlare SDK, for example, the Standard Data Science workbench. For information about how to create a project, see Creating a data science project.
- You have administrator access for the data science project.
- If you created the project, you automatically have administrator access.
- If you did not create the project, your cluster administrator must give you administrator access.
Procedure
- Configure the disconnected data science cluster to run distributed workloads as described in Managing distributed workloads.
In the ClusterConfiguration section of the Jupyter notebook or pipeline, ensure that the image value specifies a Ray cluster image that you can access from the disconnected environment:
- Jupyter notebooks use the Ray cluster image to create a Ray cluster when running the notebook cells.
- Pipelines use the Ray cluster image to create a Ray cluster during the pipeline run.
If any of the Python packages required by the workload are not available in the Ray cluster, configure the Ray cluster to download the Python packages from a private PyPI server.
For example, set the PIP_INDEX_URL and PIP_TRUSTED_HOST environment variables for the Ray cluster to specify the location of the Python dependencies, as shown in the following example:
PIP_INDEX_URL: https://pypi-notebook.apps.mylocation.com/simple
PIP_TRUSTED_HOST: pypi-notebook.apps.mylocation.com
where
- PIP_INDEX_URL specifies the base URL of your private PyPI server (the default value is https://pypi.org).
- PIP_TRUSTED_HOST configures Python to mark the specified host as trusted, regardless of whether that host has a valid SSL certificate or is using a secure channel.
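One way to pass these variables to the Ray cluster is through the cluster configuration. The following is a sketch that assumes the envs parameter of the CodeFlare SDK ClusterConfiguration; the image and URL values are placeholders for your disconnected environment:

```python
from codeflare_sdk import Cluster, ClusterConfiguration

# Sketch: point the Ray cluster's pip at a private PyPI server
# that is reachable from the disconnected cluster.
cluster = Cluster(ClusterConfiguration(
    name="raytest",
    num_workers=2,
    image="my-registry.example.com/modh/ray:2.47.1-py311-cu121",  # placeholder mirrored image
    envs={
        "PIP_INDEX_URL": "https://pypi-notebook.apps.mylocation.com/simple",
        "PIP_TRUSTED_HOST": "pypi-notebook.apps.mylocation.com",
    },
))
```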
- Run the distributed data science workload, as described in Running distributed data science workloads from Jupyter notebooks or Running distributed data science workloads from data science pipelines.
Verification
The Jupyter notebook or pipeline run completes without errors:
- For Jupyter notebooks, the output from the cluster.status() function or the cluster.details() function indicates that the Ray cluster is Active.
- For pipeline runs, you can view the run details as described in Viewing the details of a pipeline run.