Red Hat OpenShift AI Self-Managed
2.8
Working on data science projects
6.4. Running distributed data science workloads from data science projects

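To run a distributed data science workload from a data science pipeline, you must first update the pipeline to include a link to your Ray cluster image.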

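Prerequisites

You have logged in to OpenShift Container Platform with the cluster-admin role.
You have access to a data science project that is configured to run distributed workloads, as described in Configuring distributed workloads.
You have installed the Red Hat OpenShift Pipelines Operator, as described in Installing OpenShift Pipelines.
You have access to S3-compatible object storage.
You have logged in to Red Hat OpenShift AI.
You have created a data science project.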

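Procedure

Create a data connection to connect your object storage to your data science project, as described in Adding a data connection to your data science project.
Configure a pipeline server to use the data connection, as described in Configuring a pipeline server.

Create a data science project pipeline as follows:

Install the kfp-tekton Python package, which is required for all pipelines: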
```
$ pip install kfp-tekton
```
Install any other dependencies that your pipeline requires.
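Before you build the pipeline, you can confirm from Python that the required packages are importable in the environment where you will compile it. The following is a minimal sketch. The helper name `missing_modules` is hypothetical, and the module names you pass to it (for example, `kfp_tekton` for the kfp-tekton package) are assumptions about your own dependencies.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported.

    Hypothetical helper: it only checks that each top-level module can be
    found on the current Python path; it does not check package versions.
    """
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The kfp-tekton package provides the kfp_tekton module and depends on kfp.
    missing = missing_modules(["kfp", "kfp_tekton"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```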

Build your data science project pipeline in Python code. For example, create a file named compile_example.py with the following content:

```python
from kfp import components, dsl


def ray_fn(openshift_server: str, openshift_token: str) -> int:
    import ray
    from codeflare_sdk.cluster.auth import TokenAuthentication  # (1)
    from codeflare_sdk.cluster.cluster import Cluster, ClusterConfiguration

    auth = TokenAuthentication(  # (2)
        token=openshift_token, server=openshift_server, skip_tls=True
    )
    auth_return = auth.login()
    cluster = Cluster(  # (3)
        ClusterConfiguration(
            name="raytest",
            # namespace must exist
            namespace="pipeline-example",
            num_workers=1,
            head_cpus="500m",
            min_memory=1,
            max_memory=1,
            num_gpus=0,
            image="quay.io/project-codeflare/ray:latest-py39-cu118",  # (4)
            instascale=False,  # (5)
        )
    )

    print(cluster.status())
    cluster.up()  # (6)
    cluster.wait_ready()  # (7)
    print(cluster.status())
    print(cluster.details())

    ray_dashboard_uri = cluster.cluster_dashboard_uri()
    ray_cluster_uri = cluster.cluster_uri()
    print(ray_dashboard_uri, ray_cluster_uri)

    # Before proceeding, ensure that the cluster exists and that its URI contains a value
    assert ray_cluster_uri, "Ray cluster must be started and set before proceeding"

    ray.init(address=ray_cluster_uri)
    print("Ray cluster is up and running: ", ray.is_initialized())

    @ray.remote
    def train_fn():  # (8)
        # complex training function
        return 100

    result = ray.get(train_fn.remote())
    assert 100 == result
    ray.shutdown()
    cluster.down()  # (9)
    auth.logout()
    return result


@dsl.pipeline(  # (10)
    name="Ray Simple Example",
    description="Ray Simple Example",
)
def ray_integration(openshift_server, openshift_token):
    ray_op = components.create_component_from_func(
        ray_fn,
        base_image='registry.redhat.io/ubi8/python-39:latest',
        packages_to_install=["codeflare-sdk"],
    )
    ray_op(openshift_server, openshift_token)


if __name__ == '__main__':  # (11)
    from kfp_tekton.compiler import TektonCompiler
    TektonCompiler().compile(ray_integration, 'compiled-example.yaml')
```


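(1) Imports the packages from the CodeFlare SDK that define the cluster functions.
(2) Authenticates with the cluster by using the values that you specify when you create the pipeline run.
(3) Specifies the Ray cluster resources. Replace these example values with the values for your Ray cluster.
(4) Specifies the location of the Ray cluster image. If you are using a disconnected environment, replace the default value with the location for your environment.
(5) InstaScale is not supported in this release.
(6) Creates the Ray cluster by using the specified image and configuration.
(7) Waits until the Ray cluster is ready before proceeding.
(8) Replace the example details in this section with the details for your workload.
(9) Removes the Ray cluster when your workload is finished.
(10) Replace the example name and description with the values for your workload.
(11) Compiles the Python code and saves the output in a YAML file.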
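The assertion before `ray.init()` only checks that the cluster URI is not empty. If you want a stricter check, the following sketch also verifies that the value looks like a Ray Client address. It assumes that the URI has the form `ray://<host>:<port>`, which is the form that Ray Client addresses use. The helper name `is_ray_client_uri` is hypothetical and is not part of the CodeFlare SDK.

```python
from urllib.parse import urlparse


def is_ray_client_uri(uri):
    """Return True if uri looks like a Ray Client address (ray://host:port).

    Hypothetical helper: it checks only the shape of the string, not whether
    the Ray cluster is reachable.
    """
    if not uri:
        return False
    parsed = urlparse(uri)
    try:
        port = parsed.port  # Raises ValueError if the port is not a valid number.
    except ValueError:
        return False
    return parsed.scheme == "ray" and bool(parsed.hostname) and port is not None
```

The component is built from the source of `ray_fn` alone. If you use this check in the pipeline, define the helper inside `ray_fn` rather than at module level.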

Compile the Python file (in this example, the compile_example.py file):
```
$ python compile_example.py
```
This command creates a YAML file (in this example, compiled-example.yaml) that you can import in the next step.
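Before you import the file, you can quickly check that the compiler produced a Tekton resource. The following standard-library sketch assumes that the kfp-tekton compiler writes a Tekton `PipelineRun` resource (with a top-level `apiVersion: tekton.dev/...` line and `kind: PipelineRun`). The helper name `looks_like_tekton_pipelinerun` is hypothetical. The check is a plain text scan, not YAML validation.

```python
from pathlib import Path


def looks_like_tekton_pipelinerun(path):
    """Return True if the file at path appears to contain a Tekton PipelineRun.

    Hypothetical helper: it scans unindented (top-level) lines of the file as
    text and does not parse or validate the YAML.
    """
    file_path = Path(path)
    if not file_path.is_file():
        return False
    lines = [line.rstrip() for line in file_path.read_text().splitlines()]
    has_tekton_api = any(line.startswith("apiVersion: tekton.dev/") for line in lines)
    has_pipelinerun_kind = "kind: PipelineRun" in lines
    return has_tekton_api and has_pipelinerun_kind


if __name__ == "__main__":
    print(looks_like_tekton_pipelinerun("compiled-example.yaml"))
```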

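Import your data science pipeline, as described in Importing a data science pipeline.
Schedule the pipeline run, as described in Scheduling a pipeline run.
After the pipeline run completes, confirm that it appears in the list of triggered pipeline runs, as described in Viewing triggered pipeline runs.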
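When you create the pipeline run, you supply values for the `openshift_server` and `openshift_token` parameters that `ray_integration` declares. You can typically obtain these values with `oc whoami --show-server` and `oc whoami -t`. The following sketch assembles the two values into a parameter dictionary and rejects obviously wrong input before you enter the values for the run. The helper name `build_run_parameters` and the requirement that the server URL use HTTPS are assumptions, not OpenShift AI behavior.

```python
from urllib.parse import urlparse


def build_run_parameters(openshift_server, openshift_token):
    """Return the run parameters for the example ray_integration pipeline.

    Hypothetical helper: the keys match the parameter names of the example
    pipeline function; the HTTPS check is an assumption about your cluster.
    """
    server = openshift_server.strip()
    token = openshift_token.strip()
    parsed = urlparse(server)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"Expected an https:// API server URL, got {server!r}")
    if not token:
        raise ValueError("The OpenShift token must not be empty")
    return {"openshift_server": server, "openshift_token": token}
```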

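Verification

The YAML file is created, and the pipeline run completes without errors. You can view the details of the run, as described in Viewing the details of a pipeline run.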
