Chapter 3. Workflow


OpenStack Data Processing provisions and scales Hadoop clusters using pre-configured cluster templates that define specifically designed instances. These instances form the individual nodes that make up Hadoop clusters; you can then use these Hadoop clusters to run the jobs/binaries that will process your data.

If you intend to use OpenStack Data Processing, you should already be familiar with the necessary components for working within the Hadoop framework. As such, the general workflow described in this section assumes that you already have the following components prepared:

  • A Hadoop image; specifically, a Red Hat Enterprise Linux image containing a Hadoop data processing plug-in. See Chapter 1, Overview for a list of supported plug-ins.
  • The input data you wish to process, preferably uploaded to the Object Storage service.
  • The job binaries and libraries you will use to process the input data, preferably uploaded to the Object Storage service.
Note

For details on how to upload content to the Object Storage service, see Upload an Object.
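
For example, the input data and job binaries could be uploaded with a short Python script. The sketch below assumes the openstacksdk library; the cloud name ("mycloud"), the container name ("sahara-data"), and the file names are placeholders, and the swift or openstack command-line clients can be used instead.

    import openstack

    # Read credentials for the named cloud from clouds.yaml.
    conn = openstack.connect(cloud="mycloud")

    # Create a container to hold the input data and job binaries.
    conn.object_store.create_container(name="sahara-data")

    # Upload the input data set and a job binary as objects.
    for path in ("input-data.csv", "wordcount.jar"):
        with open(path, "rb") as f:
            conn.object_store.upload_object(
                container="sahara-data", name=path, data=f.read())

The uploaded objects can then be registered with the Data Processing service as data sources and job binaries in the workflow described below.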

In addition, you should also have a general idea of the computational resources required to run the job. This will help you determine which node types (and how many of each) you will need.

The following high-level workflow describes how to configure and use the OpenStack Data Processing service to launch clusters and run jobs on those clusters:

  1. Create an image containing the necessary plug-in components for OpenStack Data Processing (Chapter 4, Create Hadoop Image). This will be your Hadoop image.

    The procedure for creating this image differs depending on your chosen Hadoop plug-in.

  2. Register the following required components to the OpenStack Data Processing service:

    • Hadoop image
    • Data sources (namely, your input data and where the output data should go)
  3. Create node group templates. Each template defines the Hadoop-specific settings for a given node, most notably:

    • The Hadoop plug-in and version the node group should use.
    • The processes that should run on the node.
  4. Create or upload cluster templates. A cluster template defines, among other things:

    • Node group composition: namely, how many nodes of each node group should make up the cluster.
    • Cluster-scoped Hadoop configurations: specific parameters you need to set for each Hadoop component (Hive, Ambari, HDFS, and the like).
  5. Launch a Hadoop cluster (using a cluster template), and run a job on the cluster (that is, run a registered job binary against a registered data source). You can also scale the cluster (add or remove nodes of any type) as needed.
  6. Register job binaries, scripts, or libraries to the OpenStack Data Processing service, create jobs, and launch them on Hadoop clusters. Jobs define which job binaries, scripts, or libraries should be used to process registered data sources (see the sketch after this list).
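
Taken together, these steps map onto the Data Processing API. The following Python sketch illustrates the sequence using the python-saharaclient library; the plug-in name and version ("vanilla" 2.7.1), the image and flavor IDs, the Object Storage URLs, and the credentials are placeholders, and the exact method signatures may differ between saharaclient releases.

    from keystoneauth1 import session
    from keystoneauth1.identity import v3
    from saharaclient import client as sahara_client

    # Authenticate against Identity and build a Data Processing client.
    auth = v3.Password(auth_url="http://controller:5000/v3",
                       username="demo", password="secret",
                       project_name="demo",
                       user_domain_name="Default",
                       project_domain_name="Default")
    sahara = sahara_client.Client("1.1", session=session.Session(auth=auth))

    IMAGE_ID = "..."   # ID of the registered Hadoop image (placeholder)
    FLAVOR_ID = "..."  # Compute flavor for the cluster nodes (placeholder)

    # Step 2: register the Hadoop image and the data sources.
    # "cloud-user" is the default login user baked into the image.
    sahara.images.update_image(IMAGE_ID, user_name="cloud-user")
    sahara.images.update_tags(IMAGE_ID, ["vanilla", "2.7.1"])
    inp = sahara.data_sources.create(
        "input", "raw input data", "swift",
        "swift://sahara-data/input-data.csv",
        credential_user="demo", credential_pass="secret")
    out = sahara.data_sources.create(
        "output", "job output location", "swift",
        "swift://sahara-data/output",
        credential_user="demo", credential_pass="secret")

    # Step 3: node group templates defining which processes run where.
    master = sahara.node_group_templates.create(
        "master", "vanilla", "2.7.1", FLAVOR_ID,
        node_processes=["namenode", "resourcemanager"])
    worker = sahara.node_group_templates.create(
        "worker", "vanilla", "2.7.1", FLAVOR_ID,
        node_processes=["datanode", "nodemanager"])

    # Step 4: a cluster template composed of those node groups.
    template = sahara.cluster_templates.create(
        "wordcount-template", "vanilla", "2.7.1",
        node_groups=[
            {"name": "master", "node_group_template_id": master.id, "count": 1},
            {"name": "worker", "node_group_template_id": worker.id, "count": 3},
        ])

    # Step 5: launch a cluster from the template; scale it later if needed.
    cluster = sahara.clusters.create(
        "wordcount-cluster", "vanilla", "2.7.1",
        cluster_template_id=template.id, default_image_id=IMAGE_ID)
    # e.g. sahara.clusters.scale(cluster.id,
    #          {"resize_node_groups": [{"name": "worker", "count": 5}]})

    # Step 6: register the job binary, create a job, and run it.
    binary = sahara.job_binaries.create(
        "wordcount.jar", "swift://sahara-data/wordcount.jar",
        extra={"user": "demo", "password": "secret"})
    job = sahara.jobs.create("wordcount", "MapReduce",
                             mains=[], libs=[binary.id])
    sahara.job_executions.create(job.id, cluster.id,
                                 input_id=inp.id, output_id=out.id)

In practice you would wait for the cluster to reach the Active state before submitting the job, and a MapReduce job also needs its mapper and reducer classes passed in through the job configuration; both are omitted here for brevity.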

The next few sections describe each workflow step in greater detail.
