
12.2. Installing the Hadoop FileSystem Plugin for Red Hat Storage


12.2.1. Adding the Hadoop Installer for Red Hat Storage

To use the Hadoop feature on Red Hat Storage, you must have the big-data channel added and the Hadoop components installed on all the servers. Run the following command on the Ambari Management Server, the YARN Master Server, and all the servers within the Red Hat Storage trusted storage pool:
# yum install rhs-hadoop rhs-hadoop-install
On the YARN Master Server

The YARN Master Server is required to FUSE mount all Red Hat Storage volumes that are used with Hadoop. It must have the Red Hat Storage Client channel enabled so that the setup_cluster script can install the Red Hat Storage client libraries on it.

  • If you have registered your machine using Red Hat Subscription Manager, enable the channel by running the following command:
    # subscription-manager repos --enable=rhel-6-server-rhs-client-1-rpms
  • If you have registered your machine using Satellite server, enable the channel by running the following command:
    # rhn-channel --add --channel=rhel-x86_64-server-rhsclient-6

12.2.2. Configuring the Trusted Storage Pool for use with Hadoop

Red Hat Storage provides a series of utility scripts that allow you to quickly prepare Red Hat Storage for use with Hadoop and install the Ambari Management Server. You must first run the initial Hadoop cluster configuration script to install the Ambari Management Server, prepare the YARN Master Server to host the Resource Manager and Job History Server services for Red Hat Storage, and build a trusted storage pool if one does not exist.

Note

You must run the script given below irrespective of whether you have an existing Red Hat Storage trusted storage pool or not.
To run the Hadoop configuration initial script:
  1. Open the terminal window of the server designated to be the Ambari Management Server and navigate to the /usr/share/rhs-hadoop-install/ directory.
  2. Run the hadoop cluster configuration script as given below:
    setup_cluster.sh [-y] [--hadoop-mgmt-node <node>] [--yarn-master <node>]  <node-list-spec>
    where <node-list-spec> is
    <node1>:<brickmnt1>:<blkdev1>  <node2>[:<brickmnt2>][:<blkdev2>]  [<node3>[:<brickmnt3>][:<blkdev3>]] ... [<nodeN>[:<brickmntN>][:<blkdevN>]]
    where
    • <brickmnt> is the name of the XFS mount for the above <blkdev>, for example, /mnt/brick1 or /external/HadoopBrick. When a Red Hat Storage volume is created, its bricks have the volume name appended, so <brickmnt> is a prefix for the volume's bricks. For example, if a new volume is named HadoopVol, its brick list would be: <node>:/mnt/brick1/HadoopVol or <node>:/external/HadoopBrick/HadoopVol.
    • <blkdev> is the name of a Logical Volume device path, for example, /dev/VG1/LV1 or /dev/mapper/VG1-LV1. Since LVM is a prerequisite for Red Hat Storage, the <blkdev> is not expected to be a raw block path, such as /dev/sdb.
    Given below is an example of running the setup_cluster.sh script on the YARN Master server and four Red Hat Storage nodes, which have the same logical volume and mount point intended to be used as a Red Hat Storage brick.
     ./setup_cluster.sh --yarn-master yarn.hdp rhs-1.hdp:/mnt/brick1:/dev/rhs_vg1/rhs_lv1 rhs-2.hdp rhs-3.hdp rhs-4.hdp

    Note

    If a brick mount is omitted, the brick mount of the first node is used and if one block device is omitted, the block device of the first node is used.
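The defaulting rule in the note above can be illustrated with a small helper (a hypothetical sketch, not part of rhs-hadoop-install) that expands a node list the way setup_cluster.sh interprets it, inheriting omitted fields from the first node:

```shell
#!/bin/sh
# expand_nodes: print each node with its effective brick mount and
# block device; omitted fields fall back to the first node's values.
# Illustrative helper only; the real parsing lives in setup_cluster.sh.
expand_nodes() {
    first_mnt=''; first_dev=''
    for spec in "$@"; do
        node=${spec%%:*}
        rest=${spec#"$node"}; rest=${rest#:}
        mnt=${rest%%:*}
        dev=${rest#"$mnt"}; dev=${dev#:}
        [ -z "$first_mnt" ] && first_mnt=$mnt
        [ -z "$first_dev" ] && first_dev=$dev
        echo "$node:${mnt:-$first_mnt}:${dev:-$first_dev}"
    done
}
```

For instance, `expand_nodes rhs-1.hdp:/mnt/brick1:/dev/rhs_vg1/rhs_lv1 rhs-2.hdp` prints both nodes with the first node's brick mount and block device filled in.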

12.2.3. Creating Volumes for use with Hadoop

Note

To use an existing Red Hat Storage Volume with Hadoop, skip this section and continue with the section Adding the User Directories for the Hadoop Processes on the Red Hat Storage Volume.
Whether you have a new or an existing Red Hat Storage trusted storage pool, a volume intended for use with Hadoop needs to be created in such a way as to support Hadoop workloads. The supported volume configuration for Hadoop is a Distributed Replicated volume with replica count 2. You must not name the Hadoop-enabled Red Hat Storage volume hadoop or mapredlocal.
Run the script given below to create new volumes that you intend to use with Hadoop. The script provides the necessary configuration parameters to the volume and updates the Hadoop configuration to make the volume accessible to Hadoop.
  1. Open the terminal window of the server designated to be the Ambari Management Server and navigate to the /usr/share/rhs-hadoop-install/ directory.
  2. Run the hadoop cluster configuration script as given below:
    create_vol.sh [-y] <volName> <volMountPrefix> <node-list>
    where
    • <node-list> is: <node1>:<brickmnt1> <node2>[:<brickmnt2>] <node3>[:<brickmnt3>] ... [<nodeN>[:<brickmntN>]]
    • <brickmnt> is the name of the XFS mount for the block devices used by the above nodes, for example, /mnt/brick1 or /external/HadoopBrick. When a Red Hat Storage volume is created, its bricks will have the volume name appended, so <brickmnt> is a prefix for the volume's bricks. For example, if a new volume is named HadoopVol, its brick list would be: <node>:/mnt/brick1/HadoopVol or <node>:/external/HadoopBrick/HadoopVol.

    Note

    The node-list for create_vol.sh is similar to the node-list-spec used by setup_cluster.sh except that a block device is not specified in create_vol.
    Given below is an example of how to create a volume named HadoopVol using four Red Hat Storage servers, each with the same brick mount, and mount the volume on /mnt/glusterfs:
    ./create_vol.sh HadoopVol /mnt/glusterfs rhs-1.hdp:/mnt/brick1 rhs-2.hdp rhs-3.hdp rhs-4.hdp
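Given the example above, the bricks created for HadoopVol are <node>:/mnt/brick1/HadoopVol on each of the four nodes. The derivation can be sketched with a helper (brick_paths is hypothetical, shown only to make the brick-naming rule concrete):

```shell
#!/bin/sh
# brick_paths VOLNAME NODE[:BRICKMNT]...: print one brick path per node
# by appending the volume name to each brick mount; nodes that omit the
# brick mount inherit the first node's mount. Illustrative helper only.
brick_paths() {
    vol=$1; shift
    default_mnt=''
    for spec in "$@"; do
        node=${spec%%:*}
        mnt=${spec#"$node"}; mnt=${mnt#:}
        [ -z "$default_mnt" ] && default_mnt=$mnt
        echo "$node:${mnt:-$default_mnt}/$vol"
    done
}
```

For instance, `brick_paths HadoopVol rhs-1.hdp:/mnt/brick1 rhs-2.hdp` prints `rhs-1.hdp:/mnt/brick1/HadoopVol` and `rhs-2.hdp:/mnt/brick1/HadoopVol`.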
After creating the volume, you need to set up the user directories for all the Hadoop ecosystem component users that you created in the prerequisites section. This is required for completing the Ambari deployment successfully.

Note

Perform the steps given below only after the volume is created and enabled for use with Hadoop.
Open a terminal window on a Red Hat Storage server within the trusted storage pool and run the following commands:
# mkdir /mnt/glusterfs/HadoopVol/user/mapred
# mkdir /mnt/glusterfs/HadoopVol/user/yarn
# mkdir /mnt/glusterfs/HadoopVol/user/hcat
# mkdir /mnt/glusterfs/HadoopVol/user/hive
# mkdir /mnt/glusterfs/HadoopVol/user/ambari-qa
# chown ambari-qa:hadoop /mnt/glusterfs/HadoopVol/user/ambari-qa
# chown hive:hadoop /mnt/glusterfs/HadoopVol/user/hive
# chown hcat:hadoop /mnt/glusterfs/HadoopVol/user/hcat
# chown yarn:hadoop /mnt/glusterfs/HadoopVol/user/yarn
# chown mapred:hadoop /mnt/glusterfs/HadoopVol/user/mapred
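The per-user commands above can also be expressed as a loop. This is a sketch (mk_user_dirs is a hypothetical helper); it assumes the volume is FUSE-mounted at /mnt/glusterfs/HadoopVol and that the users from the prerequisites section exist on the node:

```shell
#!/bin/sh
# mk_user_dirs VOLROOT: create the per-user directory for each Hadoop
# ecosystem user and give that user ownership (group "hadoop").
# Hypothetical helper; the user list matches the commands above.
mk_user_dirs() {
    volroot="$1"
    for u in mapred yarn hcat hive ambari-qa; do
        mkdir -p "$volroot/user/$u"
        # chown only when the account exists on this node
        if id "$u" >/dev/null 2>&1; then
            chown "$u:hadoop" "$volroot/user/$u"
        fi
    done
}
```

Called as `mk_user_dirs /mnt/glusterfs/HadoopVol`, this reproduces the mkdir and chown sequence above.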
Perform the following steps to deploy and configure the HDP stack on Red Hat Storage:

Important

This section describes how to deploy HDP on Red Hat Storage. Selecting HDFS as the storage option in the HDP 2.0.6.GlusterFS stack is not supported. If you want to deploy HDFS, you must select the HDP 2.0.6 stack (not HDP 2.0.6.GlusterFS) and follow the instructions in the Hortonworks documentation.
  1. Launch a web browser and enter http://hostname:8080 in the URL by replacing hostname with the hostname of your Ambari Management Server.

    Note

    If the Ambari Console fails to load in the browser, it is usually because iptables is still running. Stop iptables by opening a terminal window and running the service iptables stop command.
  2. Enter admin and admin for the username and password.
  3. Assign a name to your cluster, such as MyCluster.
  4. Select the HDP 2.0.6.GlusterFS Stack (if not already selected by default) and click Next.
  5. On the Install Options screen:
    1. For Target Hosts, add the YARN server and all the nodes in the trusted storage pool.
    2. Select Perform manual registrations on hosts and do not use SSH option.
    3. Accept any warnings you may see and click the Register and Confirm button.
    4. Click OK on the Before you proceed warning. The Ambari Agents were all installed for you by the setup_cluster.sh script.
  6. For Confirm Hosts, the progress must show as green for all the hosts. Click Next and ignore the Host Check warning.
  7. For Choose Services, unselect HDFS and as a minimum select GlusterFS, Ganglia, YARN+MapReduce2 and ZooKeeper.

    Note

    • Do not select the Nagios service, as it is not supported. For more information, see subsection 21.1. Deployment Scenarios of chapter 21. Administering the Hortonworks Data Platform on Red Hat Storage in the Red Hat Storage 3.0 Administration Guide.
    • The use of HBase has not been extensively tested and is not yet supported.
    • This section describes how to deploy HDP on Red Hat Storage. Selecting HDFS as the storage selection in the HDP 2.0.6.GlusterFS stack is not supported. If users wish to deploy HDFS, then they must select the HDP 2.0.6 stack (not HDP 2.0.6.GlusterFS) and follow the instructions in the Hortonworks documentation.
  8. For Assign Masters, set all the services to your designated YARN Master Server. For ZooKeeper, select at least 3 separate nodes within your cluster.
  9. For Assign Slaves and Clients, select all the nodes as NodeManagers except the YARN Master Server. You must also ensure that the Client checkbox is selected for each node.
  10. On the Customize Services screen:
    1. Click YARN tab, scroll down to the yarn.nodemanager.log-dirs and yarn.nodemanager.local-dirs properties and remove any entries that begin with /mnt/glusterfs/.
    2. Click the MapReduce2 tab, scroll down to the Advanced section, and modify the following property:
      Key                                Value
      yarn.app.mapreduce.am.staging-dir  glusterfs:///user
    3. Click the MapReduce2 tab, scroll down to the bottom and, under the custom mapred-site.xml, add the following four custom properties, and then click the Next button:
      Key                                            Value
      mapred.healthChecker.script.path               glusterfs:///mapred/jobstatus
      mapred.job.tracker.history.completed.location  glusterfs:///mapred/history/done
      mapred.system.dir                              glusterfs:///mapred/system
      mapreduce.jobtracker.staging.root.dir          glusterfs:///user
    4. Review other tabs that are highlighted in red. These require you to enter additional information, such as passwords for the respective services.
  11. Review your configuration and then click the Deploy button. Once the deployment is complete, it states that the deployment is 100% complete and the progress bars are colored orange.

    Note

    The deployment process is susceptible to network and bandwidth issues. If the deployment fails, try clicking "Retry" to attempt the deployment again. This often resolves the issue.
  12. Click Next to proceed to the Ambari Dashboard. Select the YARN service on the top left and click Stop-All. Do not click Start-All until you perform the steps in Section 12.2.7, “Configuring the Linux Container Executor”.
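The four custom properties from the Customize Services step correspond to entries like the following in the mapred-site.xml that Ambari generates (an illustrative fragment, not a file you need to edit by hand):

```xml
<!-- Illustrative fragment: how the four custom mapred-site.xml
     properties appear once Ambari renders the configuration. -->
<property>
  <name>mapred.healthChecker.script.path</name>
  <value>glusterfs:///mapred/jobstatus</value>
</property>
<property>
  <name>mapred.job.tracker.history.completed.location</name>
  <value>glusterfs:///mapred/history/done</value>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>glusterfs:///mapred/system</value>
</property>
<property>
  <name>mapreduce.jobtracker.staging.root.dir</name>
  <value>glusterfs:///user</value>
</property>
```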

12.2.6. Enabling Existing Volumes for use with Hadoop

Important

This section is mandatory for every volume you intend to use with Hadoop. It is not sufficient to run the create_vol.sh script; you must follow the steps listed in this section as well.
If you have an existing Red Hat Storage trusted storage pool with volumes that contain data that you would like to analyze with Hadoop, the volumes need to be configured to support Hadoop workloads. Execute the script given below on every volume that you intend to use with Hadoop. The script provides the necessary configuration parameters for the volume and updates the Hadoop Configuration to make the volume accessible to Hadoop.

Note

The supported volume configuration for Hadoop is Distributed Replicated volume with replica count 2.
  1. Open the terminal window of the server designated to be the Ambari Management Server and navigate to the /usr/share/rhs-hadoop-install/ directory.
  2. Run the Hadoop Trusted Storage pool configuration script as given below:
    # enable_vol.sh [-y]  [--hadoop-mgmt-node <node>] [--user <admin-user>] [--pass <admin-password>] [--port <mgmt-port-num>] [--yarn-master <node>] [--rhs-node <storage-node>] <volName>
    For example:
    ./enable_vol.sh --yarn-master yarn.hdp  --rhs-node rhs-1.hdp HadoopVol

    Note

    If the --yarn-master and/or --rhs-node options are omitted, the default of localhost (the node from which the script is being executed) is assumed. --rhs-node is the hostname of any one of the storage nodes in the trusted storage pool; it is required because the script needs access to the gluster CLI on that node.

12.2.7. Configuring the Linux Container Executor

The Container Executor program used by the YARN framework defines how any container is launched and controlled. The Linux Container Executor sets up restricted permissions and the user/group ownership of local files and directories used by the containers such as the shared objects, jars, intermediate files, log files, and so on. Perform the following steps to configure the Linux Container Executor program:
  1. In the Ambari console, click Stop All in the Services navigation panel. You must wait until all the services are completely stopped.
  2. On each server within the Red Hat Storage trusted storage pool:
    1. Open the terminal and navigate to the /usr/share/rhs-hadoop-install/ directory.
    2. Execute the setup_container_executor.sh script.
  3. On each server within the Red Hat Storage trusted storage pool and the YARN Master server:
    1. Open the terminal and navigate to the /etc/hadoop/conf/ directory.
    2. Replace the contents of container-executor.cfg file with the following:
      yarn.nodemanager.linux-container-executor.group=hadoop
      banned.users=yarn
      min.user.id=1000
      allowed.system.users=tom

      Note

      Ensure that there is no additional whitespace at the end of each line or at the end of the file. Also, tom is an example user. Hadoop ignores the allowed.system.users parameter, but we recommend having at least one valid user. You can modify this file on one server and then use secure copy (or any other approach) to copy the modified file to the same location on each server.
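Trailing whitespace in container-executor.cfg is easy to introduce invisibly, so a quick check can be scripted before copying the file out. This is a sketch (check_cfg is a hypothetical helper, not part of the product):

```shell
#!/bin/sh
# check_cfg FILE: succeed only if no line in FILE ends with whitespace,
# which the Linux Container Executor configuration must not contain.
# Hypothetical helper for illustration.
check_cfg() {
    ! grep -q '[[:space:]]$' "$1"
}
```

For example, `check_cfg /etc/hadoop/conf/container-executor.cfg` exits non-zero if any line carries trailing whitespace, so you can gate the copy step on it.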