
12.2. Installing the Hadoop FileSystem Plugin for Red Hat Storage


12.2.1. Adding the Hadoop Installer for Red Hat Storage

To use the Hadoop feature on Red Hat Storage, you must have the big-data channel added and the Hadoop components installed on all the servers. Run the following command on the Ambari Management Server, the YARN Master Server, and all the servers within the Red Hat Storage trusted storage pool:
# yum install rhs-hadoop rhs-hadoop-install
On the YARN Master Server

The YARN Master Server is required to FUSE mount all Red Hat Storage volumes that are used with Hadoop. It must have the Red Hat Storage Client channel enabled so that the setup_cluster script can install the Red Hat Storage client libraries on it.

  • If you have registered your machine using Red Hat Subscription Manager, enable the channel by running the following command:
    # subscription-manager repos --enable=rhel-6-server-rhs-client-1-rpms
  • If you have registered your machine using Satellite server, enable the channel by running the following command:
    # rhn-channel --add --channel=rhel-x86_64-server-rhsclient-6

12.2.2. Configuring the Trusted Storage Pool for use with Hadoop

Red Hat Storage provides a series of utility scripts that allow you to quickly prepare Red Hat Storage for use with Hadoop and install the Ambari Management Server. You must first run the initial Hadoop cluster configuration script to install the Ambari Management Server, prepare the YARN Master Server to host the Resource Manager and Job History Server services for Red Hat Storage, and build a trusted storage pool if one does not exist.

Note

You must run the script given below irrespective of whether you have an existing Red Hat Storage trusted storage pool or not.
To run the Hadoop configuration initial script:
  1. Open the terminal window of the server designated to be the Ambari Management Server and navigate to the /usr/share/rhs-hadoop-install/ directory.
  2. Run the hadoop cluster configuration script as given below:
    setup_cluster.sh [-y] [--hadoop-mgmt-node <node>] [--yarn-master <node>]  <node-list-spec>
    where <node-list-spec> is
    <node1>:<brickmnt1>:<blkdev1>  <node2>[:<brickmnt2>][:<blkdev2>]  [<node3>[:<brickmnt3>][:<blkdev3>]] ... [<nodeN>[:<brickmntN>][:<blkdevN>]]
    where
    • <brickmnt> is the name of the XFS mount for the above <blkdev>, for example, /mnt/brick1 or /external/HadoopBrick. When a Red Hat Storage volume is created, its bricks have the volume name appended, so <brickmnt> is a prefix for the volume's bricks. For example, if a new volume is named HadoopVol, its brick list would be: <node>:/mnt/brick1/HadoopVol or <node>:/external/HadoopBrick/HadoopVol.
    • <blkdev> is the name of a Logical Volume device path, for example, /dev/VG1/LV1 or /dev/mapper/VG1-LV1. Since LVM is a prerequisite for Red Hat Storage, the <blkdev> is not expected to be a raw block path, such as /dev/sdb.
    Given below is an example of running the setup_cluster.sh script on the YARN Master server and four Red Hat Storage nodes, which have the same logical volume and mount point intended to be used as a Red Hat Storage brick.
     ./setup_cluster.sh --yarn-master yarn.hdp rhs-1.hdp:/mnt/brick1:/dev/rhs_vg1/rhs_lv1 rhs-2.hdp rhs-3.hdp rhs-4.hdp

    Note

    If a brick mount is omitted, the brick mount of the first node is used and if one block device is omitted, the block device of the first node is used.
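The defaulting rule in the note above can be illustrated with a small helper (a hypothetical sketch, not part of rhs-hadoop-install) that expands a node list the way setup_cluster.sh interprets it, inheriting omitted fields from the first node:

```shell
#!/bin/sh
# expand_nodes: print each node with its effective brick mount and
# block device; omitted fields fall back to the first node's values.
# Illustrative helper only; the real parsing lives in setup_cluster.sh.
expand_nodes() {
    first_mnt=''; first_dev=''
    for spec in "$@"; do
        node=${spec%%:*}
        rest=${spec#"$node"}; rest=${rest#:}
        mnt=${rest%%:*}
        dev=${rest#"$mnt"}; dev=${dev#:}
        [ -z "$first_mnt" ] && first_mnt=$mnt
        [ -z "$first_dev" ] && first_dev=$dev
        echo "$node:${mnt:-$first_mnt}:${dev:-$first_dev}"
    done
}
```

For instance, `expand_nodes rhs-1.hdp:/mnt/brick1:/dev/rhs_vg1/rhs_lv1 rhs-2.hdp` prints both nodes with the first node's brick mount and block device filled in.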

12.2.3. Creating Volumes for use with Hadoop

Note

To use an existing Red Hat Storage Volume with Hadoop, skip this section and continue with the section Adding the User Directories for the Hadoop Processes on the Red Hat Storage Volume.
Whether you have a new or an existing Red Hat Storage trusted storage pool, a volume intended for use with Hadoop needs to be created in such a way as to support Hadoop workloads. The supported volume configuration for Hadoop is a Distributed Replicated volume with replica count 2. You must not name the Hadoop-enabled Red Hat Storage volume hadoop or mapredlocal.
Run the script given below to create new volumes that you intend to use with Hadoop. The script provides the necessary configuration parameters to the volume and updates the Hadoop configuration to make the volume accessible to Hadoop.
  1. Open the terminal window of the server designated to be the Ambari Management Server and navigate to the /usr/share/rhs-hadoop-install/ directory.
  2. Run the hadoop cluster configuration script as given below:
    create_vol.sh [-y] <volName> <volMountPrefix> <node-list>
    where
    • <node-list> is: <node1>:<brickmnt1> <node2>[:<brickmnt2>] <node3>[:<brickmnt3>] ... [<nodeN>[:<brickmntN>]]
    • <brickmnt> is the name of the XFS mount for the block devices used by the above nodes, for example, /mnt/brick1 or /external/HadoopBrick. When a Red Hat Storage volume is created, its bricks will have the volume name appended, so <brickmnt> is a prefix for the volume's bricks. For example, if a new volume is named HadoopVol, its brick list would be: <node>:/mnt/brick1/HadoopVol or <node>:/external/HadoopBrick/HadoopVol.

    Note

    The node-list for create_vol.sh is similar to the node-list-spec used by setup_cluster.sh except that a block device is not specified in create_vol.
    Given below is an example of how to create a volume named HadoopVol using four Red Hat Storage servers, each with the same brick mount, and mount the volume on /mnt/glusterfs:
    ./create_vol.sh HadoopVol /mnt/glusterfs rhs-1.hdp:/mnt/brick1 rhs-2.hdp rhs-3.hdp rhs-4.hdp
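Given the example above, the bricks created for HadoopVol are <node>:/mnt/brick1/HadoopVol on each of the four nodes. The derivation can be sketched with a helper (brick_paths is hypothetical, shown only to make the brick-naming rule concrete):

```shell
#!/bin/sh
# brick_paths VOLNAME NODE[:BRICKMNT]...: print one brick path per node
# by appending the volume name to each brick mount; nodes that omit the
# brick mount inherit the first node's mount. Illustrative helper only.
brick_paths() {
    vol=$1; shift
    default_mnt=''
    for spec in "$@"; do
        node=${spec%%:*}
        mnt=${spec#"$node"}; mnt=${mnt#:}
        [ -z "$default_mnt" ] && default_mnt=$mnt
        echo "$node:${mnt:-$default_mnt}/$vol"
    done
}
```

For instance, `brick_paths HadoopVol rhs-1.hdp:/mnt/brick1 rhs-2.hdp` prints `rhs-1.hdp:/mnt/brick1/HadoopVol` and `rhs-2.hdp:/mnt/brick1/HadoopVol`.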
After creating the volume, you need to set up the user directories for all the Hadoop ecosystem component users that you created in the prerequisites section. This is required for completing the Ambari deployment successfully.

Note

Perform the steps given below only after the volume is created and enabled for use with Hadoop.
Open a terminal window on a Red Hat Storage server within the trusted storage pool and run the following commands:
# mkdir /mnt/glusterfs/HadoopVol/user/mapred
# mkdir /mnt/glusterfs/HadoopVol/user/yarn
# mkdir /mnt/glusterfs/HadoopVol/user/hcat
# mkdir /mnt/glusterfs/HadoopVol/user/hive
# mkdir /mnt/glusterfs/HadoopVol/user/ambari-qa
# chown ambari-qa:hadoop /mnt/glusterfs/HadoopVol/user/ambari-qa
# chown hive:hadoop /mnt/glusterfs/HadoopVol/user/hive
# chown hcat:hadoop /mnt/glusterfs/HadoopVol/user/hcat
# chown yarn:hadoop /mnt/glusterfs/HadoopVol/user/yarn
# chown mapred:hadoop /mnt/glusterfs/HadoopVol/user/mapred
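The per-user commands above can also be expressed as a loop. This is a sketch (mk_user_dirs is a hypothetical helper); it assumes the volume is FUSE-mounted at /mnt/glusterfs/HadoopVol and that the users from the prerequisites section exist on the node:

```shell
#!/bin/sh
# mk_user_dirs VOLROOT: create the per-user directory for each Hadoop
# ecosystem user and give that user ownership (group "hadoop").
# Hypothetical helper; the user list matches the commands above.
mk_user_dirs() {
    volroot="$1"
    for u in mapred yarn hcat hive ambari-qa; do
        mkdir -p "$volroot/user/$u"
        # chown only when the account exists on this node
        if id "$u" >/dev/null 2>&1; then
            chown "$u:hadoop" "$volroot/user/$u"
        fi
    done
}
```

Called as `mk_user_dirs /mnt/glusterfs/HadoopVol`, this reproduces the mkdir and chown sequence above.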
Perform the following steps to deploy and configure the HDP stack on Red Hat Storage:

Important

This section describes how to deploy HDP on Red Hat Storage. Selecting HDFS as the storage option in the HDP 2.0.6.GlusterFS stack is not supported. If you want to deploy HDFS, you must select the HDP 2.0.6 stack (not HDP 2.0.6.GlusterFS) and follow the instructions in the Hortonworks documentation.
  1. Launch a web browser and enter http://hostname:8080 in the URL by replacing hostname with the hostname of your Ambari Management Server.

    Note

    If the Ambari Console fails to load in the browser, it is usually because iptables is still running. Stop iptables by opening a terminal window and running the service iptables stop command.
  2. Enter admin and admin for the username and password.
  3. Assign a name to your cluster, such as MyCluster.
  4. Select the HDP 2.0.6.GlusterFS Stack (if not already selected by default) and click Next.
  5. On the Install Options screen:
    1. For Target Hosts, add the YARN server and all the nodes in the trusted storage pool.
    2. Select Perform manual registrations on hosts and do not use SSH option.
    3. Accept any warnings you may see and click the Register and Confirm button.
    4. Click OK on the Before you proceed warning. The Ambari Agents were all installed for you by the setup_cluster.sh script.
  6. For Confirm Hosts, the progress must show as green for all the hosts. Click Next and ignore the Host Check warning.
  7. For Choose Services, unselect HDFS and as a minimum select GlusterFS, Ganglia, YARN+MapReduce2 and ZooKeeper.

    Note

    • Do not select the Nagios service, as it is not supported. For more information, see subsection 21.1. Deployment Scenarios of chapter 21. Administering the Hortonworks Data Platform on Red Hat Storage in the Red Hat Storage 3.0 Administration Guide.
    • The use of HBase has not been extensively tested and is not yet supported.
    • This section describes how to deploy HDP on Red Hat Storage. Selecting HDFS as the storage selection in the HDP 2.0.6.GlusterFS stack is not supported. If users wish to deploy HDFS, then they must select the HDP 2.0.6 stack (not HDP 2.0.6.GlusterFS) and follow the instructions in the Hortonworks documentation.
  8. For Assign Masters, set all the services to your designated YARN Master Server. For ZooKeeper, select at least 3 separate nodes within your cluster.
  9. For Assign Slaves and Clients, select all the nodes as NodeManagers except the YARN Master Server. You must also ensure that the Client checkbox is selected for each node.
  10. On the Customize Services screen:
    1. Click YARN tab, scroll down to the yarn.nodemanager.log-dirs and yarn.nodemanager.local-dirs properties and remove any entries that begin with /mnt/glusterfs/.
    2. Click the MapReduce2 tab, scroll down to the Advanced section, and modify the following property:
      Key                                Value
      yarn.app.mapreduce.am.staging-dir  glusterfs:///user
    3. Click the MapReduce2 tab, scroll down to the bottom and, under the custom mapred-site.xml, add the following four custom properties, and then click the Next button:
      Key                                            Value
      mapred.healthChecker.script.path               glusterfs:///mapred/jobstatus
      mapred.job.tracker.history.completed.location  glusterfs:///mapred/history/done
      mapred.system.dir                              glusterfs:///mapred/system
      mapreduce.jobtracker.staging.root.dir          glusterfs:///user
    4. Review other tabs that are highlighted in red. These require you to enter additional information, such as passwords for the respective services.
  11. Review your configuration and then click the Deploy button. Once the deployment is complete, it states that the deployment is 100% complete and the progress bars are colored orange.

    Note

    The deployment process is susceptible to network and bandwidth issues. If the deployment fails, try clicking "Retry" to attempt the deployment again. This often resolves the issue.
  12. Click Next to proceed to the Ambari Dashboard. Select the YARN service on the top left and click Stop-All. Do not click Start-All until you perform the steps in Section 12.2.7, “Configuring the Linux Container Executor”.
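The four custom properties from the Customize Services step correspond to entries like the following in the mapred-site.xml that Ambari generates (an illustrative fragment, not a file you need to edit by hand):

```xml
<!-- Illustrative fragment: how the four custom mapred-site.xml
     properties appear once Ambari renders the configuration. -->
<property>
  <name>mapred.healthChecker.script.path</name>
  <value>glusterfs:///mapred/jobstatus</value>
</property>
<property>
  <name>mapred.job.tracker.history.completed.location</name>
  <value>glusterfs:///mapred/history/done</value>
</property>
<property>
  <name>mapred.system.dir</name>
  <value>glusterfs:///mapred/system</value>
</property>
<property>
  <name>mapreduce.jobtracker.staging.root.dir</name>
  <value>glusterfs:///user</value>
</property>
```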

12.2.6. Enabling Existing Volumes for use with Hadoop

Important

This section is mandatory for every volume you intend to use with Hadoop. It is not sufficient to run the create_vol.sh script; you must follow the steps listed in this section as well.
If you have an existing Red Hat Storage trusted storage pool with volumes that contain data that you would like to analyze with Hadoop, the volumes need to be configured to support Hadoop workloads. Execute the script given below on every volume that you intend to use with Hadoop. The script provides the necessary configuration parameters for the volume and updates the Hadoop Configuration to make the volume accessible to Hadoop.

Note

The supported volume configuration for Hadoop is Distributed Replicated volume with replica count 2.
  1. Open the terminal window of the server designated to be the Ambari Management Server and navigate to the /usr/share/rhs-hadoop-install/ directory.
  2. Run the Hadoop Trusted Storage pool configuration script as given below:
    # enable_vol.sh [-y]  [--hadoop-mgmt-node <node>] [--user <admin-user>] [--pass <admin-password>] [--port <mgmt-port-num>] [--yarn-master <node>] [--rhs-node <storage-node>] <volName>
    For example:
    ./enable_vol.sh --yarn-master yarn.hdp  --rhs-node rhs-1.hdp HadoopVol

    Note

    If the --yarn-master and/or --rhs-node options are omitted, the default of localhost (the node from which the script is being executed) is assumed. --rhs-node is the hostname of any one of the storage nodes in the trusted storage pool; it is required because the script needs access to the gluster CLI on that node.

12.2.7. Configuring the Linux Container Executor

The Container Executor program used by the YARN framework defines how any container is launched and controlled. The Linux Container Executor sets up restricted permissions and the user/group ownership of local files and directories used by the containers such as the shared objects, jars, intermediate files, log files, and so on. Perform the following steps to configure the Linux Container Executor program:
  1. In the Ambari console, click Stop All in the Services navigation panel. You must wait until all the services are completely stopped.
  2. On each server within the Red Hat Storage trusted storage pool:
    1. Open the terminal and navigate to the /usr/share/rhs-hadoop-install/ directory.
    2. Execute the setup_container_executor.sh script.
  3. On each server within the Red Hat Storage trusted storage pool and the YARN Master server:
    1. Open the terminal and navigate to the /etc/hadoop/conf/ directory.
    2. Replace the contents of container-executor.cfg file with the following:
      yarn.nodemanager.linux-container-executor.group=hadoop
      banned.users=yarn
      min.user.id=1000
      allowed.system.users=tom

      Note

      Ensure that there is no additional whitespace at the end of each line or at the end of the file. Also, tom is an example user. Hadoop ignores the allowed.system.users parameter, but we recommend having at least one valid user. You can modify this file on one server and then use secure copy (or any other approach) to copy the modified file to the same location on each server.
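Trailing whitespace in container-executor.cfg is easy to introduce invisibly, so a quick check can be scripted before copying the file out. This is a sketch (check_cfg is a hypothetical helper, not part of the product):

```shell
#!/bin/sh
# check_cfg FILE: succeed only if no line in FILE ends with whitespace,
# which the Linux Container Executor configuration must not contain.
# Hypothetical helper for illustration.
check_cfg() {
    ! grep -q '[[:space:]]$' "$1"
}
```

For example, `check_cfg /etc/hadoop/conf/container-executor.cfg` exits non-zero if any line carries trailing whitespace, so you can gate the copy step on it.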