Deploying and managing RHEL on Amazon Web Services
Obtaining Red Hat Enterprise Linux system images and creating RHEL instances on AWS
Abstract
- Registering, deploying, and provisioning RHEL images on AWS
- Managing networking configurations for RHEL AWS EC2 instances
- Configuring platform security and trusted execution technologies
- Managing Red Hat High Availability (HA) clusters for RHEL instances
Providing feedback on Red Hat documentation
We appreciate your feedback on our documentation. Let us know how we can improve it.
Submitting feedback through Jira (account required)
- Log in to the Jira website.
- Click Create in the top navigation bar.
- Enter a descriptive title in the Summary field.
- Enter your suggestion for improvement in the Description field. Include links to the relevant parts of the documentation.
- Click Create at the bottom of the dialogue.
Chapter 1. Introducing RHEL on public cloud platforms
Public cloud platforms offer computing resources as a service. Instead of using on-premise hardware, you can run your IT workloads, including Red Hat Enterprise Linux (RHEL) systems, as public cloud instances.
1.1. Benefits of using RHEL in a public cloud
RHEL as a cloud instance located on a public cloud platform has the following benefits over RHEL on-premise physical systems or virtual machines (VMs):
- Flexible and fine-grained allocation of resources
A cloud instance of RHEL runs as a VM on a cloud platform, that is, on a cluster of remote servers maintained by the cloud service provider. Therefore, allocating hardware resources to the instance, such as a specific type of CPU or storage, is easy to customize on the software level.
In comparison to a local RHEL system, you are also not limited by the capabilities of the physical host. Instead, you can select from a variety of features, based on selections offered by the cloud provider.
- Space and cost efficiency
You do not need to own any on-premise servers to host cloud workloads. This avoids the space, power, and maintenance requirements associated with physical hardware.
Instead, on public cloud platforms, you pay the cloud provider directly for using a cloud instance. The cost is typically based on the hardware allocated to the instance and the time to use it. Therefore, you can optimize your costs based on the requirements.
- Software-controlled configurations
You save the entire configuration of a cloud instance as data on the cloud platform and control it with software. Therefore, you can easily create, remove, clone, or migrate the instance. You also operate a cloud instance remotely in a cloud provider console, and it connects to remote storage by default.
In addition, you can back up the current state of a cloud instance as a snapshot at any time. Afterwards, you can load the snapshot to restore the instance to the saved state.
- Separation from the host and software compatibility
Similarly to a local VM, the RHEL guest operating system on a cloud instance runs on a virtualized kernel, typically on a Kernel-based Virtual Machine (KVM) hypervisor. This kernel is separate from the host operating system and from the client system that you use to connect to the instance.
Therefore, you can install any operating system on the cloud instance. This means that on a RHEL public cloud instance, you can run RHEL-specific applications not usable on your local operating system.
In addition, even if the operating system of the instance becomes unstable or compromised, it does not affect your client system.
1.2. Public cloud use cases for RHEL
Deploying on a public cloud provides many benefits, but might not be the most efficient solution in every scenario. If you are evaluating whether to migrate your RHEL deployments to the public cloud, consider whether your use case will benefit from the advantages of the public cloud.
- Beneficial use cases
Deploying public cloud instances is very effective for flexibly increasing and decreasing the active computing power of your deployments, also known as scaling up and scaling down. Therefore, you can use RHEL on public cloud in the following scenarios:
- Clusters with high peak workloads and low general performance requirements. Scaling up and down based on your demands can be highly efficient in terms of resource costs.
- Quickly setting up or expanding your clusters. This avoids high upfront costs of setting up local servers.
- Cloud instances are not affected by what happens in your local environment. Therefore, you can use them for backup and disaster recovery.
- Potentially problematic use cases
- You are running an existing environment that you cannot adjust. Customizing a cloud instance to fit the specific needs of an existing deployment might not be economically efficient in comparison with your current host platform.
- You are operating with a hard limit on your budget. Maintaining your deployment in a local data center typically provides less flexibility but more control over the maximum resource costs than the public cloud does.
For details on how to obtain RHEL for public cloud deployments, see Obtaining RHEL for public cloud deployments.
1.3. Frequent concerns when migrating to a public cloud
Moving your RHEL workloads from a local environment to a public cloud platform might raise concerns about the changes involved. The following are the most commonly asked questions:
- Will my RHEL work differently as a cloud instance than as a local virtual machine?
In most respects, RHEL instances on a public cloud platform work the same as RHEL virtual machines on a local host, such as an on-premises server. Notable exceptions include:
- Instead of private orchestration interfaces, public cloud instances use provider-specific console interfaces for managing your cloud resources.
- Certain features, such as nested virtualization, might not work correctly. If a specific feature is critical for your deployment, check the feature’s compatibility in advance with your chosen public cloud provider.
- Will my data stay safe in a public cloud as opposed to a local server?
The data in your RHEL cloud instances is in your ownership, and your public cloud provider does not have any access to it. In addition, major cloud providers support data encryption in transit, which improves the security of data when migrating your virtual machines to the public cloud.
In terms of security of RHEL public cloud instances, the following applies:
- Your public cloud provider is responsible for the security of the cloud hypervisor
- Red Hat provides the security features of the RHEL guest operating systems in your instances
- You manage the specific security settings and practices in your cloud infrastructure
- What effect does my geographic region have on the functionality of RHEL public cloud instances?
- You can use RHEL instances on a public cloud platform regardless of your geographical location. Therefore, you can run your instances in the same region as your on-premises server. However, hosting your instances in a physically distant region might cause high latency when operating them. In addition, depending on the public cloud provider, certain regions might offer additional features or be more cost-efficient. Before creating your RHEL instances, review the properties of the hosting regions available for your chosen cloud provider.
1.4. Obtaining RHEL for public cloud deployments
To deploy a RHEL system in a public cloud environment, you need to:
Select the optimal cloud provider for your use case, based on your requirements and the current offerings on the market. The certified cloud service providers (CCSP) for running RHEL instances are:
- Amazon Web Services (AWS)
- Google Cloud Platform (GCP)
- Note
This document specifically addresses the process of deploying RHEL on AWS.
- Create a RHEL cloud instance on your chosen cloud platform. For details, see Methods for creating RHEL cloud instances.
- To keep your RHEL deployment up-to-date, use Red Hat Update Infrastructure (RHUI).
1.5. Methods for creating RHEL cloud instances
To deploy a RHEL instance on a public cloud platform, you can use one of the following methods:
- Create a RHEL system image and import it to the cloud platform
- To create the system image, you can use the RHEL image builder or you can build the image manually.
- This method uses your existing RHEL subscription, and is also referred to as bring your own subscription (BYOS).
- You pre-pay a yearly subscription, and you can use your Red Hat customer discount.
- Red Hat provides you with customer service.
- For creating many images effectively, you can use the cloud-init tool.
- Purchase a RHEL instance directly from the cloud provider marketplace
- You post-pay an hourly rate for using the service. Therefore, this method is also referred to as pay as you go (PAYG).
- The cloud platform provider provides you with customer service.
For detailed instructions on using various methods to deploy RHEL instances on Amazon Web Services, see the following chapters in this document.
Chapter 2. Preparing and uploading AMI images to AWS
You can create custom images and upload them, either manually or automatically, to the AWS cloud with RHEL image builder.
2.1. Preparing to manually upload AWS AMI images
Before uploading an AWS AMI image, you must configure a system for uploading the images.
Prerequisites
- You must have an Access Key ID configured in the AWS IAM account manager.
- You must have a writable S3 bucket prepared.
Procedure
Install Python 3 and the pip tool:
# dnf install python3 python3-pip
Install the AWS command-line tools with pip:
# pip3 install awscli
Set your profile. The terminal prompts you to provide your credentials, region, and output format:
$ aws configure
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [None]:
Default output format [None]:
Define a name for your bucket and create the bucket:
$ BUCKET=bucketname
$ aws s3 mb s3://$BUCKET
Replace bucketname with the actual bucket name. It must be a globally unique name. As a result, your bucket is created.
To grant permission to access the S3 bucket, create a vmimport S3 Role in the AWS Identity and Access Management (IAM), if you have not already done so in the past:
Create a trust-policy.json file with the trust policy configuration, in the JSON format. For example:
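The JSON example itself was not preserved in this copy. The following reflects the trust policy that AWS documents for the VM Import/Export service role; the vmie.amazonaws.com principal and the vmimport external ID are values defined by AWS:
{
   "Version": "2012-10-17",
   "Statement": [
      {
         "Effect": "Allow",
         "Principal": { "Service": "vmie.amazonaws.com" },
         "Action": "sts:AssumeRole",
         "Condition": {
            "StringEquals": {
               "sts:Externalid": "vmimport"
            }
         }
      }
   ]
}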
Create a role-policy.json file with the role policy configuration, in the JSON format. For example:
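The original example is also missing here. A typical role policy, based on the AWS VM Import/Export documentation, grants read access to your S3 bucket and the EC2 actions needed to register the image; replace <bucketname> with your bucket name:
{
   "Version": "2012-10-17",
   "Statement": [
      {
         "Effect": "Allow",
         "Action": [
            "s3:GetBucketLocation",
            "s3:GetObject",
            "s3:ListBucket"
         ],
         "Resource": [
            "arn:aws:s3:::<bucketname>",
            "arn:aws:s3:::<bucketname>/*"
         ]
      },
      {
         "Effect": "Allow",
         "Action": [
            "ec2:ModifySnapshotAttribute",
            "ec2:CopySnapshot",
            "ec2:RegisterImage",
            "ec2:Describe*"
         ],
         "Resource": "*"
      }
   ]
}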
Create a role for your Amazon Web Services account, by using the trust-policy.json file:
$ aws iam create-role --role-name vmimport --assume-role-policy-document file://trust-policy.json
Embed an inline policy document, by using the role-policy.json file:
$ aws iam put-role-policy --role-name vmimport --policy-name vmimport --policy-document file://role-policy.json
2.2. Manually uploading an AMI image to AWS by using the CLI
You can use RHEL image builder to build ami images and manually upload them directly to the Amazon AWS cloud service provider by using the CLI.
Prerequisites
Procedure
Using a text editor, create a configuration file with the following content:
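The example file contents were lost in this copy. Based on the composer-cli cloud provider configuration format, a minimal AWS upload profile looks approximately like the following; treat the exact field set as an assumption and substitute your own values:
provider = "aws"

[settings]
accessKeyID = "AWS_ACCESS_KEY_ID"
secretAccessKey = "AWS_SECRET_ACCESS_KEY"
bucket = "AWS_BUCKET"
region = "AWS_REGION"
key = "IMAGE_KEY"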
Replace the values in the fields with your credentials for accessKeyID, secretAccessKey, bucket, and region. The IMAGE_KEY value is the name of your VM image to be uploaded to EC2.
Save the file as configuration_file.toml and close the text editor.
Start the compose to upload it to AWS:
# composer-cli compose start blueprint-name image-type image-key configuration-file.toml
Replace:
- blueprint-name with the name of the blueprint you created.
- image-type with the ami image type.
- image-key with the name of your VM image to be uploaded to EC2.
- configuration-file.toml with the name of the configuration file of the cloud provider.
Note: You must have the correct AWS Identity and Access Management (IAM) settings for the bucket to which you are going to send your customized image. You have to set up a policy for your bucket before you can upload images to it.
Check the status of the image build:
# composer-cli compose status
After the image upload process is complete, you can see the FINISHED status.
Verification
- Confirm that the image upload was successful by accessing EC2 in the menu and selecting the correct region in the AWS console. The image must have the available status to indicate that it was successfully uploaded.
- On the dashboard, select your image and click Launch.
2.3. Creating and automatically uploading images to the AWS Cloud AMI
You can create a .raw image by using RHEL image builder and check the Upload to AWS checkbox to automatically push the output image that you create directly to the Amazon AWS Cloud AMI service provider.
Prerequisites
- You must have root or wheel group user access to the system.
- You have opened the RHEL image builder interface of the RHEL web console in a browser.
- You have created a blueprint. See Creating a blueprint in the web console interface.
- You must have an Access Key ID configured in the AWS IAM account manager.
- You must have a writable S3 bucket prepared.
Procedure
- In the RHEL image builder dashboard, click the blueprint name that you previously created.
- Select the tab Images.
Click Create Image to create your customized image.
The Create Image window opens.
- From the Type drop-down menu list, select Amazon Machine Image Disk (.raw).
- Check the Upload to AWS checkbox to upload your image to the AWS Cloud and click Next.
To authenticate your access to AWS, type your AWS access key ID and AWS secret access key in the corresponding fields. Click Next.
Note: You can view your AWS secret access key only when you create a new Access Key ID. If you do not know your Secret Key, generate a new Access Key ID.
- Type the name of the image in the Image name field, type the Amazon bucket name in the Amazon S3 bucket name field, and fill in the AWS region field for the bucket to which you are going to add your customized image. Click Next.
Review the information and click Finish.
Optionally, click Back to modify any incorrect detail.
Note: You must have the correct IAM settings for the bucket to which you are going to send your customized image. This procedure uses IAM Import and Export, so you have to set up a policy for your bucket before you can upload images to it. For more information, see Required Permissions for IAM Users.
A pop-up in the upper right informs you of the saving progress. It also informs you that the image creation has been initiated, along with the progress of the image creation and the subsequent upload to the AWS Cloud.
After the process is complete, you can see the Image build complete status.
In a browser, access Service→EC2.
- On the AWS console dashboard menu, choose the correct region. The image must have the Available status to indicate that it is uploaded.
- On the AWS dashboard, select your image and click Launch.
- A new window opens. Choose an instance type according to the resources you need to start your image. Click Review and Launch.
- Review your instance start details. You can edit each section if you need to make any changes. Click Launch.
Before you start the instance, select a public key to access it.
You can either use the key pair you already have or you can create a new key pair.
Follow the next steps to create a new key pair in EC2 and attach it to the new instance.
- From the drop-down menu list, select Create a new key pair.
- Enter a name for the new key pair. It generates a new key pair.
- Click Download Key Pair to save the new key pair on your local system.
Then, you can click Launch Instance to start your instance.
You can check the status of the instance, which displays as Initializing.
- After the instance status is running, the Connect button becomes available.
Click Connect. A window is displayed with instructions on how to connect by using SSH.
- Select A standalone SSH client as the preferred connection method and open a terminal.
In the location where you store your private key, ensure that the key is not publicly viewable; otherwise SSH refuses to use it. To do so, run the command:
$ chmod 400 <your_instance_name>.pem
Connect to your instance by using its Public DNS:
$ ssh -i <your_instance_name>.pem ec2-user@<your_instance_IP_address>
Type yes to confirm that you want to continue connecting.
As a result, you are connected to your instance over SSH.
Verification
- Check if you are able to perform any action while connected to your instance by using SSH.
Chapter 3. Deploying a RHEL image as an EC2 instance on AWS
To use a RHEL image on Amazon Web Services (AWS), convert the RHEL image to an AWS-compatible format. Then, deploy a VM from the RHEL image to run as an Amazon Elastic Compute Cloud (EC2) instance. To create, customize, and deploy a RHEL Amazon Machine Image (AMI), you can use one of the following methods:
- Use the RHEL image builder. For instructions, see Preparing and uploading AMI images to AWS and AWS specific resources list.
- Manually create and configure an AMI. This is a more complicated process but offers more granular customization options. For instructions, see the following sections.
To deploy a RHEL image as an EC2 instance on AWS, make sure to complete the following steps:
- You have created a Red Hat account.
- You have signed up and set up an AWS account.
3.1. Available RHEL image types for public cloud
To deploy your RHEL virtual machine (VM) on a certified cloud service provider (CCSP), you can use several options. The following table lists the available image types, subscriptions, considerations, and sample scenarios for the image types.
To deploy customized ISO images, you can use RHEL image builder. With RHEL image builder, you can create, upload, and deploy these custom images specific to your chosen CCSP. For details, see Composing a Customized RHEL System Image.
Image types | Subscriptions | Considerations | Sample scenario |
---|---|---|---|
Deploy a Red Hat gold image | Use your existing Red Hat subscriptions | The subscriptions include the Red Hat product cost and support for Cloud Access images, while you pay the CCSP for all other instance costs | Select a Red Hat gold image on the CCSP. For details on gold images and how to access them on the CCSP, see the Red Hat Cloud Access Reference Guide |
Deploy a custom image that you move to the CCSP | Use your existing Red Hat subscriptions | The subscriptions include the Red Hat product cost and support for the custom RHEL image, while you pay the CCSP for all other instance costs | Upload your custom image and attach your subscriptions |
Deploy an existing RHEL based custom machine image | The custom machine images include a RHEL image | You pay the CCSP on an hourly basis based on a pay-as-you-go model. For this model, on-demand images are available on the CCSP marketplace. The CCSP provides support for these images, while Red Hat handles updates. The CCSP provides updates through the Red Hat Update Infrastructure (RHUI) | Select a RHEL image when you launch an instance on the CCSP cloud management console, or choose an image from the CCSP marketplace. |
To convert an on-demand, license-included EC2 instance to a bring-your-own-license (BYOL) EC2 instance of RHEL, see Convert a license type for Linux in License Manager.
3.2. Deploying a RHEL instance by using a custom base image
To manually configure a virtual machine (VM), first create a base (starter) image. Then, you can modify configuration settings and add the packages the VM requires to operate on the cloud. You can also make additional configuration changes for your specific application after you upload the image.
Creating a VM from a base image has the following advantages:
- Fully customizable
- High flexibility for any use case
- Lightweight - includes only the operating system and the required runtime libraries
To create a custom base image of RHEL from an ISO image, you can use the command line interface (CLI) or the web console for creating and configuring the VM.
Verify the following VM configurations.
- SSH - Enable SSH to give remote access to your VM.
- DHCP - Configure the primary virtual adapter to use DHCP.
Prerequisites
- You have enabled virtualization on the host machine.
- For web console, ensure the following options:
- You have not checked the Immediately Start VM option.
- You have already changed the Memory size to your preferred settings.
- You have changed the Model option under Virtual Network Interface Settings to virtio and vCPUs to the capacity settings for the VM.
Procedure
Configure the Red Hat Enterprise Linux (RHEL) VM:
- To install from the command line (CLI), ensure that you set the default memory, network interfaces, and CPUs according to your requirement for the VM. For details, see Creating virtual machines by using the command line
- To install from the web console, see Creating virtual machines by using the web console
When the installation starts:
- Create a root password.
- Create an administrative user account.
After the installation completes, reboot the VM and log in to the root account.
After logging in as root, you can configure the image.
Register the VM and enable the RHEL repository:
# subscription-manager register
For AMD64 or Intel 64 (x86_64) VMs, install the nvme, xen-netfront, and xen-blkfront drivers:
# dracut -f --add-drivers "nvme xen-netfront xen-blkfront"
For ARM 64 (aarch64) VMs, install the nvme driver:
# dracut -f --add-drivers "nvme"
Including these drivers prevents a dracut time-out.
Alternatively, you can add the drivers to /etc/dracut.conf.d/ and then enter dracut -f to overwrite the existing initramfs file.
Verification
Verify that the system has the cloud-init package installed and enable the service:
# dnf install cloud-init
# systemctl enable --now cloud-init.service
- Power off the VM.
3.3. Uploading a RHEL image to AWS by using the command line
To run a RHEL instance on Amazon Web Services (AWS), you must first upload a RHEL image to AWS. To configure and manage a RHEL EC2 instance on AWS, use the awscli2 utility.
3.3.1. Installing AWSCLI2
You can use the AWS command line interface awscli2 utility to configure and manage RHEL images and a Red Hat high availability (HA) cluster on AWS.
Prerequisites
- You have access to an AWS Access Key ID and an AWS Secret Access Key. For details, see manage access keys.
Procedure
Install awscli2:
# dnf install awscli2
Verification
Verify the installation:
$ aws --version
aws-cli/1.19.77 Python/3.6.15 Linux/5.14.16-201.fc34.x86_64 botocore/1.20.77
Configure awscli2 for AWS credentials and settings:
$ aws configure
AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [None]:
Default output format [None]:
3.3.2. Converting and pushing an image to Amazon S3
You can convert a RHEL image in the qcow2 image format to OVA, VHD, VHDX, VMDK, or raw by using the qemu-img utility, and then upload it to Amazon S3 storage. For details, see supported image formats by AWS.
Prerequisites
- You have created an Amazon S3 bucket by using awscli2 to upload the RHEL image.
Procedure
Run qemu-img to convert the .qcow2 image to the .raw image format:
# qemu-img convert -f qcow2 -O raw rhel-10.0-sample.qcow2 rhel-10.0-sample.raw
Upload the image to the Amazon S3 bucket:
$ aws s3 cp rhel-10.0-sample.raw s3://<example-s3-bucket-name>
Verification
- Check the AWS S3 Console to confirm successful upload.
3.3.3. Managing a RHEL VM on AWS by using the command line
By using the awscli2 utility, you can manage a RHEL EC2 VM on AWS through the command line. In this case, you can use the vmimport role for managing RHEL EC2 image snapshots. With awscli2, you can also import a RHEL EC2 image snapshot, create an AMI, and launch and connect to a RHEL EC2 instance. A combined sketch of these commands follows the list below.
- Use the vmimport role: An alternative way to import the RHEL image to the Amazon S3 bucket is by using the vmimport role. See Required permissions for VM Import/Export.
- Import a RHEL image as a snapshot: You can import a RHEL VM image from Amazon S3 as a snapshot to Amazon EC2. For details, see Start an import snapshot task and Monitor an import snapshot task.
- Create and launch a RHEL EC2 instance: You can create a RHEL Amazon Machine Image (AMI) from an existing snapshot and launch a RHEL EC2 instance. For details, see create an AMI from a snapshot by using awscli2 and launching, listing, and deleting a RHEL instance by using awscli2.
- Configure the private key and connect to the RHEL EC2 instance: You can configure your <example_key>.pem file and connect to a RHEL EC2 instance. For details, see Create a key pair using Amazon EC2 and Connect using the AWS CLI.
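For orientation, a hedged sketch of this end-to-end flow with awscli2 is shown here; the bucket name, file names, snapshot and AMI IDs, key pair name, and instance type are placeholders, and the exact parameters you need may differ:
$ aws ec2 import-snapshot --description "RHEL raw image" \
    --disk-container Format=raw,UserBucket="{S3Bucket=<example-s3-bucket-name>,S3Key=rhel-10.0-sample.raw}"
$ aws ec2 describe-import-snapshot-tasks --import-task-ids import-snap-0123456789abcdef0
$ aws ec2 register-image --name "rhel-10-custom" --architecture x86_64 --virtualization-type hvm \
    --root-device-name /dev/sda1 --ena-support \
    --block-device-mappings DeviceName=/dev/sda1,Ebs={SnapshotId=snap-0123456789abcdef0}
$ aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type t3.micro --key-name <example_key>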
3.3.4. Attaching Red Hat subscriptions
Using the subscription-manager command, you can register and attach your Red Hat subscription to a RHEL instance.
Prerequisites
- You have an active Red Hat account.
Procedure
Register your system:
# subscription-manager register
Attach your subscriptions:
- You can use an activation key to attach subscriptions. See Creating Red Hat Customer Portal Activation Keys for more information.
- Otherwise, you can manually attach a subscription by using the ID of the subscription pool (Pool ID). See Attaching a host-based subscription to hypervisors.
Optional: To collect various system metrics about the instance in the Red Hat Hybrid Cloud Console, you can register the instance with Red Hat Insights.
# insights-client register --display-name <display_name_value>
For information about further configuration of Red Hat Insights, see the Client Configuration Guide for Red Hat Insights.
3.3.5. Setting up automatic registration on AWS gold images
You can deploy Red Hat Enterprise Linux (RHEL) virtual machines faster and more easily on Amazon Web Services (AWS). For that, you must set up RHEL gold images to be automatically registered to Red Hat Subscription Manager (RHSM).
Prerequisites
You have downloaded the latest RHEL gold image for AWS. For instructions, see Using gold images on AWS.
Note: You can only attach an AWS account to a single Red Hat account at a time. Therefore, ensure no other users require access to the AWS account before attaching it to your Red Hat one.
Procedure
Upload the gold image to AWS. For instructions, see Uploading a RHEL image to AWS by using the command line or Uploading a RHEL image to AWS by using the AWS console.
- Create VMs by using the uploaded image. If your RHSM settings are correct, they will be automatically subscribed to RHSM.
Verification
In a RHEL VM created by using the above instructions, verify that the system is registered to RHSM by executing the subscription-manager identity command. On a successfully registered system, this displays the UUID of the system. For example:
# subscription-manager identity
system identity: fdc46662-c536-43fb-a18a-bbcb283102b7
name: 192.168.122.222
org name: 6340056
org ID: 6340056
3.4. Uploading a RHEL image to AWS by using the AWS console
To run a RHEL instance on Amazon Web Services (AWS), you must first upload the RHEL image to AWS. To configure and manage the RHEL EC2 instance on AWS, use the awscli2 utility.
3.4.1. Converting and pushing an image to S3 by using the AWS console
You can convert a RHEL image in the qcow2 image format to OVA, VHD, VHDX, VMDK, or raw, and upload it to Amazon Elastic Compute Cloud (EC2) by using the qemu-img utility. For details, see supported image formats by AWS.
Prerequisites
- You have created an Amazon S3 bucket by using the Amazon S3 console to upload the RHEL image.
Procedure
Run qemu-img to convert the .qcow2 image to the .raw image format:
# qemu-img convert -f qcow2 -O raw rhel-10.0-sample.qcow2 rhel-10.0-sample.raw
- Upload the image to the S3 bucket by using the Amazon S3 console.
Verification
- Check the AWS S3 Console to confirm successful upload.
3.4.2. Managing a RHEL VM on AWS by using the AWS console
You can manage a RHEL EC2 VM on AWS by using the AWS console. You can create a RHEL EC2 image snapshot, manage an Amazon Machine Image (AMI), and launch and connect to a RHEL EC2 instance.
Prerequisites
- You have pushed your RHEL image to the Amazon S3 bucket by using the AWS console. For details, see Converting and pushing an image to S3 by using the AWS console.
Procedure
- Use the vmimport role: An alternative way to import the RHEL image to the Amazon S3 bucket is by using the vmimport role. See Import your VM as an image.
- Import a RHEL image as a snapshot: You can import a RHEL VM image from Amazon S3 as a snapshot to Amazon EC2. For details, see Importing a disk as a snapshot using VM Import/Export and Monitor an import snapshot task.
- Create and launch a RHEL EC2 instance: You can create a RHEL Amazon Machine Image (AMI) from an existing snapshot and launch a RHEL EC2 instance. For details, see Create an AMI from a snapshot and Launch an instance using defined parameters.
- Configure the private key and connect to the RHEL EC2 instance: You can configure your <example_key>.pem file and connect to a RHEL EC2 instance. For details, see Create a key pair using Amazon EC2 and Connect using the Amazon EC2 console.
- For Red Hat subscriptions, see Attaching Red Hat subscriptions.
Chapter 4. Configuring a Red Hat High Availability cluster on AWS
A high availability (HA) cluster groups RHEL nodes so that workloads are automatically redistributed if a node fails. You can deploy HA clusters on public cloud platforms, including Amazon Web Services (AWS). The process for setting up HA clusters on AWS is comparable to configuring them in traditional, non-cloud environments.
To configure a Red Hat HA cluster on AWS that uses EC2 instances as cluster nodes, see the following sections. You have several options for obtaining RHEL images for the cluster. For details, see Available RHEL image types for public cloud.
Before you begin, make sure to complete the following steps:
- You have created a Red Hat account.
- You have signed up and set up an AWS account.
4.1. Benefits of using high-availability clusters on public cloud platforms
A high-availability (HA) cluster links a set of computers (called nodes) to run a specific workload. HA clusters offer redundancy to handle hardware or software failures. When a node in the HA cluster fails, the Pacemaker cluster resource manager quickly distributes the workload to other nodes, ensuring that services on the cluster continue without noticeable downtime.
You can also run HA clusters on public cloud platforms. In this case, you would use virtual machine (VM) instances in the cloud as the individual cluster nodes. Using HA clusters on a public cloud platform has the following benefits:
- Improved availability: In case of a VM failure, the workload is quickly redistributed to other nodes, so running services are not disrupted.
- Scalability: You can start additional nodes when demand is high and stop them when demand is low.
- Cost-effectiveness: With the pay-as-you-go pricing, you pay only for nodes that are running.
- Simplified management: Some public cloud platforms offer management interfaces to make configuring HA clusters easier.
To enable HA on your RHEL systems, Red Hat offers the High Availability Add-On. You can configure a RHEL cluster with the Red Hat HA Add-On to manage HA clusters with groups of RHEL servers. The Red Hat HA Add-On gives access to integrated and streamlined tools. With the cluster resource manager, fencing agents, and resource agents, you can set up and configure the cluster for automation. The Red Hat HA Add-On offers the following components for automation:
- Pacemaker, a cluster resource manager that offers both a command line utility (pcs) and a GUI (pcsd) to support many nodes
- Corosync and Kronosnet to create and manage HA clusters
- Resource agents to configure and manage custom applications
- Fencing agents to use the cluster on platforms such as bare-metal servers and virtual machines
The Red Hat HA Add-On handles critical tasks such as node failures, load balancing, and node health checks for fault tolerance and system reliability.
4.2. Installing the High Availability packages and agents
On each of the nodes, you need to install the High Availability packages and agents to configure a Red Hat High Availability cluster on AWS.
Prerequisites
- You have completed the configuration for Uploading RHEL image to AWS by using the command line.
Procedure
Remove the AWS Red Hat Update Infrastructure (RHUI) client.
$ sudo -i
# dnf -y remove rh-amazon-rhui-client
Register the VM with Red Hat.
# subscription-manager register
Disable all repositories.
# subscription-manager repos --disable=*
Enable the RHEL 10 Server HA repositories.
# subscription-manager repos --enable=rhel-10-for-x86_64-highavailability-rpms
Update the RHEL AWS instance.
# dnf update -y
Install the Red Hat High Availability Add-On software packages, along with the AWS fencing agent, from the High Availability channel.
# dnf install pcs pacemaker fence-agents-aws
The user hacluster was created during the pcs and pacemaker installation in the earlier step. Create a password for hacluster on all cluster nodes. Use the same password for all nodes.
# passwd hacluster
Add the high availability service to the RHEL firewall if firewalld.service is installed.
# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --reload
Start the pcs service and enable it to start on boot.
# systemctl start pcsd.service
# systemctl enable pcsd.service
- Edit /etc/hosts and add Red Hat Enterprise Linux (RHEL) host names and internal IP addresses. See How should the /etc/hosts file be set up on RHEL cluster nodes? for details.
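For illustration only, the entries have the following shape; the host names and private IP addresses here are placeholders:
10.0.0.11   node01.example.com   node01
10.0.0.12   node02.example.com   node02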
Verification
Ensure the pcs service is running.
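The status output was not preserved in this copy. One way to check, assuming the standard pcsd.service unit name, is:
# systemctl status pcsd.service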
4.3. Creating a high availability cluster
You create a Red Hat High Availability Add-On cluster with the following procedure. This example procedure creates a cluster that consists of the nodes z1.example.com and z2.example.com.
To display the parameters of a pcs command and a description of those parameters, use the -h option of the pcs command.
Procedure
Authenticate the pcs user hacluster for each node in the cluster on the node from which you will be running pcs.
The following command authenticates user hacluster on z1.example.com for both of the nodes in a two-node cluster that will consist of z1.example.com and z2.example.com.
[root@z1 ~]# pcs host auth z1.example.com z2.example.com
Username: hacluster
Password:
z1.example.com: Authorized
z2.example.com: Authorized
Execute the following command from z1.example.com to create the two-node cluster my_cluster that consists of nodes z1.example.com and z2.example.com. This will propagate the cluster configuration files to both nodes in the cluster. This command includes the --start option, which will start the cluster services on both nodes in the cluster.
[root@z1 ~]# pcs cluster setup my_cluster --start z1.example.com z2.example.com
Enable the cluster services to run on each node in the cluster when the node is booted.
Note: For your particular environment, you may choose to leave the cluster services disabled by skipping this step. This allows you to ensure that if a node goes down, any issues with your cluster or your resources are resolved before the node rejoins the cluster. If you leave the cluster services disabled, you will need to manually start the services when you reboot a node by executing the pcs cluster start command on that node.
[root@z1 ~]# pcs cluster enable --all
Display the status of the cluster you created with the pcs cluster status command. Because there may be a slight delay before the cluster is up and running when you start the cluster services with the --start option of the pcs cluster setup command, you should ensure that the cluster is up and running before performing any subsequent actions on the cluster and its configuration.
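The example output was not preserved in this copy; the command itself is simply:
[root@z1 ~]# pcs cluster status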
4.4. Configuring fencing in a Red Hat High Availability cluster
If communication with a single node in the cluster fails, then other nodes in the cluster must be able to restrict or release access to resources that the failed cluster node may have access to. This cannot be accomplished by contacting the cluster node itself as the cluster node may not be responsive. Instead, you must provide an external method, which is called fencing with a fence agent. A fence device is an external device that can be used by the cluster to restrict access to shared resources by an errant node, or to issue a hard reboot on the cluster node.
Without a fence device configured, you do not have a way to know that the resources previously used by the disconnected cluster node have been released, and this could prevent the services from running on any of the other cluster nodes. Conversely, the system may assume erroneously that the cluster node has released its resources, and this can lead to data corruption and data loss. Without a fence device configured, data integrity cannot be guaranteed and the cluster configuration will be unsupported.
When fencing is in progress, no other cluster operation is allowed to run. Normal operation of the cluster cannot resume until fencing has completed or the cluster node rejoins the cluster after the cluster node has been rebooted. For more information about fencing and its importance in a Red Hat High Availability cluster, see the Red Hat Knowledgebase solution Fencing in a Red Hat High Availability Cluster.
4.4.1. Displaying available fence agents and their options
The following commands can be used to view available fencing agents and the available options for specific fencing agents.
Your system’s hardware determines the type of fencing device to use for your cluster. For information about supported platforms and architectures and the different fencing devices, see the Red Hat Knowledgebase article Cluster Platforms and Architectures section of the article Support Policies for RHEL High Availability Clusters.
Run the following command to list all available fencing agents. When you specify a filter, this command displays only the fencing agents that match the filter.
pcs stonith list [filter]
Run the following command to display the options for the specified fencing agent.
pcs stonith describe [stonith_agent]
For example, the following command displays the options for the fence agent for APC over telnet/SSH.
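The command itself was not preserved here. Assuming the standard fence_apc agent is the one meant for APC over telnet/SSH, it would be:
# pcs stonith describe fence_apc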
For fence agents that provide a method option, with the exception of the fence_sbd agent, a value of cycle is unsupported and should not be specified, as it may cause data corruption. Even for fence_sbd, however, you should not specify a method and instead use the default value.
4.4.2. Creating a fence device
The format for the command to create a fence device is as follows. For a listing of the available fence device creation options, see the pcs stonith -h display.
pcs stonith create stonith_id stonith_device_type [stonith_device_options] [op operation_action operation_options]
The following command creates a single fencing device for a single node.
# pcs stonith create MyStonith fence_virt pcmk_host_list=f1 op monitor interval=30s
Some fence devices can fence only a single node, while other devices can fence multiple nodes. The parameters you specify when you create a fencing device depend on what your fencing device supports and requires.
- Some fence devices can automatically determine what nodes they can fence.
- You can use the pcmk_host_list parameter when creating a fencing device to specify all of the machines that are controlled by that fencing device.
- Some fence devices require a mapping of host names to the specifications that the fence device understands. You can map host names with the pcmk_host_map parameter when creating a fencing device.
For information about the pcmk_host_list and pcmk_host_map parameters, see General properties of fencing devices.
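Because this chapter targets AWS, a hedged sketch of creating a fence device with the fence_aws agent (installed earlier as fence-agents-aws) is shown below; the stonith ID, region, credentials, host names, instance IDs, and timeout values are placeholders for a typical two-node setup:
# pcs stonith create clusterfence fence_aws access_key=<access_key_id> secret_key=<secret_access_key> \
    region=us-east-1 pcmk_host_map="z1.example.com:i-0123456789abcdef0;z2.example.com:i-0fedcba9876543210" \
    power_timeout=240 pcmk_reboot_timeout=480 pcmk_reboot_retries=4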
After configuring a fence device, it is imperative that you test the device to ensure that it is working correctly. For information about testing a fence device, see Testing a fence device.
4.4.3. General properties of fencing devices
There are many general properties you can set for fencing devices, as well as various cluster properties that determine fencing behavior.
Any cluster node can fence any other cluster node with any fence device, regardless of whether the fence resource is started or stopped. Whether the resource is started controls only the recurring monitor for the device, not whether it can be used, with the following exceptions:
- You can disable a fencing device by running the pcs stonith disable stonith_id command. This will prevent any node from using that device.
- To prevent a specific node from using a fencing device, you can configure location constraints for the fencing resource with the pcs constraint location … avoids command.
- Configuring stonith-enabled=false will disable fencing altogether. Note, however, that Red Hat does not support clusters when fencing is disabled, as it is not suitable for a production environment.
The following table describes the general properties you can set for fencing devices.
Field | Type | Default | Description |
---|---|---|---|
pcmk_host_map | string | | A mapping of host names to port numbers for devices that do not support host names. For example, node1:1;node2:2,3 tells the cluster to use port 1 for node1 and ports 2 and 3 for node2. |
pcmk_host_list | string | | A list of machines controlled by this device (optional unless pcmk_host_check is set to static-list). |
pcmk_host_check | string | * static-list if either pcmk_host_list or pcmk_host_map is set * Otherwise, dynamic-list if the fence device supports the list action * Otherwise, status if the fence device supports the status action * Otherwise, none | How to determine which machines are controlled by the device. Allowed values: dynamic-list, static-list, status, none |
The following table summarizes additional properties you can set for fencing devices. Note that these properties are for advanced use only.
Field | Type | Default | Description |
---|---|---|---|
| string | port |
An alternate parameter to supply instead of port. Some devices do not support the standard port parameter or may provide additional ones. Use this to specify an alternate, device-specific parameter that should indicate the machine to be fenced. A value of |
| string | reboot |
An alternate command to run instead of |
| time | 60s |
Specify an alternate timeout to use for reboot actions instead of |
| integer | 2 |
The maximum number of times to retry the |
| string | off |
An alternate command to run instead of |
| time | 60s |
Specify an alternate timeout to use for off actions instead of |
| integer | 2 | The maximum number of times to retry the off command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries off actions before giving up. |
| string | list |
An alternate command to run instead of |
| time | 60s | Specify an alternate timeout to use for list actions. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific, timeout for list actions. |
| integer | 2 |
The maximum number of times to retry the |
| string | monitor |
An alternate command to run instead of |
| time | 60s |
Specify an alternate timeout to use for monitor actions instead of |
| integer | 2 |
The maximum number of times to retry the |
| string | status |
An alternate command to run instead of |
| time | 60s |
Specify an alternate timeout to use for status actions instead of |
| integer | 2 | The maximum number of times to retry the status command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries status actions before giving up. |
| string | 0s |
Enables a base delay for fencing actions and specifies a base delay value. You can specify different values for different nodes with the |
| time | 0s |
Enables a random delay for fencing actions and specifies the maximum delay, which is the maximum value of the combined base delay and random delay. For example, if the base delay is 3 and |
| integer | 1 |
The maximum number of actions that can be performed in parallel on this device. The cluster property |
| string | on |
For advanced use only: An alternate command to run instead of |
| time | 60s |
For advanced use only: Specify an alternate timeout to use for |
| integer | 2 |
For advanced use only: The maximum number of times to retry the |
In addition to the properties you can set for individual fence devices, there are also cluster properties you can set that determine fencing behavior, as described in the following table.
Option | Default | Description |
---|---|---|
| true |
Indicates that failed nodes and nodes with resources that cannot be stopped should be fenced. Protecting your data requires that you set this
If
Red Hat only supports clusters with this value set to |
| reboot |
Action to send to fencing device. Allowed values: |
| 60s | How long to wait for a STONITH action to complete. |
| 10 | How many times fencing can fail for a target before the cluster will no longer immediately re-attempt it. |
| The maximum time to wait until a node can be assumed to have been killed by the hardware watchdog. It is recommended that this value be set to twice the value of the hardware watchdog timeout. This option is needed only if watchdog-only SBD configuration is used for fencing. | |
| true | Allow fencing operations to be performed in parallel. |
| stop |
Determines how a cluster node should react if notified of its own fencing. A cluster node may receive notification of its own fencing if fencing is misconfigured, or if fabric fencing is in use that does not cut cluster communication. Allowed values are
Although the default value for this property is |
| 0 (disabled) | Sets a fencing delay that allows you to configure a two-node cluster so that in a split-brain situation the node with the fewest or least important resources running is the node that gets fenced. For general information about fencing delay parameters and their interactions, see Fencing delays. |
For information about setting cluster properties, see https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/10/html/configuring_and_managing_high_availability_clusters/index#setting-cluster-properties
4.4.4. Fencing delays
When cluster communication is lost in a two-node cluster, one node may detect this first and fence the other node. If both nodes detect this at the same time, however, each node may be able to initiate fencing of the other, leaving both nodes powered down or reset. By setting a fencing delay, you can decrease the likelihood of both cluster nodes fencing each other. You can set delays in a cluster with more than two nodes, but this is generally not of any benefit because only a partition with quorum will initiate fencing.
You can set different types of fencing delays, depending on your system requirements.
static fencing delays
A static fencing delay is a fixed, predetermined delay. Setting a static delay on one node makes that node more likely to be fenced because it increases the chances that the other node will initiate fencing first after detecting lost communication. In an active/passive cluster, setting a delay on a passive node makes it more likely that the passive node will be fenced when communication breaks down. You configure a static delay by using the pcmk_delay_base cluster property. You can set this property when a separate fence device is used for each node or when a single fence device is used for all nodes.
dynamic fencing delays
A dynamic fencing delay is random. It can vary and is determined at the time fencing is needed. You configure a random delay and specify a maximum value for the combined base delay and random delay with the pcmk_delay_max cluster property. When the fencing delay for each node is random, which node is fenced is also random. You may find this feature useful if your cluster is configured with a single fence device for all nodes in an active/active design.
priority fencing delays
A priority fencing delay is based on active resource priorities. If all resources have the same priority, the node with the fewest resources running is the node that gets fenced. In most cases, you use only one delay-related parameter, but it is possible to combine them. Combining delay-related parameters adds the priority values for the resources together to create a total delay. You configure a priority fencing delay with the priority-fencing-delay cluster property. You may find this feature useful in an active/active cluster design because it can make the node running the fewest resources more likely to be fenced when communication between the nodes is lost.
The pcmk_delay_base cluster property
Setting the pcmk_delay_base cluster property enables a base delay for fencing and specifies a base delay value.
When you set the pcmk_delay_max cluster property in addition to the pcmk_delay_base property, the overall delay is derived from a random delay value added to this static delay so that the sum is kept below the maximum delay. When you set pcmk_delay_base but do not set pcmk_delay_max, there is no random component to the delay and it will be the value of pcmk_delay_base.
You can specify different values for different nodes with the pcmk_delay_base parameter. This allows a single fence device to be used in a two-node cluster, with a different delay for each node. You do not need to configure two separate devices to use separate delays. To specify different values for different nodes, you map the host names to the delay value for that node using a similar syntax to pcmk_host_map. For example, node1:0;node2:10s would use no delay when fencing node1 and a 10-second delay when fencing node2.
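As an illustrative sketch only, you could apply such a mapping to an existing device (here reusing the hypothetical MyStonith device from the earlier example) with:
# pcs stonith update MyStonith pcmk_delay_base="node1:0;node2:10s"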
The pcmk_delay_max cluster property
Setting the pcmk_delay_max cluster property enables a random delay for fencing actions and specifies the maximum delay, which is the maximum value of the combined base delay and random delay. For example, if the base delay is 3 and pcmk_delay_max is 10, the random delay will be between 3 and 10.
When you set the pcmk_delay_base cluster property in addition to the pcmk_delay_max property, the overall delay is derived from a random delay value added to this static delay so that the sum is kept below the maximum delay. When you set pcmk_delay_max but do not set pcmk_delay_base, there is no static component to the delay.
The priority-fencing-delay cluster property
Setting the priority-fencing-delay cluster property allows you to configure a two-node cluster so that in a split-brain situation the node with the fewest or least important resources running is the node that gets fenced.
The priority-fencing-delay property can be set to a time duration. The default value for this property is 0 (disabled). If this property is set to a non-zero value, and the priority meta-attribute is configured for at least one resource, then in a split-brain situation the node with the highest combined priority of all resources running on it will be more likely to remain operational. For example, if you set pcs resource defaults update priority=1 and pcs property set priority-fencing-delay=15s and no other priorities are set, then the node running the most resources will be more likely to remain operational because the other node will wait 15 seconds before initiating fencing. If a particular resource is more important than the rest, you can give it a higher priority.
The node running the promoted role of a promotable clone gets an extra 1 point if a priority has been configured for that clone.
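The settings described above translate into commands such as the following; the resource name important-db in the last line is a hypothetical example of giving one resource a higher priority:
# pcs resource defaults update priority=1
# pcs property set priority-fencing-delay=15s
# pcs resource update important-db meta priority=10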
Interaction of fencing delays
Setting more than one type of fencing delay yields the following results:
- Any delay set with the priority-fencing-delay property is added to any delay from the pcmk_delay_base and pcmk_delay_max fence device properties. This behavior allows some delay when both nodes have equal priority, or both nodes need to be fenced for some reason other than node loss, as when on-fail=fencing is set for a resource monitor operation. When setting these delays in combination, set the priority-fencing-delay property to a value that is significantly greater than the maximum delay from pcmk_delay_base and pcmk_delay_max to be sure the prioritized node is preferred. Setting this property to twice this value is always safe.
- Only fencing scheduled by Pacemaker itself observes fencing delays. Fencing scheduled by external code such as dlm_controld and fencing implemented by the pcs stonith fence command do not provide the necessary information to the fence device.
- Some individual fence agents implement a delay parameter, with a name determined by the agent and independent of delays configured with a pcmk_delay_* property. If both of these delays are configured, they are added together, so they would generally not be used in conjunction.
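As a sketch of the sizing guidance above, the following combination keeps priority-fencing-delay at twice the maximum fence device delay; the device name myfence is hypothetical:
# pcs stonith update myfence pcmk_delay_base="node1:0;node2:5s" pcmk_delay_max=10s
# pcs property set priority-fencing-delay=20s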
4.4.5. Testing a fence device
Fencing is a fundamental part of the Red Hat Cluster infrastructure and it is important to validate or test that fencing is working properly.
When a Pacemaker cluster node or Pacemaker remote node is fenced, a hard kill should occur, not a graceful shutdown of the operating system. If a graceful shutdown occurs when your system fences a node, disable ACPI soft-off in the /etc/systemd/logind.conf file so that your system ignores any power-button-pressed signal. For instructions on disabling ACPI soft-off in the logind.conf file, see Disabling ACPI soft-off in the logind.conf file.
Use the following procedure to test a fence device.
Procedure
Use SSH, Telnet, HTTP, or whatever remote protocol is used to connect to the device to manually log in and test the fence device or see what output is given. For example, if you are configuring fencing for an IPMI-enabled device, then try to log in remotely with ipmitool. Take note of the options used when logging in manually because those options might be needed when using the fencing agent.
If you are unable to log in to the fence device, verify that the device is pingable, that no firewall configuration is preventing access to the fence device, that remote access is enabled on the fencing device, and that the credentials are correct.
Run the fence agent manually, using the fence agent script. This does not require that the cluster services are running, so you can perform this step before the device is configured in the cluster. This can ensure that the fence device is responding properly before proceeding.
Note: These examples use the fence_ipmilan fence agent script for an iLO device. The actual fence agent you will use and the command that calls that agent will depend on your server hardware. You should consult the man page for the fence agent you are using to determine which options to specify. You will usually need to know the login and password for the fence device and other information related to the fence device.
The following example shows the format you would use to run the fence_ipmilan fence agent script with the -o status parameter to check the status of the fence device interface on another node without actually fencing it. This allows you to test the device and get it working before attempting to reboot the node. When running this command, you specify the name and password of an iLO user that has power on and off permissions for the iLO device.
# fence_ipmilan -a ipaddress -l username -p password -o status
The following example shows the format you would use to run the fence_ipmilan fence agent script with the -o reboot parameter. Running this command on one node reboots the node managed by this iLO device.
# fence_ipmilan -a ipaddress -l username -p password -o reboot
If the fence agent failed to properly do a status, off, on, or reboot action, you should check the hardware, the configuration of the fence device, and the syntax of your commands. In addition, you can run the fence agent script with the debug output enabled. The debug output is useful for some fencing agents to see where in the sequence of events the fencing agent script is failing when logging into the fence device.
# fence_ipmilan -a ipaddress -l username -p password -o status -D /tmp/$(hostname)-fence_agent.debug
When diagnosing a failure that has occurred, you should ensure that the options you specified when manually logging in to the fence device are identical to what you passed on to the fence agent with the fence agent script.
For fence agents that support an encrypted connection, you may see an error due to certificate validation failing, requiring that you trust the host or that you use the fence agent's ssl-insecure parameter. Similarly, if SSL/TLS is disabled on the target device, you may need to account for this when setting the SSL parameters for the fence agent.
Note: If the fence agent that is being tested is fence_drac, fence_ilo, or some other fencing agent for a systems management device that continues to fail, then fall back to trying fence_ipmilan. Most systems management cards support IPMI remote login and the only supported fencing agent is fence_ipmilan.
Once the fence device has been configured in the cluster with the same options that worked manually and the cluster has been started, test fencing with the pcs stonith fence command from any node (or even multiple times from different nodes), as in the following example. The pcs stonith fence command reads the cluster configuration from the CIB and calls the fence agent as configured to execute the fence action. This verifies that the cluster configuration is correct.
# pcs stonith fence node_name
If the pcs stonith fence command works properly, that means the fencing configuration for the cluster should work when a fence event occurs. If the command fails, it means that cluster management cannot invoke the fence device through the configuration it has retrieved. Check for the following issues and update your cluster configuration as needed.
- Check your fence configuration. For example, if you have used a host map, ensure that the system can find the node using the host name you have provided.
- Check whether the password and user name for the device include any special characters that could be misinterpreted by the bash shell. Entering passwords and user names surrounded by quotation marks could address this issue.
- Check whether you can connect to the device using the exact IP address or host name you specified in the pcs stonith command. For example, if you give the host name in the stonith command but test by using the IP address, that is not a valid test. If the protocol that your fence device uses is accessible to you, use that protocol to try to connect to the device. For example, many agents use ssh or telnet. Try to connect to the device with the credentials you provided when configuring the device, to see if you get a valid prompt and can log in to the device.
If you determine that all your parameters are appropriate but you still have trouble connecting to your fence device, you can check the logging on the fence device itself, if the device provides that, which will show if the user has connected and what command the user issued. You can also search through the
/var/log/messages
file for instances of stonith and error, which could give some idea of what is transpiring, but some agents can provide additional information.
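For example, one way to search the system log for fencing-related messages is:
# grep -iE 'stonith|error' /var/log/messages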
Once the fence device tests are working and the cluster is up and running, test an actual failure. To do this, take an action in the cluster that should initiate a token loss.
Take down a network. How you take down a network depends on your specific configuration. In many cases, you can physically pull the network or power cables out of the host. For information about simulating a network failure, see the Red Hat Knowledgebase solution What is the proper way to simulate a network failure on a RHEL Cluster?.
Note: Disabling the network interface on the local host rather than physically disconnecting the network or power cables is not recommended as a test of fencing because it does not accurately simulate a typical real-world failure.
Block corosync traffic both inbound and outbound using the local firewall.
The following example blocks corosync, assuming the default corosync port is used, firewalld is used as the local firewall, and the network interface used by corosync is in the default firewall zone:
# firewall-cmd --direct --add-rule ipv4 filter OUTPUT 2 -p udp --dport=5405 -j DROP
# firewall-cmd --add-rich-rule='rule family="ipv4" port port="5405" protocol="udp" drop'
Simulate a crash and panic your machine with sysrq-trigger. Note, however, that triggering a kernel panic can cause data loss; it is recommended that you disable your cluster resources first.
# echo c > /proc/sysrq-trigger
4.4.6. Configuring fencing levels
Pacemaker supports fencing nodes with multiple devices through a feature called fencing topologies. To implement topologies, create the individual devices as you normally would and then define one or more fencing levels in the fencing topology section in the configuration.
Pacemaker processes fencing levels as follows:
- Each level is attempted in ascending numeric order, starting at 1.
- If a device fails, processing terminates for the current level. No further devices in that level are exercised and the next level is attempted instead.
- If all devices are successfully fenced, then that level has succeeded and no other levels are tried.
- The operation is finished when a level has passed (success), or all levels have been attempted (failed).
Use the following command to add a fencing level to a node. The devices are given as a comma-separated list of stonith
ids, which are attempted for the node at that level.
pcs stonith level add level node devices
The following example sets up fence levels so that if the device my_ilo
fails and is unable to fence the node, then Pacemaker attempts to use the device my_apc
.
Prerequisites
- You have configured an ilo fence device called my_ilo for node rh7-2.
- You have configured an apc fence device called my_apc for node rh7-2.
Procedure
Add a fencing level of 1 for fence device my_ilo on node rh7-2.
# pcs stonith level add 1 rh7-2 my_ilo
Add a fencing level of 2 for fence device my_apc on node rh7-2.
# pcs stonith level add 2 rh7-2 my_apc
List the currently configured fencing levels.
# pcs stonith level
Node: rh7-2
 Level 1 - my_ilo
 Level 2 - my_apc
For information about node attributes, see https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/10/html-single/configuring_and_managing_high_availability_clusters/index#node-attributes
4.4.7. Removing a fence level
You can remove the fence level for the specified node and device. If no nodes or devices are specified, the fence level you specify is removed from all nodes.
Procedure
Remove the fence level for the specified node and device:
# pcs stonith level remove level [node_id] [stonith_id] ... [stonith_id]
4.4.8. Clearing fence levels
You can clear the fence levels on the specified node or stonith id. If you do not specify a node or stonith id, all fence levels are cleared.
Procedure
Clear the fence levels on the specified node or stonith id:
# pcs stonith level clear [node|stonith_id(s)]
If you specify more than one stonith id, they must be separated by a comma and no spaces, as in the following example.
# pcs stonith level clear dev_a,dev_b
4.4.9. Verifying nodes and devices in fence levels
You can verify that all fence devices and nodes specified in fence levels exist.
Procedure
Use the following command to verify that all fence devices and nodes specified in fence levels exist:
# pcs stonith level verify
4.4.10. Specifying nodes in fencing topology
You can specify nodes in fencing topology by a regular expression applied on a node name and by a node attribute and its value.
Procedure
The following commands configure nodes node1, node2, and node3 to use fence devices apc1 and apc2, and nodes node4, node5, and node6 to use fence devices apc3 and apc4:
# pcs stonith level add 1 "regexp%node[1-3]" apc1,apc2
# pcs stonith level add 1 "regexp%node[4-6]" apc3,apc4
The following commands yield the same results by using node attribute matching, as shown in the sketch below.
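One possible form of the node-attribute variant, assuming a hypothetical custom node attribute named rack that groups the nodes:
# pcs node attribute node1 rack=1
# pcs node attribute node2 rack=1
# pcs node attribute node3 rack=1
# pcs node attribute node4 rack=2
# pcs node attribute node5 rack=2
# pcs node attribute node6 rack=2
# pcs stonith level add 1 attrib%rack=1 apc1,apc2
# pcs stonith level add 1 attrib%rack=2 apc3,apc4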
4.4.11. Configuring fencing for redundant power supplies
When configuring fencing for redundant power supplies, the cluster must ensure that when attempting to reboot a host, both power supplies are turned off before either power supply is turned back on.
If the node never completely loses power, the node may not release its resources. This opens up the possibility of nodes accessing these resources simultaneously and corrupting them.
You need to define each device only once and to specify that both are required to fence the node.
Procedure
Create the first fence device.
# pcs stonith create apc1 fence_apc_snmp ipaddr=apc1.example.com login=user passwd='7a4D#1j!pz864' pcmk_host_map="node1.example.com:1;node2.example.com:2"
Create the second fence device.
# pcs stonith create apc2 fence_apc_snmp ipaddr=apc2.example.com login=user passwd='7a4D#1j!pz864' pcmk_host_map="node1.example.com:1;node2.example.com:2"
Specify that both devices are required to fence the node.
# pcs stonith level add 1 node1.example.com apc1,apc2
# pcs stonith level add 1 node2.example.com apc1,apc2
4.4.12. Administering fence devices
The pcs
command-line interface provides a variety of commands you can use to administer your fence devices after you have configured them.
4.4.12.1. Displaying configured fence devices
The following command shows all currently configured fence devices. If a stonith_id is specified, the command shows the options for that configured fencing device only. If the --full
option is specified, all configured fencing options are displayed.
pcs stonith config [stonith_id] [--full]
4.4.12.2. Exporting fence devices as pcs commands
You can display the pcs
commands that can be used to re-create configured fence devices on a different system using the --output-format=cmd
option of the pcs stonith config
command.
The following commands create a fence_apc_snmp
fence device and display the pcs
command you can use to re-create the device.
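A minimal sketch of such a pair of commands; the device name, address, and credentials are hypothetical:
# pcs stonith create myapc fence_apc_snmp ipaddr="apc.example.com" login="user" passwd="apc" pcmk_host_map="node1:1;node2:2"
# pcs stonith config --output-format=cmd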
4.4.12.3. Exporting fence level configuration
The pcs stonith config
and the pcs stonith level config
commands support the --output-format=
option to export the fencing level configuration in JSON format and as pcs
commands.
- Specifying --output-format=cmd displays the pcs commands created from the current cluster configuration that configure fencing levels. You can use these commands to re-create configured fencing levels on a different system.
- Specifying --output-format=json displays the fencing level configuration in JSON format, which is suitable for machine parsing.
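For example, the fencing level configuration can be exported in either format with the pcs stonith level config command:
# pcs stonith level config --output-format=cmd
# pcs stonith level config --output-format=json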
4.4.12.4. Modifying and deleting fence devices
Modify or add options to a currently configured fencing device with the following command.
pcs stonith update stonith_id [stonith_device_options]
Updating a SCSI fencing device with the pcs stonith update
command causes a restart of all resources running on the same node where the fencing resource was running. You can use either version of the following command to update SCSI devices without causing a restart of other cluster resources. SCSI fencing devices can be configured as multipath devices.
pcs stonith update-scsi-devices stonith_id set device-path1 device-path2
pcs stonith update-scsi-devices stonith_id add device-path1 remove device-path2
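For example, assuming a hypothetical multipath SCSI fencing device named iscsi-stonith-device, you could add one device path and remove another without restarting other cluster resources:
# pcs stonith update-scsi-devices iscsi-stonith-device add /dev/mapper/mpathb remove /dev/mapper/mpatha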
Use the following command to remove a fencing device from the current configuration.
pcs stonith delete stonith_id
4.4.12.5. Manually fencing a cluster node
You can fence a node manually with the following command. If you specify the --off option, the command uses the off API call to stonith, which turns the node off instead of rebooting it.
pcs stonith fence node [--off]
In a situation where no fence device is able to fence a node even if it is no longer active, the cluster may not be able to recover the resources on the node. If this occurs, after manually ensuring that the node is powered down you can enter the following command to confirm to the cluster that the node is powered down and free its resources for recovery.
If the node you specify is not actually off, but running the cluster software or services normally controlled by the cluster, data corruption and cluster failure can occur.
pcs stonith confirm node
4.4.12.6. Disabling a fence device
To disable a fencing device, run the pcs stonith disable
command.
The following command disables the fence device myapc
.
# pcs stonith disable myapc
4.4.12.7. Preventing a node from using a fencing device
To prevent a specific node from using a fencing device, you can configure location constraints for the fencing resource.
The following example prevents fence device node1-ipmi
from running on node1
.
# pcs constraint location node1-ipmi avoids node1
4.4.13. Configuring ACPI for use with integrated fence devices
If your cluster uses integrated fence devices, you must configure ACPI (Advanced Configuration and Power Interface) to ensure immediate and complete fencing.
If a cluster node is configured to be fenced by an integrated fence device, disable ACPI Soft-Off for that node. Disabling ACPI Soft-Off allows an integrated fence device to turn off a node immediately and completely rather than attempting a clean shutdown (for example, shutdown -h now
). Otherwise, if ACPI Soft-Off is enabled, an integrated fence device can take four or more seconds to turn off a node (see the note that follows). In addition, if ACPI Soft-Off is enabled and a node panics or freezes during shutdown, an integrated fence device may not be able to turn off the node. Under those circumstances, fencing is delayed or unsuccessful. Consequently, when a node is fenced with an integrated fence device and ACPI Soft-Off is enabled, a cluster recovers slowly or requires administrative intervention to recover.
The amount of time required to fence a node depends on the integrated fence device used. Some integrated fence devices perform the equivalent of pressing and holding the power button; therefore, the fence device turns off the node in four to five seconds. Other integrated fence devices perform the equivalent of pressing the power button momentarily, relying on the operating system to turn off the node; therefore, the fence device turns off the node in a time span much longer than four to five seconds.
- The preferred way to disable ACPI Soft-Off is to change the BIOS setting to "instant-off" or an equivalent setting that turns off the node without delay, as described in Disabling ACPI Soft-Off with the BIOS.
Disabling ACPI Soft-Off with the BIOS may not be possible with some systems. If disabling ACPI Soft-Off with the BIOS is not satisfactory for your cluster, you can disable ACPI Soft-Off with one of the following alternate methods:
- Setting HandlePowerKey=ignore in the /etc/systemd/logind.conf file and verifying that the node turns off immediately when fenced, as described in Disabling ACPI soft-off in the logind.conf file. This is the first alternate method of disabling ACPI Soft-Off.
- Appending acpi=off to the kernel boot command line, as described in Disabling ACPI completely in the GRUB 2 file. This is the second alternate method of disabling ACPI Soft-Off, if the preferred or the first alternate method is not available.
Important: This method completely disables ACPI; some computers do not boot correctly if ACPI is completely disabled. Use this method only if the other methods are not effective for your cluster.
4.4.13.1. Disabling ACPI Soft-Off with the BIOS
You can disable ACPI Soft-Off by configuring the BIOS of each cluster node with the following procedure.
The procedure for disabling ACPI Soft-Off with the BIOS may differ among server systems. You should verify this procedure with your hardware documentation.
Procedure
- Reboot the node and start the BIOS CMOS Setup Utility program.
- Navigate to the Power menu (or equivalent power management menu).
- At the Power menu, set the Soft-Off by PWR-BTTN function (or equivalent) to Instant-Off (or the equivalent setting that turns off the node by means of the power button without delay). The BIOS CMOS Setup Utility example below shows a Power menu with ACPI Function set to Enabled and Soft-Off by PWR-BTTN set to Instant-Off.
Note: The equivalents to ACPI Function, Soft-Off by PWR-BTTN, and Instant-Off may vary among computers. However, the objective of this procedure is to configure the BIOS so that the computer is turned off by means of the power button without delay.
- Exit the BIOS CMOS Setup Utility program, saving the BIOS configuration.
- Verify that the node turns off immediately when fenced. For information about testing a fence device, see Testing a fence device.
BIOS CMOS Setup Utility example: ACPI Function set to Enabled and Soft-Off by PWR-BTTN set to Instant-Off.
4.4.13.2. Disabling ACPI Soft-Off in the logind.conf file
To disable power-key handling in the /etc/systemd/logind.conf file, use the following procedure.
Procedure
Define the following configuration in the /etc/systemd/logind.conf file:
HandlePowerKey=ignore
Restart the systemd-logind service:
# systemctl restart systemd-logind.service
Verification
- Verify that the node turns off immediately when fenced. For information about testing a fence device, see Testing a fence device.
4.4.13.3. Disabling ACPI completely in the GRUB 2 file
You can disable ACPI Soft-Off by appending acpi=off
to the GRUB menu entry for a kernel.
This method completely disables ACPI; some computers do not boot correctly if ACPI is completely disabled. Use this method only if the other methods are not effective for your cluster.
Procedure
Use the --args option in combination with the --update-kernel option of the grubby tool to change the grub.cfg file of each cluster node as follows:
# grubby --args=acpi=off --update-kernel=ALL
- Reboot the node.
Verification
- Verify that the node turns off immediately when fenced. For information about testing a fence device, see Testing a fence device.
4.5. Setting up IP address resources on AWS
Clients use IP addresses to manage cluster resources across the network. To handle a failover of a cluster, include IP address resources in the cluster that use specific network resource agents. The RHEL HA Add-On provides a set of resource agents, which create IP address resources to manage various types of IP addresses on AWS. Depending on the type of AWS IP address that the HA cluster manages, decide which resource agent to configure. You can create a cluster resource for managing an IP address in the following ways:
- Exposed to the internet: Use the awseip network resource.
- Limited to a single AWS Availability Zone (AZ): Use the awsvip and IPaddr2 network resources.
- Able to move across multiple AWS AZs within the same AWS region: Use the aws-vpc-move-ip network resource.
Note: If the HA cluster does not manage any IP addresses, the resource agents for managing virtual IP addresses on AWS are not required. If you need further guidance for your specific deployment, consult with AWS.
4.5.1. Creating an IP address resource to manage an IP address exposed to the internet
To ensure that high-availability (HA) clients can access a RHEL node that uses public-facing internet connections, configure an AWS Secondary Elastic IP Address (awseip
) resource to use an elastic IP address.
Prerequisites
- You have a configured cluster ready to use.
- Your cluster nodes must have access to the RHEL HA repositories. For details, see Installing the High Availability packages and agents.
- You have set up the AWS CLI2. For details, see Installing AWSCLI2.
Procedure
- Add the two resources to the same group that you have already created to enforce order and colocation constraints.
Install the resource-agents package:
# dnf install resource-agents
Create an elastic IP address:
[root@ip-10-0-0-48 ~]# aws ec2 allocate-address --domain vpc --output text
eipalloc-4c4a2c45 vpc 35.169.153.122
Optional: Display the description of awseip. This shows the options and default operations for this agent.
# pcs resource describe awseip
Create the Secondary Elastic IP address resource with the elastic IP address that you allocated earlier:
# pcs resource create <resource_id> awseip elastic_ip=<elastic_ip_address> allocation_id=<elastic_ip_association_id> --group networking-group
Example:
# pcs resource create elastic awseip elastic_ip=35.169.153.122 allocation_id=eipalloc-4c4a2c45 --group networking-group
Verification
Verify the cluster status to ensure resources are available:
# pcs status
In this example, newcluster is an active cluster where resources such as vip and elastic are part of the networking-group resource group.
Launch an SSH session from your local workstation to the elastic IP address that you have already created:
$ ssh -l ec2-user -i ~/.ssh/cluster-admin.pem 35.169.153.122
- Verify that the host you connected to over SSH is the same host that runs the elastic resource.
4.5.2. Creating an IP address resource to manage a private IP address limited to a single AWS Availability Zone
You can configure an AWS Secondary Private IP Address (awsvip) resource to use a virtual IP address. With awsvip, high-availability (HA) clients on AWS can access a RHEL node through a private IP address that is accessible only within a single availability zone (AZ). You can complete the following procedure on any node in the cluster.
Prerequisites
- You have a configured cluster ready to use.
- Your cluster nodes have access to the RHEL HA repositories. For details, see Installing the High Availability packages and agents.
- You have set up the AWS CLI. For instructions, see Installing AWSCLI2.
Procedure
Install the resource-agents package:
# dnf install resource-agents
Optional: View the options and default operations for awsvip:
# pcs resource describe awsvip
Create a Secondary Private IP address with an unused private IP address in the VPC CIDR block:
[root@ip-10-0-0-48 ~]# pcs resource create privip awsvip secondary_private_ip=10.0.0.68 --group networking-group
Here, the secondary private IP address is included in the networking-group resource group.
Create a virtual IP resource with the vip resource ID and the networking-group group name:
[root@ip-10-0-0-48 ~]# pcs resource create vip IPaddr2 ip=10.0.0.68 --group networking-group
This is a Virtual Private Cloud (VPC) IP address that maps from the failed node to the failover node, masking the failure within the subnet. Ensure that the virtual IP belongs to the same resource group as the Secondary Private IP address you created in the previous step.
Verification
Verify the cluster status to ensure resources are available:
# pcs status
In this example, newcluster is an active cluster where resources such as vip and elastic are part of the networking-group resource group.
4.5.3. Creating an IP address resource to manage an IP address that can move across multiple AWS Availability Zones
You can configure the Red Hat Enterprise Linux (RHEL) Overlay IP (aws-vpc-move-ip) resource agent to use an overlay IP address. With aws-vpc-move-ip, high-availability (HA) clients can access a RHEL node through an IP address that the cluster can move across multiple availability zones (AZs) within a single AWS region.
Prerequisites
- You have an already configured cluster.
- Your cluster nodes have access to the RHEL HA repositories. For more information, see Installing the High Availability packages and agents.
- You have set up the AWS CLI. For instructions, see Installing AWSCLI2.
You have configured an Identity and Access Management (IAM) user on your cluster with the following permissions:
- Modify routing tables
- Create security groups
- Create IAM policies and roles
Procedure
Install the resource-agents package:
# dnf install resource-agents
Optional: View the options and default operations for aws-vpc-move-ip:
# pcs resource describe aws-vpc-move-ip
Set up an OverlayIPAgent IAM policy for the IAM user.
In the AWS console, navigate to Services → IAM → Policies → Create
OverlayIPAgent
Policy Input the following configuration, and change the <region>, <account_id>, and <cluster_route_table_id> values to correspond with your cluster.
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
-
In the AWS console, navigate to Services → IAM → Policies → Create
In the AWS console, disable the
Source/Destination Check
function on all nodes in the cluster.To do this, right-click each node → Networking → Change Source/Destination Checks. In the pop-up message that appears, click Yes, Disable.
Create a route table for the cluster. To do so, use the following command on one node in the cluster:
aws ec2 create-route --route-table-id <cluster_route_table_id> --destination-cidr-block <new_cidr_block_ip/net_mask> --instance-id <cluster_node_id>
Copy to Clipboard Copied! Toggle word wrap Toggle overflow In the command, replace values as follows:
-
ClusterRouteTableID
: The route table ID for the existing cluster Virtual Private Cloud (VPC) route table. -
NewCIDRblockIP
: A new IP address and netmask outside of the VPC classless inter-domain routing (CIDR) block. For example, if the VPC CIDR block is172.31.0.0/16
, the new IP address or netmask can be192.168.0.15/32
. -
ClusterNodeID
: The instance ID for another node in the cluster.
-
On one of the nodes in the cluster, create a
aws-vpc-move-ip
resource that uses a free IP address that is accessible to the client. The following example creates a resource namedvpcip
that uses IP192.168.0.15
.pcs resource create vpcip aws-vpc-move-ip ip=192.168.0.15 interface=eth0 routing_table=<cluster_route_table_id>
# pcs resource create vpcip aws-vpc-move-ip ip=192.168.0.15 interface=eth0 routing_table=<cluster_route_table_id>
On all nodes in the cluster, edit the /etc/hosts file, and add a line with the IP address of the newly created resource. For example:
192.168.0.15 vpcip
Verification
Test the failover ability of the new aws-vpc-move-ip resource:
# pcs resource move vpcip
If the failover succeeded, remove the automatically created constraint after the move of the vpcip resource:
# pcs resource clear vpcip
Chapter 5. Configuring RHEL on AWS with Secure Boot
Secure Boot, defined in the Unified Extensible Firmware Interface (UEFI) specification, ensures that only trusted and authorized programs run at boot time. It verifies the digital signatures of the boot loader and its components during startup, loading only trusted and authorized programs and blocking unauthorized ones. You can enable this feature both for AWS Marketplace Red Hat Enterprise Linux (RHEL) Amazon Machine Images (AMIs) and for custom RHEL AMIs.
5.1. Types of RHEL AMI on AWS
- AWS Marketplace RHEL AMI
- The AWS Marketplace offers pre-configured Red Hat Enterprise Linux (RHEL) Amazon Machine Images (AMIs) designed for specific use cases, such as data processing, system management, and web development. These ready-to-use images help reduce setup time by minimizing the manual installation and configuration required for operating systems and software packages.
- Custom RHEL AMI
- A custom RHEL AMI offers customers and organizations the flexibility to build and deploy tailored environments that meet specific application and workflow requirements. By creating a custom RHEL AMI, you can use RHEL instances that are pre-installed with the necessary tools, configurations, and security policies. This customization gives you greater control over the infrastructure.
5.2. Understanding Secure Boot for RHEL on cloud
Secure Boot is a feature of Unified Extensible Firmware Interface (UEFI). It ensures that only trusted and digitally signed programs and components, such as the boot loader and kernel, run during boot time. Secure Boot checks digital signatures against trusted keys stored in hardware. If it detects any tampered components or components signed by untrusted entities, it aborts the boot process. This action prevents malicious software from compromising the operating system.
Secure Boot plays a critical role in configuring a Confidential Virtual Machine (CVM) by ensuring that only trusted entities participate in the boot chain. It authenticates access to specific device paths through defined interfaces, enforces the use of the latest configuration, and permanently overwrites earlier configurations. When the Red Hat Enterprise Linux (RHEL) kernel boots with Secure Boot enabled, it enters the lockdown
mode, allowing only kernel modules signed by a trusted vendor to load. As a result, Secure Boot strengthens the security of the operating system boot sequence.
5.2.1. Components of Secure Boot
The Secure Boot mechanism consists of firmware, signature databases, cryptographic keys, boot loader, hardware modules, and the operating system. The following are the components of the UEFI trusted variables:
- Key Exchange Key database (KEK): An exchange of public keys to establish trust between the RHEL operating system and the VM firmware. You can also update the Allowed Signature database (db) and the Forbidden Signature database (dbx) by using these keys.
- Platform Key database (PK): A self-signed single-key database to establish trust between the VM firmware and the cloud platform. The PK also updates the KEK database.
- Allowed Signature database (db): A database that maintains a list of certificates or binary hashes to check whether a binary file can boot on the system. Additionally, all certificates from db are imported to the .platform keyring of the RHEL kernel. With this feature, you can add and load signed third-party kernel modules in the lockdown mode.
- Forbidden Signature database (dbx): A database that maintains a list of certificates or binary hashes that are not allowed to boot on the system.
Binary files are checked against the dbx database and the Secure Boot Advanced Targeting (SBAT) mechanism. With SBAT, you can revoke older versions of specific binaries while keeping the certificate that signed the binaries valid.
5.2.2. Stages of Secure Boot for RHEL on cloud
When a RHEL instance boots in the Unified Kernel Image (UKI) mode and with Secure Boot enabled, the RHEL instance interacts with the cloud service infrastructure in the following sequence:
- Initialization: When a RHEL instance boots, the cloud-hosted firmware initially boots and implements the Secure Boot mechanism.
- Variable store initialization: The firmware initializes UEFI variables from a variable store, a dedicated storage area for information that firmware needs to manage for the boot process and runtime operations. When the RHEL instance boots for the first time, the store initializes from default values associated with the VM image.
- Boot loader: When booted, the firmware loads the first stage boot loader. For the RHEL instance in an x86 UEFI environment, the first stage boot loader is shim. The shim boot loader authenticates and loads the next stage of the boot process and acts as a bridge between UEFI and GRUB.
  - The shim x86 binary in RHEL is currently signed by the Microsoft Corporation UEFI CA 2011 Microsoft certificate so that the RHEL instance can boot in the Secure Boot enabled mode on various hardware and virtualized platforms where the Allowed Signature database (db) has the default Microsoft certificates.
  - The shim binary extends the list of trusted certificates with Red Hat Secure Boot CA and, optionally, with a Machine Owner Key (MOK).
- UKI: The shim binary loads the RHEL UKI (the kernel-uki-virt package). The corresponding certificate, Red Hat Secure Boot Signing 504 on the x86_64 architecture, signs the UKI. You can find this certificate in the redhat-sb-certs package. Red Hat Secure Boot CA signs this certificate, so the check succeeds.
- UKI add-ons: When you use the UKI cmdline extensions, the RHEL kernel actively checks their signatures against db, MOK, and certificates shipped with shim. This process ensures that either the operating system vendor RHEL or a user has signed the extensions.
When the RHEL kernel boots in the Secure Boot mode, it enters lockdown
mode. After entering lockdown
, the RHEL kernel adds the db
keys to the .platform
keyring and the MOK
keys to the .machine
keyring. During the kernel build process, the build system works with an ephemeral key, which consists of private and public keys. The build system signs standard RHEL kernel modules, such as kernel-modules-core
, kernel-modules
, and kernel-modules-extra
. After the completion of each kernel build, the private key is discarded and cannot be used to sign third-party modules. You can use certificates from db and MOK for this purpose instead.
5.3. Configuring a RHEL instance with Secure Boot on the AWS Marketplace
To ensure that your RHEL instance on AWS has a secure operating system boot process, use Secure Boot. To configure a Red Hat Enterprise Linux (RHEL) instance with Secure Boot support on AWS, launch a RHEL Amazon Machine Image (AMI) from the AWS Marketplace that is pre-configured with the uefi-preferred boot mode. The uefi-preferred option enables support for the Unified Extensible Firmware Interface (UEFI) boot loader required for Secure Boot. Without UEFI, the Secure Boot feature does not work.
To avoid security issues, generate and keep private keys apart from the current RHEL instance. If Secure Boot secrets are stored on the same instance on which they are used, intruders can gain access to secrets for escalating their privileges. For more information on launching an AWS EC2 instance, see Get started with Amazon EC2.
Prerequisites
The RHEL AMI has the uefi-preferred option enabled in boot settings:
$ aws ec2 describe-images --image-id <ami-099f85fc24d27c2a7> --region <us-east-2> | grep -E '"ImageId"|"Name"|"BootMode"'
    "ImageId": "ami-099f85fc24d27c2a7",
    "Name": "RHEL-10.0.0_HVM_GA-20250423-x86_64-0-Hourly2-GP3",
    "BootMode": "uefi-preferred"
You have installed the following packages on the RHEL instance:
- awscli2
- python3
- openssl
- efivar
- keyutils
- edk2-ovmf
- python3-virt-firmware
Procedure
Check the platform status of the RHEL Marketplace AMI instance:
$ mokutil --sb-state
SecureBoot disabled
Platform is in Setup Mode
Copy to Clipboard Copied! Toggle word wrap Toggle overflow The
setup
mode allows updating the Secure Boot UEFI variables within the instance.Create a new random universally unique identifier (UUID) and store it in a system-generated text file:
uuidgen --random > GUID.txt
$ uuidgen --random > GUID.txt
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Generate a new
PK.key
RSA private key and a self-signedPK.cer
X.509 certificate for the Platform Key database:Copy to Clipboard Copied! Toggle word wrap Toggle overflow The
openssl
utility generates a common namePlatform key
for the certificate by setting output format to Distinguished Encoding Rules (DER).Generate a new
KEK.key
RSA private key and a self-signedKEK.cer
X.509 certificate for the Key Exchange Key database:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Generate a
custom_db.cer
custom certificate:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Download the
Microsoft Corporation UEFI CA 2011
Certificate:wget https://go.microsoft.com/fwlink/p/?linkid=321194 --user-agent="Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36" -O MicCorUEFCA2011_2011-06-27.crt
$ wget https://go.microsoft.com/fwlink/p/?linkid=321194 --user-agent="Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36" -O MicCorUEFCA2011_2011-06-27.crt
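The three key-generation steps above can be performed with the openssl req command. This is a minimal sketch in which the 2048-bit key size, the 3650-day validity, and the exact common names are assumptions that you should adapt to your own policy:
$ openssl req -newkey rsa:2048 -nodes -keyout PK.key -new -x509 -sha256 -days 3650 -subj "/CN=Platform key/" -outform DER -out PK.cer
$ openssl req -newkey rsa:2048 -nodes -keyout KEK.key -new -x509 -sha256 -days 3650 -subj "/CN=Key Exchange Key/" -outform DER -out KEK.cer
$ openssl req -newkey rsa:2048 -nodes -keyout custom_db.key -new -x509 -sha256 -days 3650 -subj "/CN=Signature Database key/" -outform DER -out custom_db.cer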
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Download the updated UEFI Revocation List File of forbidden signatures (
dbx
) for x64 bits system:wget https://uefi.org/sites/default/files/resources/x64_DBXUpdate.bin
$ wget https://uefi.org/sites/default/files/resources/x64_DBXUpdate.bin
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Generate UEFI variables file using the
virt-fw-vars
utility:Copy to Clipboard Copied! Toggle word wrap Toggle overflow For details, see the
virt-fw-vars(1)
man page on your system.Convert UEFI variables to the Extensible Firmware Interface (EFI) Signature List (ESL) format:
$ python3 /usr/share/doc/python3-virt-firmware/experimental/authfiles.py --input VARS --outdir .
$ for f in PK KEK db dbx; do tail -c +41 $f.auth > $f.esl; done
Copy to Clipboard Copied! Toggle word wrap Toggle overflow NoteEach GUID is an assigned value and represents an EFI parameter
-
8be4df61-93ca-11d2-aa0d-00e098032b8c
:EFI_GLOBAL_VARIABLE_GUID
-
d719b2cb-3d3a-4596-a3bc-dad00e67656f
:EFI_IMAGE_SECURITY_DATABASE_GUID
The
EFI_GLOBAL_VARIABLE_GUID
parameter maintains settings of the bootable devices and boot managers, while theEFI_IMAGE_SECURITY_DATABASE_GUID
parameter represents the image security database for Secure Boot variablesdb
,dbx
, and storage of required keys and certificates.-
Transfer the database certificates to the target instance. Use the efivar utility to manage UEFI environment variables.
To transfer
PK.esl
, enter:efivar -w -n 8be4df61-93ca-11d2-aa0d-00e098032b8c-PK -f PK.esl
# efivar -w -n 8be4df61-93ca-11d2-aa0d-00e098032b8c-PK -f PK.esl
Copy to Clipboard Copied! Toggle word wrap Toggle overflow To transfer
KEK.esl
, enter:efivar -w -n 8be4df61-93ca-11d2-aa0d-00e098032b8c-KEK -f KEK.esl
# efivar -w -n 8be4df61-93ca-11d2-aa0d-00e098032b8c-KEK -f KEK.esl
Copy to Clipboard Copied! Toggle word wrap Toggle overflow To transfer
db.esl
, enter:efivar -w -n d719b2cb-3d3a-4596-a3bc-dad00e67656f-db -f db.esl
# efivar -w -n d719b2cb-3d3a-4596-a3bc-dad00e67656f-db -f db.esl
Copy to Clipboard Copied! Toggle word wrap Toggle overflow To transfer the
dbx.esl
UEFI revocation list file for x64 architecture, enter:efivar -w -n d719b2cb-3d3a-4596-a3bc-dad00e67656f-dbx -f dbx.esl
# efivar -w -n d719b2cb-3d3a-4596-a3bc-dad00e67656f-dbx -f dbx.esl
Copy to Clipboard Copied! Toggle word wrap Toggle overflow
- Reboot the instance from the AWS console.
Verification
Verify if Secure Boot is enabled:
$ mokutil --sb-state
SecureBoot enabled
Use the keyctl utility to verify the kernel keyring for the custom certificate:
$ sudo keyctl list %:.platform
4 keys in keyring:
 786569360: ---lswrv     0     0 asymmetric: Signature Database key: 5856827178d376838611787277dc1d090c575759
...
5.4. Configuring a RHEL instance with Secure Boot from a custom RHEL image
To ensure that your RHEL instance on AWS has a secure operating system boot process, use Secure Boot. When you register a custom RHEL Amazon Machine Image (AMI), the image contains pre-stored Unified Extensible Firmware Interface (UEFI) variables for Secure Boot. This enables all instances launched from the RHEL AMI to use the Secure Boot mechanism with the required variables on the first boot.
Prerequisites
- You have created and uploaded an AWS AMI image. For details, see Preparing and uploading AWS AMI.
You have installed the following packages:
-
awscli2
-
python3
-
openssl
-
efivar
-
keyutils
-
python3-virt-firmware
-
Procedure
Create a new random universally unique identifier (UUID) and store it in a system-generated text file:
uuidgen --random > GUID.txt
$ uuidgen --random > GUID.txt
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Generate a new RSA private key
PK.key
and a self-signed X.509 certificatePK.cer
for the platform key database:Copy to Clipboard Copied! Toggle word wrap Toggle overflow The
openssl
utility generates a common name platform key for the certificate by setting output format to Distinguished Encoding Rules (DER).Generate a new RSA private key
KEK.key
and a self-signed X.509 certificateKEK.cer
for the Key Exchange Key database:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Generate a custom certificate
custom_db.cer
:Copy to Clipboard Copied! Toggle word wrap Toggle overflow Download the updated UEFI Revocation List File of forbidden signatures (
dbx
) for 64 bit system:wget https://uefi.org/sites/default/files/resources/x64_DBXUpdate.bin
$ wget https://uefi.org/sites/default/files/resources/x64_DBXUpdate.bin
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Use the
virt-fw-vars
utility to generate theaws_blob.bin
binary file from keys, database certificates, and the UEFI variable store:Copy to Clipboard Copied! Toggle word wrap Toggle overflow The customized blob consists of:
-
PK.cer
with a self-signed X.509 certificate -
KEK.cer
andcustom_db.cer
with owner group GUID and Privacy Enhanced Mail (pem
) format -
x64_DBXUpdate.bin
list downloaded from database of excluded signatures (dbx
). -
The
77fa9abd-0359-4d32-bd60-28f4e78f784b
UUID is forMicCorUEFCA2011_2011-06-27.crt
Microsoft Corporation UEFI Certification Authority 2011.
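A minimal sketch of generating the blob and then registering the AMI follows. The virt-fw-vars options, the firmware template path, and the register-image parameters are assumptions to verify against the virt-fw-vars(1) man page and the AWS CLI documentation, and the Microsoft certificate file is assumed to be present as in the previous section:
$ virt-fw-vars --input /usr/share/edk2/ovmf/OVMF_VARS.fd \
    --set-pk "$(cat GUID.txt)" PK.cer \
    --add-kek "$(cat GUID.txt)" KEK.cer \
    --add-db "$(cat GUID.txt)" custom_db.cer \
    --add-db 77fa9abd-0359-4d32-bd60-28f4e78f784b MicCorUEFCA2011_2011-06-27.crt \
    --set-dbx x64_DBXUpdate.bin \
    --output-aws aws_blob.bin
$ aws ec2 register-image --name <image_name> --architecture x86_64 \
    --root-device-name /dev/sda1 \
    --block-device-mappings "DeviceName=/dev/sda1,Ebs={SnapshotId=<snapshot_id>}" \
    --virtualization-type hvm \
    --boot-mode uefi --uefi-data "$(cat aws_blob.bin)"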
- Use the awscli2 utility to create and register the AMI from a disk snapshot with the required Secure Boot variables, as shown in the sketch above.
- Reboot the instance from the AWS Console.
Verification
Verify Secure Boot functionality:
$ mokutil --sb-state
SecureBoot enabled
Use the keyctl utility to verify the kernel keyring for the custom certificate:
$ sudo keyctl list %:.platform
4 keys in keyring:
 216534498: ---lswrv     0     0 asymmetric: Signature Database key: 5856827178d376838611787277dc1d090c575759
...