Deploying and managing RHEL on Google Cloud
Obtaining RHEL system images and creating RHEL instances on Google Cloud
Abstract
- Registering, deploying, and provisioning RHEL images on Google Cloud
- Managing networking configurations for RHEL Google Cloud instances
- Configuring platform security and trusted execution technologies
- Managing Red Hat High Availability (HA) clusters for RHEL instances
Providing feedback on Red Hat documentation
We are committed to providing high-quality documentation and value your feedback. To help us improve, you can submit suggestions or report errors through the Red Hat Jira tracking system.
Procedure
- Log in to the Jira website. If you do not have an account, select the option to create one.
- Click Create in the top navigation bar.
- Enter a descriptive title in the Summary field.
- Enter your suggestion for improvement in the Description field. Include links to the relevant parts of the documentation.
- Click Create at the bottom of the window.
Chapter 1. Introducing RHEL on public cloud platforms
Public cloud platforms offer computing resources as a service. Instead of using on-premise hardware, you can run your IT workloads, including Red Hat Enterprise Linux (RHEL) systems, as public cloud instances.
1.1. Benefits of using RHEL in a public cloud
Running Red Hat Enterprise Linux (RHEL) on public cloud platforms provides flexible resource allocation, cost efficiency, and software-controlled configurations to optimize your infrastructure without managing physical hardware.
RHEL as a cloud instance located on a public cloud platform has the following benefits over RHEL on-premise physical systems or virtual machines (VMs):
- Flexible and fine-grained allocation of resources
A cloud instance of RHEL runs as a VM on a cloud platform, that is, on a cluster of remote servers maintained by the cloud service provider. Therefore, on the software level, you can easily customize the hardware resources allocated to the instance, such as a specific type of CPU or storage.
In comparison to a local RHEL system, you are also not limited by the capabilities of the physical host. Instead, you can select from a variety of features, based on selections offered by the cloud provider.
- Space and cost efficiency
You do not need to own any on-premise servers to host cloud workloads. This avoids the space, power, and maintenance requirements associated with physical hardware.
Instead, on public cloud platforms, you pay the cloud provider directly for using a cloud instance. The cost is typically based on the hardware allocated to the instance and the time you use it. Therefore, you can optimize your costs based on your requirements.
- Software-controlled configurations
You save the entire configuration of a cloud instance as data on the cloud platform and control it with software. Therefore, you can easily create, remove, clone, or migrate the instance. You also operate a cloud instance remotely in a cloud provider console, and it connects to remote storage by default.
In addition, you can back up the current state of a cloud instance as a snapshot at any time. Afterwards, you can load the snapshot to restore the instance to the saved state.
- Separation from the host and software compatibility
Similarly to a local VM, the RHEL guest operating system on a cloud instance runs on a Kernel-based Virtual Machine (KVM) hypervisor. The guest kernel is separate from the host operating system and from the client system that you use to connect to the instance.
Therefore, you can install any operating system on the cloud instance. This means that on a RHEL public cloud instance, you can run RHEL-specific applications not usable on your local operating system.
In addition, even if the operating system of the instance becomes unstable or compromised, it does not affect your client system.
1.2. Public cloud use cases for RHEL
Deploying on a public cloud provides many benefits, but might not be the most efficient solution in every scenario. If you are evaluating whether to migrate your Red Hat Enterprise Linux (RHEL) deployments to the public cloud, consider whether your use case will benefit from the advantages of the public cloud.
- Beneficial use cases
Deploying public cloud instances is very effective for flexibly increasing and decreasing the active computing power of your deployments. This is also known as scaling up and scaling down. You can use RHEL on public cloud in the following scenarios:
- Clusters with high peak workloads and low general performance requirements. Scaling up and down based on your demands can be highly efficient in terms of resource costs.
- Quickly setting up or expanding your clusters. This avoids high upfront costs of setting up local servers.
- What happens in your local environment does not affect cloud instances. Therefore, you can use them for backup and disaster recovery.
- Potentially problematic use cases
- You are running an existing environment that you cannot adjust. Customizing a cloud instance to fit the specific needs of an existing deployment might not be economically efficient in comparison with your current host platform.
- You are operating with a hard limit on your budget. Maintaining your deployment in a local data center typically provides less flexibility but more control over the maximum resource costs than the public cloud does.
1.3. Frequent concerns when migrating to a public cloud
Moving your RHEL workloads from a local environment to a public cloud platform might raise concerns about the changes involved. The following are the most commonly asked questions:
- Will my RHEL work differently as a cloud instance than as a local virtual machine?
In most respects, RHEL instances on a public cloud platform work the same as RHEL virtual machines on a local host, such as an on-premise server. Notable exceptions include:
- Instead of private orchestration interfaces, public cloud instances use provider-specific console interfaces for managing your cloud resources.
- Certain features, such as nested virtualization, might not work correctly. If a specific feature is critical for your deployment, check the feature’s compatibility in advance with your chosen public cloud provider.
- Will my data stay safe in a public cloud as opposed to a local server?
The data in your RHEL cloud instances is in your ownership, and your public cloud provider does not have any access to it. In addition, major cloud providers support data encryption in transit, which improves the security of data when migrating your virtual machines to the public cloud.
In terms of security of RHEL public cloud instances, the following applies:
- Your public cloud provider is responsible for the security of the cloud hypervisor
- Red Hat provides the security features of the RHEL guest operating systems in your instances
- You manage the specific security settings and practices in your cloud infrastructure
- What effect does my geographic region have on the functionality of RHEL public cloud instances?
- You can use RHEL instances on a public cloud platform regardless of your geographical location. Therefore, you can run your instances in the same region as your on-premises server. However, hosting your instances in a physically distant region might cause high latency when operating them. In addition, depending on the public cloud provider, certain regions might offer additional features or be more cost-efficient. Before creating your RHEL instances, review the properties of the hosting regions available for your chosen cloud provider.
1.4. Obtaining RHEL for public cloud deployments
To deploy Red Hat Enterprise Linux (RHEL) in a public cloud environment, you must select a certified cloud service provider and create a RHEL cloud instance.
Based on your requirements and the current offerings in the market, select the optimal cloud service provider for your use case. The certified cloud service providers for running RHEL instances are Amazon Web Services (AWS), Google Cloud, and Microsoft Azure.
Note: This document specifically addresses the process of deploying RHEL on Google Cloud.
- Create a RHEL cloud instance on your cloud platform.
- Use Red Hat update infrastructure to keep your RHEL deployment up-to-date.
1.5. Methods for creating RHEL cloud instances
You can either create a Red Hat Enterprise Linux (RHEL) system image and import it to the cloud platform by using RHEL image builder, or purchase a RHEL image directly from the cloud service provider marketplace. You can then deploy the RHEL image as a cloud instance.
To deploy a RHEL instance on a public cloud platform, you can use either of the following methods:
- Create a RHEL system image and import it to the cloud platform
- To create the system image, you can use the RHEL image builder or build the image manually.
- This method uses your existing RHEL subscription. This is also referred to as bring your own subscription (BYOS).
- You pre-pay a yearly subscription, and you can use your Red Hat customer discount.
- Red Hat provides customer service.
- For creating many images effectively, you can use the cloud-init utility.
- Purchase a RHEL instance directly from the cloud provider marketplace
- You post-pay an hourly rate for using the service. This method is also referred to as pay as you go (PAYG).
- The cloud service provider provides customer service.
Chapter 2. Preparing and uploading custom GCE images to Google Cloud
With RHEL image builder, you can build a gce image, provide credentials for your user or GCP service account, and then upload the gce image directly to the GCP environment.
2.1. Configuring and uploading a gce image to Google Cloud by using the CLI
Set up a configuration file with credentials to upload your gce image to GCP by using the RHEL image builder CLI.
You cannot manually import a gce image to Google Cloud, because the image does not boot. You must use either gcloud or RHEL image builder to upload it.
Prerequisites
You have a valid Google account and credentials to upload your image to Google Cloud. The credentials can be from a user account or a service account. The account associated with the credentials must have at least the following IAM roles assigned:
  - roles/storage.admin - to create and delete storage objects
  - roles/compute.storageAdmin - to import a VM image to Compute Engine
- You have an existing Google Cloud bucket.
Procedure
Use a text editor to create a gcp-config.toml configuration file with the following content:

provider = "gcp"

[settings]
bucket = "<gcp_bucket>"
region = "<gcp_storage_region>"
object = "<object_key>"
credentials = "<gcp_credentials>"

- <gcp_bucket> points to an existing bucket. This field stores the intermediate storage object of the image to be uploaded.
- <gcp_storage_region> can be a regular Google storage region or a dual-region or multi-region.
- <object_key> is the name of an intermediate storage object. It must not exist before the upload, and it is deleted when the upload process finishes. If the object name does not end with .tar.gz, the extension is automatically added to the object name.
- <gcp_credentials> is a Base64-encoded scheme of the credentials JSON file downloaded from Google Cloud. The credentials determine which project Google Cloud uploads the image to.

Note: Specifying <gcp_credentials> in the gcp-config.toml file is optional if you use a different mechanism to authenticate with Google Cloud. For other authentication methods, see Authenticating with Google Cloud.
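For reference, a filled-in gcp-config.toml might look like the following sketch. The bucket, region, and object names are hypothetical placeholders, and the credentials value stands in for the Base64 string produced in the next step:

provider = "gcp"

[settings]
# Hypothetical example values; substitute your own.
bucket = "example-rhel-images"
region = "us-east1"
object = "rhel-10-image.tar.gz"
credentials = "ewogICJ0eXBlIjogInNlcnZpY2VfYWNjb3VudCIsCiAgLi4uCn0K"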
Retrieve the <gcp_credentials> from the JSON file downloaded from Google Cloud:

$ sudo base64 -w 0 cee-gcp-nasa-476a1fa485b7.json

Create a compose with an additional image name and cloud provider profile:

$ sudo image-builder build gce --blueprint <blueprint_name> <image_key> gcp-config.toml

The image build, upload, and cloud registration processes can take up to ten minutes to complete.
Verification
Verify that the image status is FINISHED:
$ sudo image-builder compose status
2.2. How RHEL image builder sorts the authentication order of different Google Cloud credentials
You can use several different types of credentials with RHEL image builder to authenticate with GCP. If you configure RHEL image builder to authenticate with GCP by using multiple sets of credentials, it uses the credentials in the following order of preference:
1. Credentials specified with the image-builder command in the configuration file.
2. Application Default Credentials from the Google Cloud SDK library, which tries to automatically find a way to authenticate by using the following options:
   - If you set the GOOGLE_APPLICATION_CREDENTIALS environment variable, Application Default Credentials tries to load and use credentials from the file that the variable points to.
   - Application Default Credentials tries to authenticate by using the service account attached to the resource that is running the code, for example, a Google Compute Engine VM.

Note: You must use the Google Cloud credentials to determine which Google Cloud project to upload the images to. To upload to different projects, specify the credentials in gcp-config.toml with the image-builder command.
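As an example of the environment-variable option, you might point Application Default Credentials at a service account key file before running the build; the key file path below is hypothetical:

$ export GOOGLE_APPLICATION_CREDENTIALS="$HOME/.config/gcloud/service-account-key.json"
$ sudo --preserve-env=GOOGLE_APPLICATION_CREDENTIALS image-builder build gce --blueprint <blueprint_name> <image_key> gcp-config.toml

Note that sudo does not pass environment variables through by default, hence the --preserve-env flag in this sketch.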
Chapter 3. Deploying a RHEL image as a Google Compute Engine instance on Google Cloud
You can deploy a Red Hat Enterprise Linux (RHEL) image as a Google Compute Engine (GCE) virtual machine (VM) by converting it to a GCE-compatible format and then deploying it. To deploy a RHEL image as a GCE instance, use RHEL image builder or manual configuration for the supported image.
Prerequisites

- You have created a Red Hat account.
- You have signed up and set up a Google Cloud account.
3.1. Available RHEL image types for public cloud
To deploy your RHEL virtual machine (VM) on a certified cloud service provider (CCSP), you can use several options. The following table lists the available image types, subscriptions, considerations, and sample scenarios for the image types.
To deploy customized ISO images, you can use RHEL image builder. With RHEL image builder, you can create, upload, and deploy these custom images specific to your chosen CCSP.
| Image types | Subscriptions | Considerations | Sample scenario |
|---|---|---|---|
| Deploy a Red Hat gold image | Use your existing Red Hat subscriptions | The subscriptions include the Red Hat product cost and support for Cloud Access images, while you pay the CCSP for all other instance costs | Select a Red Hat gold image on the CCSP according to your requirements. |
| Deploy a custom image that you move to the CCSP | Use your existing Red Hat subscriptions | The subscriptions include the Red Hat product cost and support for custom RHEL images, while you pay the CCSP for all other instance costs | Upload your custom image and attach your subscriptions. |
| Deploy an existing RHEL-based custom machine image | The custom machine images include a RHEL image | You pay the CCSP on an hourly basis based on a pay-as-you-go model. For this model, on-demand images are available on the CCSP marketplace. The CCSP provides support for these images, while Red Hat handles updates. The CCSP provides updates through the Red Hat Update Infrastructure (RHUI) | Select a RHEL image when you launch an instance on the CCSP cloud management console, or choose an image from the CCSP marketplace. |
You cannot convert an on-demand instance to a custom RHEL instance. To migrate from an on-demand image to a custom RHEL bring your own subscription (BYOS) image, do the following:
- Create a new custom RHEL instance, then migrate data from your on-demand instance.
- When you complete data migration, terminate the on-demand instance to avoid additional billing.
3.2. Deploying a RHEL instance by using a custom base image
To manually configure a virtual machine (VM), first create a base (starter) image. Then, you can modify configuration settings and add the packages the VM requires to operate on the cloud. You can also make additional configuration changes for your specific application after you upload the image.
Creating a VM from a base image has the following advantages:
- Fully customizable
- High flexibility for any use case
- Lightweight - includes only the operating system and the required runtime libraries
To create a custom base image of RHEL from an ISO image, you can use the command line interface (CLI) or the web console for creating and configuring the VM.
Verify the following VM configurations.
- SSH - Enable SSH to give remote access to your VM.
- DHCP - Configure the primary virtual adapter to use DHCP.
Prerequisites
- You have enabled virtualization on the host machine.
- If you use the web console, ensure the following:
- You have not checked the Immediately Start VM option.
- You have already changed the Memory size to your preferred settings.
- You have changed the Model option under Virtual Network Interface Settings to virtio, and set the vCPUs to the capacity settings for the VM.
Procedure
Configure the Red Hat Enterprise Linux (RHEL) VM:
- To install from the command line (CLI), ensure that you set the default memory, network interfaces, and CPUs according to your requirements for the VM. For details, see Creating virtual machines by using the command line; a minimal sketch follows this list.
- To install from the web console, see Creating virtual machines by using the web console.
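For example, a minimal virt-install sketch might look like the following; the VM name, ISO path, OS variant, and resource sizes are hypothetical placeholders that you should adjust to your environment:

# virt-install \
    --name rhel10-base \
    --memory 4096 \
    --vcpus 2 \
    --disk size=20 \
    --cdrom /var/lib/libvirt/images/rhel-10-x86_64-dvd.iso \
    --os-variant rhel10.0 \
    --network network=default,model=virtio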
When the installation starts:

- Create a root password.
- Create an administrative user account.

After the installation completes, reboot the VM and log in to the root account. After logging in as root, you can configure the image.

Register the VM and enable the RHEL repository:

# subscription-manager register

Verification

Verify that the system has the cloud-init package and enable it:

# dnf install cloud-init
# systemctl enable --now cloud-init.service

- Power off the VM.
3.3. Uploading and running a RHEL instance on Google Cloud
To run your RHEL instance on Google Cloud, you need to configure and upload a RHEL image to Google Cloud.
3.3.1. Installing the Google Cloud SDK
You can use the Google Cloud SDK to manage Google Cloud resources and services from the command line by using the gcloud CLI.
Prerequisites
- You have downloaded, extracted, and initialized the Google Cloud SDK.
Procedure
Use the gcloud CLI utility to manage projects and instances.

Create a project:

# gcloud projects create <example_gcp_project_id> --name <example_gcp_project>

The example creates a project with the project ID <example_gcp_project_id> and the project name <example_gcp_project>.

Display project information:

# gcloud compute project-info describe --project <example_project_name>
3.3.2. Creating a new project on Google Cloud
To upload your RHEL image to Google Cloud, you need to create a new project on Google Cloud. A project manages your assigned Google Cloud resources.
Prerequisites
- You have an account on Google Cloud.
Procedure
- Launch the Google Cloud Console.
- Click the drop-down menu to the right of Google Cloud.
- From the pop-up menu, click NEW PROJECT.
- From the New Project window, enter a name for your new project.
- Check Organization. Click the drop-down menu to change the organization, if necessary.
- Confirm the Location of your parent organization or folder. Click Browse to search for and change this value, if necessary.
- Click CREATE to create your new Google Cloud project.
3.3.3. Creating and uploading a RHEL image on Google Cloud Storage
To import and store objects such as virtual machine (VM) images, you can create a Google Cloud storage bucket.
Procedure
Log in to the Google Cloud console:
# gcloud auth login

Create a storage bucket:

# gsutil mb gs://<example_bucket_name>

Note: Alternatively, you can use the Google Cloud Console to create a bucket. For details, see Create a bucket.
Run the qemu-img command to convert your qcow2 image to the raw format:

# qemu-img convert -f qcow2 -O raw rhel-10.0-sample.qcow2 disk.raw

Compress the image:

# tar --format=oldgnu -Sczf disk.raw.tar.gz disk.raw

Upload the image to the existing bucket:

# gsutil cp disk.raw.tar.gz gs://<example_bucket_name>

Note: The upload can take a few minutes.

- From the Google Cloud home screen, click the collapsed menu icon and select Storage and then select Browser.
- Click the name of your bucket, where disk.raw.tar.gz is now listed.

Note: You can also upload your image by using the Google Cloud Console. To do so, click the bucket name and then click Upload files.

Specify the image name that you want to create, the existing bucket name, and the name of the uploaded object:

# gcloud compute images create my-image-name --source-uri gs://<example_bucket_name>/disk.raw.tar.gz

Note: Alternatively, you can use the Google Cloud Console to create an image. See Creating, deleting, and deprecating custom images for more information.

Optional: Find the image in the Google Cloud Console.

- Click the Navigation menu to the left of the Google Cloud Console banner.
- Select Compute Engine and then Images.
3.3.4. Launching and connecting to a RHEL Google Compute Engine instance
You can configure a Google Compute Engine (GCE) virtual machine (VM) instance from an image by using the Google Cloud Console.
Procedure
- From the Google Cloud Console Dashboard page, click the Navigation menu to the left of the Google Cloud Console banner and select Compute Engine and then select Images.
- Select your image.
- Click Create Instance.
- On the Create an instance page, enter a Name for your instance.
- Choose a Region and Zone.
- Choose a Machine configuration that meets or exceeds the requirements of your workload.
- Ensure that Boot disk specifies the name of your image.
- Optional: Under Firewall, select Allow HTTP traffic or Allow HTTPS traffic.
- Click Create.
- Find your image under VM instances.
Click the Navigation menu to the left of the Google Cloud Console banner and select Compute Engine and then select VM instances.
Note: Alternatively, you can use the gcloud compute instances create command to create a GCE VM instance from an image:

# gcloud compute instances create myinstance3 --zone=us-central1-a --image test-iso2-image

The example creates a VM instance named myinstance3 in zone us-central1-a based upon the existing image test-iso2-image. For details, see gcloud compute instances create.

Use the ssh-keygen utility to generate an SSH key pair to use with GCE:

# ssh-keygen -t rsa -f ~/.ssh/google_compute_engine

- From the Google Cloud Console Dashboard page, click the Navigation menu to the left of the Google Cloud Console banner and select Compute Engine and then select Metadata.
- Click SSH Keys and then click Edit.
- Enter the contents of the ~/.ssh/google_compute_engine.pub file and click Save.

Connect to the instance:
# ssh -i ~/.ssh/google_compute_engine <username>@<instance_external_ip>

Note: Alternatively, you can run the gcloud compute config-ssh command to populate your SSH config file with aliases for instances. The aliases allow simple SSH connections by instance name. For details, see gcloud compute config-ssh.
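For instance, a short sketch of that alternative, reusing the hypothetical instance and zone from the example above:

$ gcloud compute config-ssh
$ ssh myinstance3.us-central1-a.<example_gcp_project_id>

The gcloud compute config-ssh command writes Host aliases in the INSTANCE.ZONE.PROJECT format to your ~/.ssh/config file.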
3.4. Attaching Red Hat subscriptions
To register and attach your Red Hat subscription to a RHEL instance, you can use the subscription-manager command.
Prerequisites
- You have an active Red Hat account.
Procedure
Register your system:
# subscription-manager register

Attach your subscriptions:
- You can use an activation key to attach subscriptions; a hedged example follows this list. See Creating Red Hat Customer Portal Activation Keys for more information.
- Otherwise, you can manually attach a subscription by using the ID of the subscription pool (Pool ID). See Attaching a host-based subscription to hypervisors.
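For example, registration with an activation key generally takes the following form; the key name and organization ID are placeholders for your own values:

# subscription-manager register --activationkey=<activation_key> --org=<organization_id>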
Optional: To collect various system metrics about the instance in the Red Hat Hybrid Cloud Console, you can register the instance with Red Hat Lightspeed.
# insights-client --register --display-name <display_name_value>

For information about further configuration of Red Hat Lightspeed, see the Client Configuration Guide for Red Hat Lightspeed.
3.5. Setting up automatic registration on Google Cloud gold images
You can deploy Red Hat Enterprise Linux (RHEL) virtual machines (VMs) more efficiently on Google Cloud by using gold images of RHEL. This ensures that the VMs are automatically registered to the Red Hat Subscription Manager (RHSM).
Prerequisites
RHEL gold images are available to you in Google Cloud. For instructions, see Using gold images on Google Cloud.
Note: A Google Cloud account can only be attached to a single Red Hat account at a time. Therefore, ensure that no other users require access to the Google Cloud account before attaching it to your Red Hat account.
Procedure
Use the gold image to create a RHEL VM in your Google Cloud instance. For instructions, see Uploading and running a RHEL instance on Google Cloud.
If your RHSM settings are correct, the VM will be automatically subscribed to RHSM.
Verification
In a RHEL VM created by using the above instructions, verify that the system is registered to RHSM. On a successfully registered system, the subscription-manager identity command displays the UUID of the system. For example:

# subscription-manager identity
system identity: fdc46662-c536-43fb-a18a-bbcb283102b7
name: 192.168.122.222
org name: 6340056
org ID: 6340056
Chapter 4. Configuring a Red Hat High Availability Cluster on Google Cloud
To group RHEL nodes on Google Cloud and automatically redistribute workloads if a node fails, you can configure a high availability (HA) cluster on Google Cloud. The process for setting up HA clusters on Google Cloud is comparable to configuring them in traditional, non-cloud environments.
To configure a Red Hat HA cluster on Google Cloud that uses virtual machine (VM) instances as cluster nodes, you need to obtain RHEL images for the cluster.
Before you begin, make sure you complete the following steps:
- You have created a Red Hat account.
- You have signed up and set up a Google Cloud account.
- Red Hat Enterprise Linux 10 Server: rhel-10-server-rpms/10Server/x86_64
- Red Hat Enterprise Linux 10 Server (High Availability): rhel-10-server-ha-rpms/10Server/x86_64
- Your project should have a service account that belongs to a VM instance and not an individual user.
- You can use the default service account instead of creating a separate service account.
If you or your project administrator create a custom service account, configure the service account for the following roles:
- Cloud Trace Agent
- Compute Admin
- Compute Network Admin
- Cloud Datastore User
- Logging Admin
- Monitoring Editor
- Monitoring Metric Writer
- Service Account Administrator
- Storage Admin
4.1. Benefits of using high-availability clusters on public cloud platforms
A high-availability (HA) cluster links a set of computers (called nodes) to run a specific workload. HA clusters offer redundancy to handle hardware or software failures. When a node in the HA cluster fails, the Pacemaker cluster resource manager quickly distributes the workload to other nodes, ensuring that services on the cluster continue without noticeable downtime.
You can also run HA clusters on public cloud platforms. In this case, you would use virtual machine (VM) instances in the cloud as the individual cluster nodes. Using HA clusters on a public cloud platform has the following benefits:
- Improved availability: In case of a VM failure, the workload is quickly redistributed to other nodes, so running services are not disrupted.
- Scalability: You can start additional nodes when demand is high and stop them when demand is low.
- Cost-effectiveness: With the pay-as-you-go pricing, you pay only for nodes that are running.
- Simplified management: Some public cloud platforms offer management interfaces to make configuring HA clusters easier.
To enable HA on your RHEL systems, Red Hat offers the High Availability (HA) Add-On. With the Red Hat HA Add-On, you can configure groups of RHEL servers as managed HA clusters. The add-on gives access to integrated and streamlined tools: with its cluster resource manager, fencing agents, and resource agents, you can set up and configure the cluster for automation. The Red Hat HA Add-On offers the following components for automation:
- Pacemaker, a cluster resource manager that offers both a command-line utility (pcs) and a GUI (pcsd) to support many nodes
- Corosync and Kronosnet to create and manage HA clusters
- Fencing agents to use cluster on platforms such as bare-metal servers and virtual machines
The Red Hat HA Add-On handles critical tasks such as node failures, load balancing, and node health checks for fault tolerance and system reliability.
4.2. Required system packages
To create and configure a base image of RHEL, your host system must have the following packages installed.
| Package | Repository | Description |
|---|---|---|
| libvirt | | Open source API, daemon, and management tool for managing platform virtualization |
| virt-install | | A command-line utility for building VMs |
| libguestfs | | A library for accessing and modifying VM file systems |
| guestfs-tools | | System administration tools for VMs; includes the virt-customize utility |
After you confirm that these system packages are installed, you can follow the deployment steps in Deploying a RHEL instance by using a custom base image.
4.3. Creating a custom virtual private cloud network and subnet
You must configure a custom virtual private cloud (VPC) network and subnet to enable High Availability (HA) for your cluster.
Prerequisites
- You have installed the Google Cloud SDK.
- You have created a new Google Cloud project.
Procedure
- Launch the Google Cloud Console.
- Select VPC networks under Networking in the left navigation pane.
- Click Create VPC Network.
- Enter a name for the VPC network.
- Under the New subnet, create a Custom subnet in the region where you want to create the cluster.
- Click Create.
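Alternatively, you can create the same VPC network and subnet with the gcloud CLI. A sketch with hypothetical names and an example IP range:

$ gcloud compute networks create cluster-vpc --subnet-mode=custom
$ gcloud compute networks subnets create cluster-subnet --network=cluster-vpc --region=us-west1 --range=10.10.10.0/24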
4.4. Creating and configuring a base Google Cloud instance
Create and configure a Google Cloud instance that complies with the platform’s operating and security standards. Use this foundational instance as a starting point to deploy applications, apply additional configurations, or set up clusters as needed.
Procedure
Create an image from the compressed file in the bucket.
$ gcloud compute images create BaseImageName --source-uri gs://BucketName/BaseImageName.tar.gz

Example:

$ gcloud compute images create rhel-76-server --source-uri gs://user-rhelha/rhel-server-76.tar.gz
Created [https://www.googleapis.com/compute/v1/projects/MyProject/global/images/rhel-server-76].
NAME            PROJECT                 FAMILY  DEPRECATED  STATUS
rhel-76-server  rhel-ha-testing-on-gcp                      READY

Create a template instance from the image. The minimum size required for a base RHEL instance is n1-standard-2. See gcloud compute instances create for additional configuration options.

$ gcloud compute instances create BaseInstanceName --can-ip-forward --machine-type n1-standard-2 --image BaseImageName --service-account ServiceAccountEmail

Example:

$ gcloud compute instances create rhel-76-server-base-instance --can-ip-forward --machine-type n1-standard-2 --image rhel-76-server --service-account account@project-name-on-gcp.iam.gserviceaccount.com
Created [https://www.googleapis.com/compute/v1/projects/rhel-ha-testing-on-gcp/zones/us-east1-b/instances/rhel-76-server-base-instance].
NAME                          ZONE        MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
rhel-76-server-base-instance  us-east1-b  n1-standard-2               10.10.10.3   192.227.54.211  RUNNING

Connect to the instance with an SSH terminal session:

$ ssh root@PublicIPaddress

Update the RHEL software.
- Register with Red Hat Subscription Manager (RHSM).
- Enable a Subscription Pool ID.
Disable all repositories.
# subscription-manager repos --disable=*

Enable the following repository:

# subscription-manager repos --enable=rhel-10-server-rpms

Run the dnf update command:

# dnf update -y
Install the Google Cloud Linux Guest Environment on the running instance (in-place installation).
See Install the guest environment in-place for instructions.
- Select the CentOS/RHEL option.
- Copy the command script and paste it to the terminal to run the script immediately.
Make the following configuration changes to the instance. These changes are based on Google Cloud recommendations for custom images. See gcloud compute images list for more information.
- Edit the /etc/chrony.conf file and remove all NTP servers.
- Add the following NTP server:

server metadata.google.internal iburst # Google NTP server

- Remove any persistent network device rules:

# rm -f /etc/udev/rules.d/70-persistent-net.rules
# rm -f /etc/udev/rules.d/75-persistent-net-generator.rules

- Set the network service to start automatically:

# chkconfig network on

- Set the sshd service to start automatically:

# systemctl enable sshd
# systemctl is-enabled sshd

- Set the time zone to UTC:

# ln -sf /usr/share/zoneinfo/UTC /etc/localtime

- Optional: Edit the /etc/ssh/ssh_config file and add the following lines to the end of the file. This keeps your SSH session active during longer periods of inactivity.

# Server times out connections after several minutes of inactivity.
# Keep alive ssh connections by sending a packet every 7 minutes.
ServerAliveInterval 420

- Edit the /etc/ssh/sshd_config file and make the following changes, if necessary. The ClientAliveInterval 420 setting is optional; this keeps your SSH session active during longer periods of inactivity.

PermitRootLogin no
PasswordAuthentication no
AllowTcpForwarding yes
X11Forwarding no
PermitTunnel no
# Compute times out connections after 10 minutes of inactivity.
# Keep ssh connections alive by sending a packet every 7 minutes.
ClientAliveInterval 420
- Disable password access by changing ssh_pwauth from 1 to 0 in the cloud-init configuration file (typically /etc/cloud/cloud.cfg):

ssh_pwauth: 0

Important: Previously, you enabled password access to allow SSH session access to configure the instance. You must disable password access. All SSH session access must be passwordless.
Unregister the instance from the subscription manager.
# subscription-manager unregister

Clean the shell history. Keep the instance running for the next procedure.
# export HISTSIZE=0
4.5. Creating a snapshot image
To preserve the configuration and disk data of a Google Cloud high availability (HA) instance, you can create a snapshot of the instance. A snapshot provides a point-in-time backup of your disk, allowing you to restore your system or create new instances with the same configuration and disk data.
Procedure
On the running instance, synchronize data to disk.
# sync

On your host system, create the snapshot:

$ gcloud compute disks snapshot InstanceName --snapshot-names SnapshotName

On your host system, create the configured image from the snapshot:
$ gcloud compute images create ConfiguredImageFromSnapshot --source-snapshot SnapshotName
4.6. Creating an HA node template instance and HA nodes
You can create a node template from a configured snapshot image, then use this template to create all HA nodes.
Procedure
Create an instance template:
$ gcloud compute instance-templates create InstanceTemplateName --can-ip-forward --machine-type n1-standard-2 --image ConfiguredImageFromSnapshot --service-account ServiceAccountEmailAddress

Example:

$ gcloud compute instance-templates create rhel-10-instance-template --can-ip-forward --machine-type n1-standard-2 --image rhel-10-gcp-image --service-account account@project-name-on-gcp.iam.gserviceaccount.com
Created [https://www.googleapis.com/compute/v1/projects/project-name-on-gcp/global/instanceTemplates/rhel-10-instance-template].
NAME                       MACHINE_TYPE   PREEMPTIBLE  CREATION_TIMESTAMP
rhel-10-instance-template  n1-standard-2               2018-07-25T11:09:30.506-07:00

Create multiple nodes in one zone:

# gcloud compute instances create NodeName01 NodeName02 --source-instance-template InstanceTemplateName --zone RegionZone --network=NetworkName --subnet=SubnetName

Example:

$ gcloud compute instances create rhel10-node-01 rhel10-node-02 rhel10-node-03 --source-instance-template rhel-10-instance-template --zone us-west1-b --network=projectVPC --subnet=range0
Created [https://www.googleapis.com/compute/v1/projects/project-name-on-gcp/zones/us-west1-b/instances/rhel10-node-01].
Created [https://www.googleapis.com/compute/v1/projects/project-name-on-gcp/zones/us-west1-b/instances/rhel10-node-02].
Created [https://www.googleapis.com/compute/v1/projects/project-name-on-gcp/zones/us-west1-b/instances/rhel10-node-03].
NAME            ZONE        MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
rhel10-node-01  us-west1-b  n1-standard-2               10.10.10.4   192.230.25.81   RUNNING
rhel10-node-02  us-west1-b  n1-standard-2               10.10.10.5   192.230.81.253  RUNNING
rhel10-node-03  us-west1-b  n1-standard-2               10.10.10.6   192.230.102.15  RUNNING
4.7. Installing the high availability packages and agents for Google Cloud
To configure a Red Hat High Availability (HA) cluster on Google Cloud, you must install the High Availability packages and agents on each node.
Procedure
- In the Google Cloud Console, select Compute Engine and then select VM instances.
- Select the instance, click the arrow next to SSH, and select the View gcloud command option.
- Paste this command into the terminal to access the instance without a password.
- Enable sudo account access and register with Red Hat Subscription Manager.
- Enable a Subscription Pool ID.
Disable all repositories.
# subscription-manager repos --disable=*

Enable the following repositories:

# subscription-manager repos --enable=rhel-10-server-rpms
# subscription-manager repos --enable=rhel-10-for-x86_64-highavailability-rpms

Install pcs, pacemaker, the fence agents, and the resource agents:

# dnf install -y pcs pacemaker fence-agents-gce resource-agents-cloud

Update all packages:
# dnf update -y
4.8. Creating a high availability cluster
You can create a Red Hat High Availability Add-On cluster. This example uses nodes z1.example.com and z2.example.com.
To display the parameters of a pcs command and a description of those parameters, use the -h option of the pcs command.
Prerequisites
- You have created a Red Hat account
Procedure
Authenticate the pcs user hacluster for each node in the cluster on the node from which you will be running pcs.

The following command authenticates user hacluster on z1.example.com for both of the nodes in a two-node cluster that will consist of z1.example.com and z2.example.com.

[root@z1 ~]# pcs host auth z1.example.com z2.example.com
Username: hacluster
Password:
z1.example.com: Authorized
z2.example.com: Authorized

Run the following command from z1.example.com to create the two-node cluster my_cluster that consists of nodes z1.example.com and z2.example.com. This will propagate the cluster configuration files to both nodes in the cluster. This command includes the --start option, which will start the cluster services on both nodes in the cluster.

[root@z1 ~]# pcs cluster setup my_cluster --start z1.example.com z2.example.com

Enable the cluster services to run on each node in the cluster when the node is booted.
Note: For your particular environment, you can skip this step and keep the cluster services disabled. Keeping the cluster services disabled ensures that if a node goes down, any issues with your cluster or your resources are resolved before the node rejoins the cluster. If you keep the cluster services disabled, you need to manually start the services when you reboot a node by executing the pcs cluster start command on that node.

[root@z1 ~]# pcs cluster enable --all

Display the status of the cluster you created with the pcs cluster status command. Because there could be a slight delay before the cluster is up and running when you start the cluster services with the --start option of the pcs cluster setup command, you should ensure that the cluster is up and running before performing any subsequent actions on the cluster and its configuration.

[root@z1 ~]# pcs cluster status
Cluster Status:
 Stack: corosync
 Current DC: z2.example.com (version 2.0.0-10.el8-b67d8d0de9) - partition with quorum
 Last updated: Thu Oct 11 16:11:18 2018
 Last change: Thu Oct 11 16:11:00 2018 by hacluster via crmd on z2.example.com
 2 Nodes configured
 0 Resources configured
...
4.9. Configuring fencing in a Red Hat High Availability cluster
When a node becomes unresponsive, the cluster must isolate it to prevent data corruption. Since the node cannot be contacted directly, you must configure fencing. An external fence device cuts off the node’s access to shared resources or performs a hard reboot.
Without a fence device configured, you do not have a way to know that the resources previously used by the disconnected cluster node have been released, and this could prevent the services from running on any of the other cluster nodes. Conversely, the system may assume erroneously that the cluster node has released its resources, and this can lead to data corruption and data loss. Without a fence device configured, data integrity cannot be guaranteed and the cluster configuration will be unsupported.

When fencing is in progress, no other cluster operation is allowed to run. Normal operation of the cluster cannot resume until fencing has completed or the cluster node rejoins the cluster after the cluster node has been rebooted.
4.9.1. Displaying available fence agents and their options
You can view available fencing agents and the available options for specific fencing agents.
Your system’s hardware determines the type of fencing device to use for your cluster. For information about supported platforms and architectures and the different fencing devices, see the Cluster Platforms and Architectures section of the Red Hat Knowledgebase article Support Policies for RHEL High Availability Clusters.
Run the following command to list all available fencing agents. When you specify a filter, this command displays only the fencing agents that match the filter.
# pcs stonith list [filter]
Run the following command to display the options for the specified fencing agent.
# pcs stonith describe [stonith_agent]
For example, the following command displays the options for the fence agent for APC over telnet/SSH.
# pcs stonith describe fence_apc
Stonith options for: fence_apc
ipaddr (required): IP Address or Hostname
login (required): Login Name
passwd: Login password or passphrase
passwd_script: Script to retrieve password
cmd_prompt: Force command prompt
secure: SSH connection
port (required): Physical plug number or name of virtual machine
identity_file: Identity file for ssh
switch: Physical switch number on device
inet4_only: Forces agent to use IPv4 addresses only
inet6_only: Forces agent to use IPv6 addresses only
ipport: TCP port to use for connection with device
action (required): Fencing Action
verbose: Verbose mode
debug: Write debug information to given file
version: Display version information and exit
help: Display help and exit
separator: Separator for CSV created by operation list
power_timeout: Test X seconds for status change after ON/OFF
shell_timeout: Wait X seconds for cmd prompt after issuing command
login_timeout: Wait X seconds for cmd prompt after login
power_wait: Wait X seconds after issuing ON/OFF
delay: Wait X seconds before fencing is started
retry_on: Count of attempts to retry power on
For fence agents that provide a method option, a value of cycle is unsupported and should not be specified, as it may cause data corruption; the fence_sbd agent is the exception. Even for fence_sbd, however, you should not specify a method and instead use the default value.
4.9.2. Creating a fence device
Create a fence device using the pcs stonith create command. To view all available creation options, use the pcs stonith -h command.
Procedure
Create a fence device:
# pcs stonith create stonith_id stonith_device_type [stonith_device_options] [op operation_action operation_options]

The following command creates a single fencing device for a single node:
# pcs stonith create MyStonith fence_virt pcmk_host_list=f1 op monitor interval=30sSome fence devices can fence only a single node, while other devices can fence multiple nodes. The parameters you specify when you create a fencing device depend on what your fencing device supports and requires.
- Some fence devices can automatically determine what nodes they can fence.
- You can use the pcmk_host_list parameter when creating a fencing device to specify all of the machines that are controlled by that fencing device.
- Some fence devices require a mapping of host names to the specifications that the fence device understands. You can map host names with the pcmk_host_map parameter when creating a fencing device, as shown in the example after this list.
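For example, a fencing device definition that uses pcmk_host_map might look like the following sketch; the device address, credentials, and plug numbers are hypothetical:

# pcs stonith create apc-fence fence_apc ipaddr=apc.example.com login=apc passwd=apc pcmk_host_map="z1.example.com:1;z2.example.com:2"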
4.9.3. General properties of fencing devices
Configure fencing behavior using device-specific options and cluster-wide properties. Device options define agent settings, such as IP addresses, and metadata like delays. Cluster properties manage global logic, including timeouts and the stonith-enabled parameter.
Any cluster node can fence any other cluster node with any fence device, regardless of whether the fence resource is started or stopped. Whether the resource is started controls only the recurring monitor for the device, not whether it can be used, with the following exceptions:
- You can disable a fencing device by running the pcs stonith disable stonith_id command. This will prevent any node from using that device.
- To prevent a specific node from using a fencing device, you can configure location constraints for the fencing resource with the pcs constraint location … avoids command; a short sketch follows this list.
- Configuring stonith-enabled=false will disable fencing altogether. Note, however, that Red Hat does not support clusters when fencing is disabled, as it is not suitable for a production environment.
The following table describes the general properties you can set for fencing devices.
| Field | Type | Default | Description |
|---|---|---|---|
| pcmk_host_map | string | | A mapping of host names to port numbers for devices that do not support host names. For example: node1:1;node2:2,3 tells the cluster to use port 1 for node1 and ports 2 and 3 for node2. |
| pcmk_host_list | string | | A list of machines controlled by this device (Optional unless pcmk_host_check=static-list). |
| pcmk_host_check | string | * static-list if either pcmk_host_list or pcmk_host_map is set * Otherwise, dynamic-list if the fence device supports the list action * Otherwise, status if the fence device supports the status action * Otherwise, none | How to determine which machines are controlled by the device. Allowed values: dynamic-list (query the device via the list command), static-list (check the pcmk_host_list attribute), status (query the device via the status command), none (assume every device can fence every machine). |
The following table summarizes additional properties you can set for fencing devices. Note that these properties are for advanced use only.
| Field | Type | Default | Description |
|---|---|---|---|
| pcmk_host_argument | string | port | An alternate parameter to supply instead of port. Some devices do not support the standard port parameter or may provide additional ones. Use this to specify an alternate, device-specific parameter that should indicate the machine to be fenced. A value of none tells the cluster not to supply any additional parameters. |
| pcmk_reboot_action | string | reboot | An alternate command to run instead of reboot. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the reboot action. |
| pcmk_reboot_timeout | time | 60s | Specify an alternate timeout to use for reboot actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific timeout for reboot actions. |
| pcmk_reboot_retries | integer | 2 | The maximum number of times to retry the reboot command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries reboot actions before giving up. |
| pcmk_off_action | string | off | An alternate command to run instead of off. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the off action. |
| pcmk_off_timeout | time | 60s | Specify an alternate timeout to use for off actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific timeout for off actions. |
| pcmk_off_retries | integer | 2 | The maximum number of times to retry the off command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries off actions before giving up. |
| pcmk_list_action | string | list | An alternate command to run instead of list. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the list action. |
| pcmk_list_timeout | time | 60s | Specify an alternate timeout to use for list actions. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific, timeout for list actions. |
| pcmk_list_retries | integer | 2 | The maximum number of times to retry the list command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries list actions before giving up. |
| pcmk_monitor_action | string | monitor | An alternate command to run instead of monitor. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the monitor action. |
| pcmk_monitor_timeout | time | 60s | Specify an alternate timeout to use for monitor actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific timeout for monitor actions. |
| pcmk_monitor_retries | integer | 2 | The maximum number of times to retry the monitor command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries monitor actions before giving up. |
| pcmk_status_action | string | status | An alternate command to run instead of status. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the status action. |
| pcmk_status_timeout | time | 60s | Specify an alternate timeout to use for status actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific timeout for status actions. |
| pcmk_status_retries | integer | 2 | The maximum number of times to retry the status command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries status actions before giving up. |
| pcmk_delay_base | string | 0s | Enables a base delay for fencing actions and specifies a base delay value. You can specify different values for different nodes with the pcmk_delay_base parameter. |
| pcmk_delay_max | time | 0s | Enables a random delay for fencing actions and specifies the maximum delay, which is the maximum value of the combined base delay and random delay. For example, if the base delay is 3 and pcmk_delay_max is 10, the random delay will be between 3 and 10. |
| pcmk_action_limit | integer | 1 | The maximum number of actions that can be performed in parallel on this device. The cluster property concurrent-fencing=true needs to be configured first. A value of -1 is unlimited. |
| pcmk_on_action | string | on | For advanced use only: An alternate command to run instead of on. Some devices do not support the standard commands or may provide additional ones. Use this to specify an alternate, device-specific command that implements the on action. |
| pcmk_on_timeout | time | 60s | For advanced use only: Specify an alternate timeout to use for on actions instead of stonith-timeout. Some devices need much more or much less time to complete than normal. Use this to specify an alternate, device-specific timeout for on actions. |
| pcmk_on_retries | integer | 2 | For advanced use only: The maximum number of times to retry the on command within the timeout period. Some devices do not support multiple connections. Operations may fail if the device is busy with another task so Pacemaker will automatically retry the operation, if there is time remaining. Use this option to alter the number of times Pacemaker retries on actions before giving up. |
In addition to the properties you can set for individual fence devices, there are also cluster properties you can set that determine fencing behavior, as described in the following table.
| Option | Default | Description |
|---|---|---|
| stonith-enabled | true | Indicates that failed nodes and nodes with resources that cannot be stopped should be fenced. Protecting your data requires that you set this to true. If true, or unset, the cluster refuses to start resources unless one or more STONITH resources have been configured. Red Hat only supports clusters with this value set to true. |
| stonith-action | reboot | Action to send to fencing device. Allowed values: reboot, off. The value poweroff is also allowed, but is only used for legacy devices. |
| stonith-timeout | 60s | How long to wait for a STONITH action to complete. |
| stonith-max-attempts | 10 | How many times fencing can fail for a target before the cluster will no longer immediately re-attempt it. |
| stonith-watchdog-timeout | | The maximum time to wait until a node can be assumed to have been killed by the hardware watchdog. It is recommended that this value be set to twice the value of the hardware watchdog timeout. This option is needed only if watchdog-only SBD configuration is used for fencing. |
| concurrent-fencing | true | Allow fencing operations to be performed in parallel. |
| fence-reaction | stop | Determines how a cluster node should react if notified of its own fencing. A cluster node may receive notification of its own fencing if fencing is misconfigured, or if fabric fencing is in use that does not cut cluster communication. Allowed values are stop, to attempt to immediately stop Pacemaker and stay stopped, and panic, to attempt to immediately reboot the local node, falling back to stop on failure. Although the default value for this property is stop, the safest choice for this value is panic, which attempts to immediately reboot the local node. If you prefer the stop behavior, as is most likely the case in conjunction with fabric fencing, it is recommended that you set this explicitly. |
| priority-fencing-delay | 0 (disabled) | Sets a fencing delay that allows you to configure a two-node cluster so that in a split-brain situation the node with the fewest or least important resources running is the node that gets fenced. For general information about fencing delay parameters and their interactions, see Fencing delays. |
For information about setting cluster properties, see https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/10/html/configuring_and_managing_high_availability_clusters/index#setting-cluster-properties
4.9.4. Fencing delays
In two-node clusters, simultaneous communication loss can cause nodes to fence each other, shutting down the entire cluster. Configure a fencing delay to prevent this race condition. Delays are unnecessary in larger clusters, where quorum determines fencing authority.
You can set different types of fencing delays, depending on your system requirements.
Static fencing delays

A static fencing delay is a fixed, predetermined delay. Setting a static delay on one node makes that node more likely to be fenced because it increases the chances that the other node will initiate fencing first after detecting lost communication. In an active/passive cluster, setting a delay on a passive node makes it more likely that the passive node will be fenced when communication breaks down. You configure a static delay by using the pcmk_delay_base cluster property. You can set this property when a separate fence device is used for each node or when a single fence device is used for all nodes.

Dynamic fencing delays

A dynamic fencing delay is random. It can vary and is determined at the time fencing is needed. You configure a random delay and specify a maximum value for the combined base delay and random delay with the pcmk_delay_max cluster property. When the fencing delay for each node is random, which node is fenced is also random. You may find this feature useful if your cluster is configured with a single fence device for all nodes in an active/active design.

Priority fencing delays

A priority fencing delay is based on active resource priorities. If all resources have the same priority, the node with the fewest resources running is the node that gets fenced. In most cases, you use only one delay-related parameter, but it is possible to combine them. Combining delay-related parameters adds the priority values for the resources together to create a total delay. You configure a priority fencing delay with the priority-fencing-delay cluster property. You may find this feature useful in an active/active cluster design because it can make the node running the fewest resources more likely to be fenced when communication between the nodes is lost.
The pcmk_delay_base cluster property
Setting the pcmk_delay_base cluster property enables a base delay for fencing and specifies a base delay value.
When you set the pcmk_delay_max cluster property in addition to the pcmk_delay_base property, the overall delay is derived from a random delay value added to this static delay so that the sum is kept below the maximum delay. When you set pcmk_delay_base but do not set pcmk_delay_max, there is no random component to the delay and it will be the value of pcmk_delay_base.
You can specify different values for different nodes with the pcmk_delay_base parameter. This allows a single fence device to be used in a two-node cluster, with a different delay for each node. You do not need to configure two separate devices to use separate delays. To specify different values for different nodes, you map the host names to the delay value for that node using a similar syntax to pcmk_host_map. For example, node1:0;node2:10s would use no delay when fencing node1 and a 10-second delay when fencing node2.
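A sketch of a single fence device with per-node base delays, reusing the fence_virt device style from the earlier example; the node names and delay values are hypothetical:

# pcs stonith create shared-fence fence_virt pcmk_host_list="node1 node2" pcmk_delay_base="node1:0s;node2:10s"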
The pcmk_delay_max cluster property

Setting the pcmk_delay_max cluster property enables a random delay for fencing actions and specifies the maximum delay, which is the maximum value of the combined base delay and random delay. For example, if the base delay is 3 and pcmk_delay_max is 10, the random delay will be between 3 and 10.

When you set the pcmk_delay_base cluster property in addition to the pcmk_delay_max property, the overall delay is derived from a random delay value added to this static delay so that the sum is kept below the maximum delay. When you set pcmk_delay_max but do not set pcmk_delay_base, there is no static component to the delay.
The priority-fencing-delay cluster property
Setting the priority-fencing-delay cluster property allows you to configure a two-node cluster so that in a split-brain situation the node with the fewest or least important resources running is the node that gets fenced.
The priority-fencing-delay property can be set to a time duration. The default value for this property is 0 (disabled). If this property is set to a non-zero value, and the priority meta-attribute is configured for at least one resource, then in a split-brain situation the node with the highest combined priority of all resources running on it will be more likely to remain operational. For example, if you set pcs resource defaults update priority=1 and pcs property set priority-fencing-delay=15s and no other priorities are set, then the node running the most resources will be more likely to remain operational because the other node will wait 15 seconds before initiating fencing. If a particular resource is more important than the rest, you can give it a higher priority.
The node running the promoted role of a promotable clone gets an extra 1 point if a priority has been configured for that clone.
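For example, using the commands referenced above, the following configuration gives every resource a default priority of 1 and makes the surviving node wait 15 seconds before initiating fencing:
# pcs resource defaults update priority=1
# pcs property set priority-fencing-delay=15s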
Interaction of fencing delays
Setting more than one type of fencing delay yields the following results:
- Any delay set with the priority-fencing-delay property is added to any delay from the pcmk_delay_base and pcmk_delay_max fence device properties. This behavior allows some delay when both nodes have equal priority, or both nodes need to be fenced for some reason other than node loss, as when on-fail=fencing is set for a resource monitor operation. When setting these delays in combination, set the priority-fencing-delay property to a value that is significantly greater than the maximum delay from pcmk_delay_base and pcmk_delay_max to be sure the prioritized node is preferred. Setting this property to twice this value is always safe.
- Only fencing scheduled by Pacemaker itself observes fencing delays. Fencing scheduled by external code such as dlm_controld and fencing implemented by the pcs stonith fence command do not provide the necessary information to the fence device.
- Some individual fence agents implement a delay parameter, with a name determined by the agent, that is independent of delays configured with a pcmk_delay_* property. If both of these delays are configured, they are added together, so they should generally not be used in conjunction.
4.9.5. Testing a fence device Copy linkLink copied to clipboard!
Validate fence devices to ensure the cluster can successfully recover from node failures and prevent data corruption. A complete testing strategy involves verifying network connectivity, executing the fence agent script directly, triggering the fence action through the cluster manager, and simulating a physical node failure.
When a Pacemaker cluster node or Pacemaker remote node is fenced, a hard kill should occur, not a graceful shutdown of the operating system. If a graceful shutdown occurs when your system fences a node, disable ACPI soft-off in the /etc/systemd/logind.conf file so that your system ignores any power-button-pressed signal. For instructions on disabling ACPI soft-off in the logind.conf file, see Disabling ACPI soft-off in the logind.conf file.
Use the following procedure to test a fence device.
Procedure
Use SSH, Telnet, HTTP, or whatever remote protocol your device uses to manually log in and test the fence device or see what output is given. For example, if you will be configuring fencing for an IPMI-enabled device, try to log in remotely with ipmitool. Take note of the options used when logging in manually because those options might be needed when using the fencing agent.
If you are unable to log in to the fence device, verify that the device is pingable, that nothing such as a firewall configuration is preventing access to the fence device, that remote access is enabled on the fencing device, and that the credentials are correct.
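For example, for an IPMI-enabled device, a manual check of the power status might look like the following sketch (the address and credentials are placeholders):
# ipmitool -I lanplus -H 10.0.0.100 -U admin -P secret chassis power status
If this manual check fails, resolve the connectivity or credential problem before testing the fence agent itself.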
Run the fence agent manually, using the fence agent script. This does not require that the cluster services are running, so you can perform this step before the device is configured in the cluster. This can ensure that the fence device is responding properly before proceeding.
Note
These examples use the fence_ipmilan fence agent script for an iLO device. The actual fence agent you will use and the command that calls that agent will depend on your server hardware. Consult the man page for the fence agent you are using to determine which options to specify. You will usually need to know the login and password for the fence device and other information related to the fence device.
The following example shows the format you would use to run the fence_ipmilan fence agent script with the -o status parameter to check the status of the fence device interface on another node without actually fencing it. This allows you to test the device and get it working before attempting to reboot the node. When running this command, you specify the name and password of an iLO user that has power on and off permissions for the iLO device.
# fence_ipmilan -a ipaddress -l username -p password -o status
The following example shows the format you would use to run the fence_ipmilan fence agent script with the -o reboot parameter. Running this command on one node reboots the node managed by this iLO device.
# fence_ipmilan -a ipaddress -l username -p password -o reboot
If the fence agent failed to properly perform a status, off, on, or reboot action, check the hardware, the configuration of the fence device, and the syntax of your commands. In addition, you can run the fence agent script with debug output enabled. The debug output is useful for some fencing agents to see where in the sequence of events the fencing agent script is failing when logging in to the fence device.
# fence_ipmilan -a ipaddress -l username -p password -o status -D /tmp/$(hostname)-fence_agent.debug
When diagnosing a failure that has occurred, ensure that the options you specified when manually logging in to the fence device are identical to the options you passed to the fence agent with the fence agent script.
For fence agents that support an encrypted connection, you may see an error due to certificate validation failing, requiring that you trust the host or that you use the fence agent’s ssl-insecure parameter. Similarly, if SSL/TLS is disabled on the target device, you may need to account for this when setting the SSL parameters for the fence agent.
Note
If the fence agent that is being tested is fence_drac, fence_ilo, or some other fencing agent for a systems management device that continues to fail, fall back to trying fence_ipmilan. Most systems management cards support IPMI remote login, and the only supported fencing agent is fence_ipmilan.
Once the fence device has been configured in the cluster with the same options that worked manually and the cluster has been started, test fencing with the pcs stonith fence command from any node (or even multiple times from different nodes), as in the following example. The pcs stonith fence command reads the cluster configuration from the CIB and calls the fence agent as configured to start the fence action. This verifies that the cluster configuration is correct.
# pcs stonith fence node_name
If the pcs stonith fence command works properly, the fencing configuration for the cluster should work when a fence event occurs. If the command fails, cluster management cannot invoke the fence device through the configuration it has retrieved. Check for the following issues and update your cluster configuration as needed.
- Check your fence configuration. For example, if you have used a host map, ensure that the system can find the node using the host name you have provided.
- Check whether the password and user name for the device include any special characters that could be misinterpreted by the bash shell. Making sure that you enter passwords and user names surrounded by quotation marks could address this issue.
- Check whether you can connect to the device using the exact IP address or host name you specified in the pcs stonith command. For example, if you give the host name in the stonith command but test by using the IP address, that is not a valid test.
- If the protocol that your fence device uses is accessible to you, use that protocol to try to connect to the device. For example, many agents use SSH or Telnet. Try to connect to the device with the credentials you provided when configuring the device, to see whether you get a valid prompt and can log in to the device.
If you determine that all your parameters are appropriate but you still have trouble connecting to your fence device, check the logging on the fence device itself, if the device provides it, which will show whether the user has connected and what command the user issued. You can also search through the /var/log/messages file for instances of stonith and error, which could give some idea of what is transpiring; some agents can provide additional information.
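For example, a quick way to pull those entries from the log:
# grep -iE 'stonith|error' /var/log/messages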
Once the fence device tests are working and the cluster is up and running, test an actual failure. To do this, take an action in the cluster that should initiate a token loss.
Take down a network. How you take down a network depends on your specific configuration. In many cases, you can physically pull the network or power cables out of the host. For information about simulating a network failure, see the Red Hat Knowledgebase solution What is the proper way to simulate a network failure on a RHEL Cluster?.
Note
Disabling the network interface on the local host rather than physically disconnecting the network or power cables is not recommended as a test of fencing because it does not accurately simulate a typical real-world failure.
Block corosync traffic both inbound and outbound using the local firewall.
The following example blocks corosync, assuming the default corosync port is used, firewalld is used as the local firewall, and the network interface used by corosync is in the default firewall zone:
# firewall-cmd --direct --add-rule ipv4 filter OUTPUT 2 -p udp --dport=5405 -j DROP
# firewall-cmd --add-rich-rule='rule family="ipv4" port port="5405" protocol="udp" drop'
Simulate a crash and panic your machine with sysrq-trigger. Note, however, that triggering a kernel panic can cause data loss; it is recommended that you disable your cluster resources first.
# echo c > /proc/sysrq-trigger
4.9.6. Configuring fencing levels Copy linkLink copied to clipboard!
Pacemaker supports fencing nodes with multiple devices through a feature called fencing topologies. To implement topologies, create the individual devices as you normally would and then define one or more fencing levels in the fencing topology section in the configuration.
Pacemaker processes fencing levels as follows:
- Each level is attempted in ascending numeric order, starting at 1.
- If a device fails, processing terminates for the current level. No further devices in that level are exercised and the next level is attempted instead.
- If all devices are successfully fenced, then that level has succeeded and no other levels are tried.
- The operation is finished when a level has passed (success), or all levels have been attempted (failed).
Use the following command to add a fencing level to a node. The devices are given as a comma-separated list of stonith ids, which are attempted for the node at that level.
pcs stonith level add level node devices
The following example sets up fence levels so that if the device my_ilo fails and is unable to fence the node, then Pacemaker attempts to use the device my_apc.
Prerequisites
- You have configured an ilo fence device called my_ilo for node rh7-2.
- You have configured an apc fence device called my_apc for node rh7-2.
Procedure
Add a fencing level of 1 for fence device my_ilo on node rh7-2.
# pcs stonith level add 1 rh7-2 my_ilo
Add a fencing level of 2 for fence device my_apc on node rh7-2.
# pcs stonith level add 2 rh7-2 my_apc
List the currently configured fencing levels.
# pcs stonith level
Node: rh7-2
Level 1 - my_ilo
Level 2 - my_apc
4.9.7. Removing a fence level Copy linkLink copied to clipboard!
You can remove the fence level for the specified node and device. If no nodes or devices are specified, the fence level you specify is removed from all nodes.
Procedure
Remove the fence level for the specified node and device:
# pcs stonith level remove level [node_id] [stonith_id] ... [stonith_id]
4.9.8. Clearing fence levels Copy linkLink copied to clipboard!
You can clear the fence levels on the specified node or stonith id. If you do not specify a node or stonith id, all fence levels are cleared.
Procedure
Clear the fence levels on the specified node or stonith id:
# pcs stonith level clear [node|stonith_id(s)]
If you specify more than one stonith id, they must be separated by a comma with no spaces, as in the following example.
# pcs stonith level clear dev_a,dev_b
4.9.9. Verifying nodes and devices in fence levels Copy linkLink copied to clipboard!
You can verify that all fence devices and nodes specified in fence levels exist.
Procedure
Use the following command to verify that all fence devices and nodes specified in fence levels exist:
# pcs stonith level verify
4.9.10. Specifying nodes in fencing topology Copy linkLink copied to clipboard!
You can specify nodes in fencing topology by a regular expression applied on a node name and by a node attribute and its value.
Procedure
The following commands configure nodes node1, node2, and node3 to use fence devices apc1 and apc2, and nodes node4, node5, and node6 to use fence devices apc3 and apc4:
# pcs stonith level add 1 "regexp%node[1-3]" apc1,apc2
# pcs stonith level add 1 "regexp%node[4-6]" apc3,apc4
The following commands yield the same results by using node attribute matching:
# pcs node attribute node1 rack=1
# pcs node attribute node2 rack=1
# pcs node attribute node3 rack=1
# pcs node attribute node4 rack=2
# pcs node attribute node5 rack=2
# pcs node attribute node6 rack=2
# pcs stonith level add 1 attrib%rack=1 apc1,apc2
# pcs stonith level add 1 attrib%rack=2 apc3,apc4
4.9.11. Configuring fencing for redundant power supplies Copy linkLink copied to clipboard!
When configuring fencing for redundant power supplies, the cluster must ensure that when attempting to reboot a host, both power supplies are turned off before either power supply is turned back on.
If the node never completely loses power, the node may not release its resources. This opens up the possibility of nodes accessing these resources simultaneously and corrupting them.
You need to define each device only once and to specify that both are required to fence the node.
Procedure
Create the first fence device.
# pcs stonith create apc1 fence_apc_snmp ipaddr=apc1.example.com login=user passwd='7a4D#1j!pz864' pcmk_host_map="node1.example.com:1;node2.example.com:2"
Create the second fence device.
# pcs stonith create apc2 fence_apc_snmp ipaddr=apc2.example.com login=user passwd='7a4D#1j!pz864' pcmk_host_map="node1.example.com:1;node2.example.com:2"
Specify that both devices are required to fence the node.
# pcs stonith level add 1 node1.example.com apc1,apc2
# pcs stonith level add 1 node2.example.com apc1,apc2
4.9.12. Administering fence devices Copy linkLink copied to clipboard!
The pcs command-line interface provides a variety of commands you can use to administer your fence devices after you have configured them.
4.9.12.1. Displaying configured fence devices Copy linkLink copied to clipboard!
The following command shows all currently configured fence devices. If a stonith_id is specified, the command shows the options for that configured fencing device only. If the --full option is specified, all configured fencing options are displayed.
pcs stonith config [stonith_id] [--full]
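For example, to display all configured options for a single device named myapc (the device created in the following section):
# pcs stonith config myapc --full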
4.9.12.2. Exporting fence devices as pcs commands Copy linkLink copied to clipboard!
You can display the pcs commands that can be used to re-create configured fence devices on a different system using the --output-format=cmd option of the pcs stonith config command.
The following commands create a fence_apc_snmp fence device and display the pcs command you can use to re-create the device.
# pcs stonith create myapc fence_apc_snmp ip="zapc.example.com" pcmk_host_map="z1.example.com:1;z2.example.com:2" username="apc" password="apc"
# pcs stonith config --output-format=cmd
Warning: Only 'text' output format is supported for stonith levels
pcs stonith create --no-default-ops --force -- myapc fence_apc_snmp \
ip=zapc.example.com password=apc 'pcmk_host_map=z1.example.com:1;z2.example.com:2' username=apc \
op \
monitor interval=60s id=myapc-monitor-interval-60s
4.9.12.3. Exporting fence level configuration Copy linkLink copied to clipboard!
The pcs stonith config and the pcs stonith level config commands support the --output-format= option to export the fencing level configuration in JSON format and as pcs commands.
- Specifying --output-format=cmd displays the pcs commands created from the current cluster configuration that configure fencing levels. You can use these commands to re-create configured fencing levels on a different system.
- Specifying --output-format=json displays the fencing level configuration in JSON format, which is suitable for machine parsing.
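For example, the following command exports the configured fencing levels as pcs commands; the JSON form is analogous:
# pcs stonith level config --output-format=cmd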
4.9.12.4. Modifying and deleting fence devices Copy linkLink copied to clipboard!
Modify or add options to a currently configured fencing device with the following command.
pcs stonith update stonith_id [stonith_device_options]
Updating a SCSI fencing device with the pcs stonith update command causes a restart of all resources running on the same node where the fencing resource was running. You can use either of the following commands to update SCSI devices without causing a restart of other cluster resources. SCSI fencing devices can be configured as multipath devices.
pcs stonith update-scsi-devices stonith_id set device-path1 device-path2
pcs stonith update-scsi-devices stonith_id add device-path1 remove device-path2
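For example, for a hypothetical SCSI fencing device named scsi-shooter, the following command adds one multipath device path and removes another without restarting other cluster resources:
# pcs stonith update-scsi-devices scsi-shooter add /dev/mapper/mpatha remove /dev/mapper/mpathb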
Use the following command to remove a fencing device from the current configuration.
pcs stonith delete stonith_id
4.9.12.5. Manually fencing a cluster node Copy linkLink copied to clipboard!
You can fence a node manually with the following command. If you specify the --off option, this command uses the off API call to stonith, which turns the node off instead of rebooting it.
pcs stonith fence node [--off]
In a situation where no fence device is able to fence a node even if it is no longer active, the cluster may not be able to recover the resources on the node. If this occurs, after manually ensuring that the node is powered down you can enter the following command to confirm to the cluster that the node is powered down and free its resources for recovery.
If the node you specify is not actually off, but running the cluster software or services normally controlled by the cluster, data corruption and cluster failure will occur.
pcs stonith confirm node
4.9.12.6. Disabling a fence device Copy linkLink copied to clipboard!
To disable a fencing device, run the pcs stonith disable command.
The following command disables the fence device myapc.
# pcs stonith disable myapc
4.9.12.7. Preventing a node from using a fencing device Copy linkLink copied to clipboard!
To prevent a specific node from using a fencing device, you can configure location constraints for the fencing resource.
The following example prevents fence device node1-ipmi from running on node1.
# pcs constraint location node1-ipmi avoids node1
4.10. Configuring ACPI for use with integrated fence devices Copy linkLink copied to clipboard!
If your cluster uses integrated fence devices, you must configure ACPI (Advanced Configuration and Power Interface) to ensure immediate and complete fencing.
If a cluster node is configured to be fenced by an integrated fence device, disable ACPI Soft-Off for that node. Disabling ACPI Soft-Off allows an integrated fence device to turn off a node immediately and completely rather than attempting a clean shutdown (for example, shutdown -h now). Otherwise, if ACPI Soft-Off is enabled, an integrated fence device can take four or more seconds to turn off a node (see the note that follows). In addition, if ACPI Soft-Off is enabled and a node panics or freezes during shutdown, an integrated fence device may not be able to turn off the node. Under those circumstances, fencing is delayed or unsuccessful. Consequently, when a node is fenced with an integrated fence device and ACPI Soft-Off is enabled, a cluster recovers slowly or requires administrative intervention to recover.
The amount of time required to fence a node depends on the integrated fence device used. Some integrated fence devices perform the equivalent of pressing and holding the power button; therefore, the fence device turns off the node in four to five seconds. Other integrated fence devices perform the equivalent of pressing the power button momentarily, relying on the operating system to turn off the node; therefore, the fence device turns off the node in a time span much longer than four to five seconds.
- The preferred way to disable ACPI Soft-Off is to change the BIOS setting to "instant-off" or an equivalent setting that turns off the node without delay.
Disabling ACPI Soft-Off with the BIOS may not be possible with some systems. If disabling ACPI Soft-Off with the BIOS is not satisfactory for your cluster, you can disable ACPI Soft-Off with one of the following alternate methods:
- Setting HandlePowerKey=ignore in the /etc/systemd/logind.conf file and verifying that the node turns off immediately when fenced. This is the first alternate method of disabling ACPI Soft-Off.
- Appending acpi=off to the kernel boot command line. This is the second alternate method of disabling ACPI Soft-Off, if the preferred or the first alternate method is not available.
Important
This method completely disables ACPI; some computers do not boot correctly if ACPI is completely disabled. Use this method only if the other methods are not effective for your cluster.
4.10.1. Disabling ACPI Soft-Off with the BIOS Copy linkLink copied to clipboard!
You can disable ACPI Soft-Off by configuring the BIOS of each cluster node.
The procedure for disabling ACPI Soft-Off with the BIOS may differ among server systems. You should verify this procedure with your hardware documentation.
Procedure
- Reboot the node and start the BIOS CMOS Setup Utility program.
- Navigate to the Power menu (or equivalent power management menu).
At the Power menu, set the Soft-Off by PWR-BTTN function (or equivalent) to Instant-Off (or the equivalent setting that turns off the node by means of the power button without delay). The BIOS CMOS Setup Utility example below shows a Power menu with ACPI Function set to Enabled and Soft-Off by PWR-BTTN set to Instant-Off.
Note
The equivalents to ACPI Function, Soft-Off by PWR-BTTN, and Instant-Off may vary among computers. However, the objective of this procedure is to configure the BIOS so that the computer is turned off by means of the power button without delay.
- Exit the BIOS CMOS Setup Utility program, saving the BIOS configuration.
- Verify that the node turns off immediately when fenced. For information about testing a fence device, see Testing a fence device.
Example 4.1. BIOS CMOS Setup Utility
`Soft-Off by PWR-BTTN` set to `Instant-Off`
+---------------------------------------------|-------------------+
| ACPI Function [Enabled] | Item Help |
| ACPI Suspend Type [S1(POS)] |-------------------|
| x Run VGABIOS if S3 Resume Auto | Menu Level * |
| Suspend Mode [Disabled] | |
| HDD Power Down [Disabled] | |
| Soft-Off by PWR-BTTN [Instant-Off] | |
| CPU THRM-Throttling [50.0%] | |
| Wake-Up by PCI card [Enabled] | |
| Power On by Ring [Enabled] | |
| Wake Up On LAN [Enabled] | |
| x USB KB Wake-Up From S3 Disabled | |
| Resume by Alarm [Disabled] | |
| x Date(of Month) Alarm 0 | |
| x Time(hh:mm:ss) Alarm 0 : 0 : | |
| POWER ON Function [BUTTON ONLY] | |
| x KB Power ON Password Enter | |
| x Hot Key Power ON Ctrl-F1 | |
| | |
| | |
+---------------------------------------------|-------------------+
This example shows ACPI Function set to Enabled, and Soft-Off by PWR-BTTN set to Instant-Off.
4.10.2. Disabling ACPI Soft-Off in the logind.conf file Copy linkLink copied to clipboard!
You can disable power-key handling in the /etc/systemd/logind.conf file.
Procedure
Define the following configuration in the /etc/systemd/logind.conf file:
HandlePowerKey=ignore
Restart the systemd-logind service:
# systemctl restart systemd-logind.service
Verification
- Verify that the node turns off immediately when fenced. For information about testing a fence device, see Testing a fence device.
4.10.3. Disabling ACPI completely in the GRUB 2 file Copy linkLink copied to clipboard!
You can disable ACPI Soft-Off by appending acpi=off to the GRUB menu entry for a kernel.
This method completely disables ACPI; some computers do not boot correctly if ACPI is completely disabled. Use this method only if the other methods are not effective for your cluster.
Procedure
Use the --args option in combination with the --update-kernel option of the grubby tool to change the grub.cfg file of each cluster node as follows:
# grubby --args=acpi=off --update-kernel=ALL
- Reboot the node.
Verification
- Verify that the node turns off immediately when fenced. For information about testing a fence device, see Testing a fence device.
4.11. Configuring the virtual IP management resource agent Copy linkLink copied to clipboard!
The gcp-vpc-move-vip resource agent attaches a secondary IP address (alias IP) to a running instance. You can assign this floating IP address between different nodes in the cluster.
To display the parameters that you can configure for this resource agent, run the following command:
# pcs resource describe gcp-vpc-move-vip
You can configure the resource agent to use a primary subnet address range or a secondary subnet address range.
4.11.1. Configuring the primary subnet address range Copy linkLink copied to clipboard!
If you need to automate or manage the allocation of IP addresses for VMs or other resources within a subnet, you can use the primary subnet address range. It ensures that the primary address range is correctly set and configured to provide stable IP addresses for the primary Virtual Private Cloud (VPC) subnet.
Procedure
Create the aliasip resource by including an unused internal IP address and the CIDR block:
# pcs resource create aliasip gcp-vpc-move-vip alias_ip=UnusedIPaddress/CIDRblock
Example:
[root@rhel81-node-01 ~]# pcs resource create aliasip gcp-vpc-move-vip alias_ip=10.10.10.200/32
Create an IPaddr2 resource for managing the IP on the node:
# pcs resource create vip IPaddr2 nic=interface ip=AliasIPaddress cidr_netmask=32
Example:
[root@rhel81-node-01 ~]# pcs resource create vip IPaddr2 nic=eth0 ip=10.10.10.200 cidr_netmask=32
Group the network resources under vipgrp:
# pcs resource group add vipgrp aliasip vip
Verification
Verify that the active resources are under the vipgrp group:
# pcs status
Verify that the resources can move across the nodes:
# pcs resource move vip Node
Example:
[root@rhel81-node-01 ~]# pcs resource move vip rhel81-node-03
Verify that the vip resource successfully started on a different node:
# pcs status
4.11.2. Configuring the secondary subnet address range Copy linkLink copied to clipboard!
You can use the secondary subnet address range if you need to assign IP addresses from an additional, predefined range within the same subnet, without creating a new subnet. It is useful for specific purposes such as custom routing. With a secondary subnet address range, you can manage network traffic in a single subnet with multiple IP address ranges.
Prerequisites
- You have created a custom network and a subnet
Optional: You have installed Google Cloud SDK. For instructions, see Installing the Google Cloud SDK.
Note
You can also use the gcloud commands in the following procedure in the terminal that you can activate in the Google Cloud web console.
Procedure
Create a secondary subnet address range:
# gcloud compute networks subnets update SubnetName --region RegionName --add-secondary-ranges SecondarySubnetName=SecondarySubnetRange
Example:
# gcloud compute networks subnets update range0 --region us-west1 --add-secondary-ranges range1=10.10.20.0/24
Create the aliasip resource with an unused internal IP address in the secondary subnet address range and the CIDR block:
# pcs resource create aliasip gcp-vpc-move-vip alias_ip=UnusedIPaddress/CIDRblock
Example:
[root@rhel81-node-01 ~]# pcs resource create aliasip gcp-vpc-move-vip alias_ip=10.10.20.200/32
Create an IPaddr2 resource for managing the IP on the node:
# pcs resource create vip IPaddr2 nic=interface ip=AliasIPaddress cidr_netmask=32
Example:
[root@rhel81-node-01 ~]# pcs resource create vip IPaddr2 nic=eth0 ip=10.10.20.200 cidr_netmask=32
Group the network resources under vipgrp:
# pcs resource group add vipgrp aliasip vip
Verification
Verify that the active resources are under the
vipgrp group:
# pcs status
Verify that the resources can move across different nodes:
# pcs resource move vip Node
Example:
[root@rhel81-node-01 ~]# pcs resource move vip rhel81-node-03
Verify that the vip resource successfully started on a different node:
# pcs status
Chapter 5. Configuring RHEL on Google Cloud with Secure Boot Copy linkLink copied to clipboard!
To enhance boot security for a Red Hat Enterprise Linux (RHEL) instance on Google Cloud, configure Secure Boot. Secure Boot verifies the digital signatures of the boot loader and other components at startup, allowing only trusted programs to load while blocking unauthorized ones.
5.1. Understanding Secure Boot for RHEL on cloud Copy linkLink copied to clipboard!
When Secure Boot detects any tampered components or components signed by untrusted entities, it aborts the boot process. Secure Boot plays a critical role in configuring a Confidential Virtual Machine (CVM) by ensuring that only trusted entities participate in the boot chain.
Secure Boot is a Unified Extensible Firmware Interface (UEFI) feature that verifies digital signatures of boot components, such as boot loader and kernel, against trusted keys stored in hardware. Secure Boot prevents unauthorized or tampered software from running during boot, protecting your system from malicious code. It authenticates access to specific device paths through defined interfaces, enforces the use of the latest configuration, and permanently overwrites earlier configurations. When the Red Hat Enterprise Linux (RHEL) kernel boots with Secure Boot enabled, it enters the lockdown mode, allowing only kernel modules signed by a trusted vendor to load. Therefore, Secure Boot strengthens the security of the operating system boot sequence.
- Components of Secure Boot
The Secure Boot mechanism consists of firmware, signature databases, cryptographic keys, boot loader, hardware modules, and the operating system. The following are the components of the UEFI trusted variables:
- Key Exchange Key database (KEK): An exchange of public keys to establish trust between the RHEL operating system and the VM firmware. You can also update the Allowed Signature database (db) and Forbidden Signature database (dbx) by using these keys.
- Platform Key database (PK): A self-signed single-key database to establish trust between the VM firmware and the cloud platform. The PK also updates the KEK database.
- Allowed Signature database (db): A database that maintains a list of certificates or binary hashes to check whether the binary file can boot on the system. Additionally, all certificates from db are imported to the .platform keyring of the RHEL kernel. With this feature, you can add and load signed third-party kernel modules in the lockdown mode.
- Forbidden Signature database (dbx): A database that maintains a list of certificates or binary hashes that are not allowed to boot on the system.
Binary files are checked against the dbx database and the Secure Boot Advanced Targeting (SBAT) mechanism. With SBAT, you can revoke older versions of specific binaries while keeping the certificate that signed them valid.
- Stages of Secure Boot for RHEL on Cloud
When a RHEL instance boots in the Unified Kernel Image (UKI) mode and with Secure Boot enabled, the RHEL instance interacts with the cloud service infrastructure in the following sequence:
- Initialization: When a RHEL instance boots, the cloud-hosted firmware initially boots and implements the Secure Boot mechanism.
- Variable store initialization: The firmware initializes UEFI variables from a variable store, a dedicated storage area for information that firmware needs to manage for the boot process and runtime operations. When the RHEL instance boots for the first time, the store initializes from default values associated with the VM image.
Boot loader: When booted, the firmware loads the first stage boot loader. For the RHEL instance in an x86 UEFI environment, the first stage boot loader is shim. The shim boot loader authenticates and loads the next stage of the boot process and acts as a bridge between UEFI and GRUB.
-
The shim x86 binary in RHEL is currently signed by the
Microsoft Corporation UEFI CA 2011 certificate so that the RHEL instance can boot in the Secure Boot enabled mode on various hardware and virtualized platforms where the Allowed Signature database (db) has the default Microsoft certificates.
The shim binary extends the list of trusted certificates with Red Hat Secure Boot CA and, optionally, with the Machine Owner Key (
MOK).
-
-
UKI: The shim binary loads the RHEL UKI (the
kernel-uki-virt package). The corresponding certificate, Red Hat Secure Boot Signing 504 on the x86_64 architecture, signs the UKI. You can find this certificate in the redhat-sb-certs package. Red Hat Secure Boot CA signs this certificate, so the check succeeds.
UKI add-ons: When you use the UKI
cmdline extensions, the RHEL kernel actively checks their signatures against db, MOK, and certificates shipped with shim. This process ensures that either the operating system vendor RHEL or a user has signed the extensions.
When the RHEL kernel boots in the Secure Boot mode, it enters the lockdown mode. After entering lockdown, the RHEL kernel adds the db keys to the .platform keyring and the MOK keys to the .machine keyring. During the kernel build process, the build system works with an ephemeral key, which consists of private and public keys. The build system signs standard RHEL kernel modules, such as kernel-modules-core, kernel-modules, and kernel-modules-extra. After the completion of each kernel build, the private key becomes obsolete to sign third-party modules. You can use certificates from db and MOK for this purpose.
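As an illustrative check, assuming the keyutils package is installed, you can list the keys that the kernel imported into these keyrings:
$ sudo keyctl list %:.platform
$ sudo keyctl list %:.machine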
5.2. Configuring a RHEL instance from a custom RHEL image with Secure Boot Copy linkLink copied to clipboard!
To ensure that your Red Hat Enterprise Linux (RHEL) instance on Google Cloud has a secured operating system boot process, use the Secure Boot mechanism. You can configure Secure Boot during registration of a custom RHEL Google Cloud image.
Prerequisites
- You have created and uploaded a RHEL Google Cloud image.
You have installed the following packages:
-
python3 -
keyutils -
openssl -
python3-virt-firmware
-
-
You have installed the
google-cloud-cli utility.
- You have included a custom certificate in the SecureBoot DB to sign custom artifacts, such as third-party kernel modules and Unified Kernel Image (UKI) add-ons.
Procedure
Generate a custom certificate custom_db.cer:
$ openssl req -quiet \
-newkey rsa:4096 \
-nodes -keyout custom_db.key \
-new -x509 -sha256 \
-days 3650 \
-subj "/CN=Signature Database key/" \
-outform DER \
-out custom_db.cer
Convert the certificates to the Distinguished Encoding Rules (DER) format.
To convert a Platform Key database (PK) certificate, enter:
$ openssl x509 -in /usr/share/virt-sb-certs/redhat.com/rh-pk-kek-2014.pem \
-inform PEM \
-out rh-pk-kek-2014.cer \
-outform DER
To convert a Key Exchange Key database (KEK) certificate, enter:
$ openssl x509 -in /usr/share/virt-sb-certs/microsoft.com/ms-kek-2023.pem \
-inform PEM \
-out ms-kek-2023.cer \
-outform DER
To convert an Allowed Signature database (db) certificate, enter:
$ openssl x509 -in /usr/share/virt-sb-certs/microsoft.com/ms-uefi-2023.pem \
-inform PEM \
-out ms-uefi-2023.cer \
-outform DER
Use the
google-cloud-cli utility to create and register the image from a disk snapshot with the required Secure Boot variables:
$ gcloud compute images create <example-rhel-10-efi-image> \
--source-image projects/<example_project_id>/global/images/<example_image_name> \
--platform-key-file=rh-pk-kek-2014.cer \
--key-exchange-key-file=rh-pk-kek-2014.cer,ms-kek-2023.cer \
--signature-database-file=custom_db.cer,ms-uefi-2023.cer \
--forbidden-database-file /usr/share/edk2/ovmf/DBXUpdate*.bin \
--guest-os-features="UEFI_COMPATIBLE"
Launch the instance of an
example-rhel-10-efi-image image with the Turn on Secure Boot feature from the Google Cloud console.
Verification
Verify that Secure Boot is enabled:
$ mokutil --sb-state
SecureBoot enabled
Verify the kernel keyring for the custom certificate:
$ sudo keyctl list %:.platform
...
757453569: ---lswrv 0 0 asymmetric: Signature Database key: f064979641c24e1b935e402bdbc3d5c4672a1acc
...
5.3. Configuring a RHEL instance on the Google Cloud Marketplace with Secure Boot Copy linkLink copied to clipboard!
Enable Secure Boot for a Red Hat Enterprise Linux (RHEL) instance on Google Cloud to secure the operating system (OS) boot process. By default, Secure Boot uses the Allowed Signature database (db) containing Microsoft certificates.
Prerequisites
- You have an account on Google Cloud. For details, see Google Cloud.
Procedure
- Launch a publicly available RHEL instance from Google Cloud console.
Install the keyutils package:
$ sudo dnf install keyutils
Enable the Turn on Secure Boot option on the RHEL Google Cloud instance.
Verification
Verify that Secure Boot is enabled:
$ mokutil --sb-state
SecureBoot enabled
Verify the kernel keyring for the default certificates:
$ sudo keyctl list %:.platform
4 keys in keyring:
12702216: ---lswrv 0 0 asymmetric: Microsoft Corporation UEFI CA 2011: 13adbf4309bd82709c8cd54f316ed522988a1bd4
449796228: ---lswrv 0 0 asymmetric: Red Hat Secure Boot CA 8: e1c6c580aa1e21d585aad9bf20f3929e5ec1f08b
50338534: ---lswrv 0 0 asymmetric: Red Hat Secure Boot CA 5: cc6fa5e72868ba494e939bbd680b9144769a9f8f
681047026: ---lswrv 0 0 asymmetric: Microsoft Windows Production PCA 2011: a92902398e16c49778cd90f99e4f9ae17c55af53
Chapter 6. Configuring RHEL on Public Cloud Platforms with Intel TDX Copy linkLink copied to clipboard!
To have a secure and isolated environment for your VM, you can use Intel Trust Domain Extensions (TDX) on Red Hat Enterprise Linux (RHEL) instances on public cloud platforms. Intel TDX is a security technology for Confidential Virtual Machines (CVMs).
6.1. Introduction to Intel TDX Copy linkLink copied to clipboard!
You can enable a secure boot process for your RHEL instance by using Intel Trust Domain Extensions (TDX).
Intel TDX is an advancement of the former technology, Intel Software Guard Extensions (SGX). SGX provides VM isolation from the hypervisor and cloud service providers by creating secure memory regions known as enclaves. Application code stored in enclaves has access to memory and data stored inside the enclave, which remain inaccessible to outside entities. TDX is a hardware-based trusted execution environment (TEE).
TDX creates hardware-isolated VMs called Trusted Domains (TDs). You can use TDs to isolate virtual machines (VMs) from the Virtual Machine Manager (VMM), hypervisors, other VMs, and the host, and allow only the VM to access its resources. This ensures that, while using resources from the hypervisor and CPU, TD VMs remain secure by maintaining data confidentiality and integrity.
The main difference between SGX and TDX is that SGX works at the application level, while TDX works at the virtualization level by limiting hypervisor access.
Before deploying Red Hat Enterprise Linux (RHEL) on a public cloud platform, always check with the corresponding cloud service provider for the support status and certification of the particular RHEL instance type.
6.2. Understanding the Intel TDX secure boot process Copy linkLink copied to clipboard!
To ensure that the VM runs in a trusted and secured environment, use Intel Trust Domain Extensions (TDX) on your Red Hat Enterprise Linux (RHEL) instance for a secure boot process.
Procedure
- Initialization and measurement: A TDX-enabled hypervisor sets the initial state of a VM. This hypervisor loads the firmware binary file into the VM memory and sets the initial register state. The Intel processor measures the initial state of the VM and provides details to verify the initial state of the VM.
- Firmware: The VM initiates the UEFI firmware. The firmware might include stateful or stateless Virtual Trusted Platform Module (vTPM) implementation. Stateful vTPM maintains persistent cryptographic state across VM reboots and migrations, whereas stateless vTPM generates fresh cryptographic state for each VM session without persistence. Virtual Machine Privilege Levels (VMPL) technology isolates vTPM from the guest. VMPL offers hardware-enforced privilege isolation between different VM components and the hypervisor.
- vTPM: Depending on your cloud service provider, for stateful vTPM implementation, the UEFI firmware might perform a remote attestation to decrypt the persistent state of vTPM. The vTPM also gathers data about the boot process, such as Secure Boot state, certificates used for signing boot artifacts, or UEFI binary hashes.
Shim: When the UEFI firmware finishes the initialization process, it searches for the Extensible Firmware Interface (EFI) system partition. Then, the UEFI firmware verifies and executes the first stage boot loader from there. For RHEL, this is
shim. The shim program allows non-Microsoft operating systems to load the second stage boot loader from the EFI system partition.
- shim uses a Red Hat certificate to verify the second stage boot loader (grub) or the Red Hat Unified Kernel Image (UKI).
- grub or UKI unpacks, verifies, and executes the Linux kernel, the initramfs, and the kernel command line. This process ensures that the Linux kernel is loaded in a trusted and secured environment.
-
Initramfs: In initramfs, vTPM information automatically unlocks the encrypted root partition when full disk encryption is used.
-
When the root volume becomes available,
initramfs transfers the execution flow there.
- Attestation: The VM tenant gets access to the system and can perform a remote attestation to ensure that the accessed VM is an untampered Confidential Virtual Machine (CVM). Attestation is performed based on information from the Intel processor and vTPM. This process confirms the authenticity and reliability of the initial CPU and memory state of the RHEL instance and Intel processor.
- TEE: This process creates a Trusted Execution Environment (TEE) to ensure that booting of the VM is in a trusted and secured environment.
Chapter 7. Configuring a RHEL instance on Google Cloud with Intel TDX Copy linkLink copied to clipboard!
Intel TDX is a hardware-based trusted execution environment (TEE). This confidential computing technology provides isolation and integrity to virtual machines.
Prerequisites
-
You have installed the
openssh and openssh-clients packages.
You have installed the
google-cloud-cli utility. For instructions, see Installing gcloud CLI on RHEL.
Procedure
Log in to your Google Cloud account by using the
google-cloud-cli utility:
$ gcloud auth login
Create a new project:
$ gcloud projects create <example_tdx_project> --name="RHEL TDX Project"
Configure the project by setting the google-cloud-cli utility property:
$ gcloud config set project <example_tdx_project>
Create a RHEL compute instance:
$ gcloud compute instances create <example-rhel-9-tdx-instance> \
--confidential-compute-type=TDX \
--machine-type=c3-standard-4 \
--min-cpu-platform="Intel Sapphire Rapids" \
--maintenance-policy="TERMINATE" \
--image=<rhel-guest-image-9-6-20250410-0-x86-64> \
--image-project="rhel-cloud" \
--subnet=<example_subnet>
Open ports on the RHEL instance:
$ gcloud compute firewall-rules create allow-ssh \
--allow tcp:22 \
--source-ranges 0.0.0.0/0 \
--target-tags ssh
Connect to the RHEL Google Cloud instance.
Optional: Create a new key pair and connect by using the gcloud CLI:
$ gcloud compute ssh <cloud-user>@<example-rhel-9-tdx-instance>
Connect to the RHEL instance by using your key pair:
$ ssh -i <private key> <cloud-user>@<instance ip>
Verification
Check the VM configuration of the RHEL instance:
$ gcloud compute instances describe <example-rhel-9-tdx-instance> --format="get(metadata)"
Check kernel logs to verify the status of the TDX feature:
$ sudo dmesg | grep -i tdx
[ 0.000000] tdx: Guest detected
[ 1.334504] process: using TDX aware idle routine
[ 1.413419] Memory Encryption Features active: Intel TDX
[ 3.606910] systemd[1]: Detected confidential virtualization tdx.
[ 10.680475] systemd[1]: Detected confidential virtualization tdx.
Check the CPU flags of the VM:
$ grep -E 'tdx_guest' /proc/cpuinfo
flags : ... tdx_guest ...
Chapter 8. Configuring RHEL on Public Cloud Platforms with AMD SEV SNP Copy linkLink copied to clipboard!
To prevent VM integrity-based attacks and reduce memory integrity violations, you need to configure AMD Secure Encrypted Virtualization with Secure Nested Paging (SEV-SNP). AMD processors offer three hardware-based security mechanisms: SEV, SEV-ES, and SEV-SNP.
- SEV: The SEV mechanism encrypts virtual machine (VM) memory to prevent the hypervisor from accessing VM data.
- SEV-ES: SEV with Encrypted State (SEV-ES) extends SEV by encrypting CPU register states. This mechanism prevents the hypervisor from accessing or modifying VM CPU registers. Despite providing isolation between hypervisor and VM, it is still vulnerable to memory integrity attacks.
SEV-SNP: SEV-SNP is an enhancement to SEV-ES that adds memory integrity protection along with VM encryption. This mechanism prevents the hypervisor from modifying page tables to redirect VM memory access, protecting against replay attacks and memory tampering.
Note
Before deploying Red Hat Enterprise Linux (RHEL) on a public cloud platform, always check with the corresponding cloud service provider for the support status and certification of the particular RHEL instance type.
8.1. Introduction to AMD SEV SNP Copy linkLink copied to clipboard!
To protect your data from access by the hypervisor and cloud service provider, you can configure AMD Secure Encrypted Virtualization with Secure Nested Paging (SEV-SNP) on Red Hat Enterprise Linux (RHEL) instances on public cloud platforms.
- Properties of SEV-SNP
-
Secure Processor: The AMD
EPYC processor integrates a Secure Processor (SP) subsystem. AMD SP is a dedicated hardware component to manage keys and encryption operations.
- Memory Integrity: For managing virtualization and isolation, the memory management unit (MMU) utilizes page tables to translate virtual addresses to guest-physical addresses. SEV-SNP uses nested page tables for translating guest-physical addresses to host-physical addresses. Once nested page tables are defined, the hypervisor or host cannot alter page tables to trick the VM into accessing different pages, resulting in protection of memory integrity. SEV-SNP uses this method to offer protection against replay attacks and malicious modifications to VM memory.
-
Memory Encryption: The AMD
EPYC processor hides the memory encryption key, which remains hidden from both host and VM.
- Attestation report for verification: A CPU-generated report about RHEL instance information in an authorized cryptographic format. This process confirms the authenticity and reliability of the initial CPU and memory state of the RHEL instance and AMD processor.
Note
Even if a hypervisor creates the primary memory and CPU register state of the VM, they remain hidden and inaccessible to the hypervisor after initialization of that VM.
- Understanding AMD SEV SNP secure boot process
- Initialization and measurement: A SEV-SNP enabled hypervisor sets the initial state of a VM. This hypervisor loads firmware binary into the VM memory and sets the initial register state. AMD Secure Processor (SP) measures the initial state of the VM and provides details to verify the initial state of the VM.
- Firmware: The VM initiates the UEFI firmware. The firmware might include either stateful or stateless Virtual Trusted Platform Module (vTPM) implementation. Stateful vTPM maintains persistent cryptographic state across VM reboots and migrations, whereas stateless vTPM generates fresh cryptographic state for each VM session without persistence. Virtual Machine Privilege Levels (VMPL) technology isolates vTPM from the guest. VMPL offers hardware-enforced privilege isolation between different VM components and the hypervisor.
vTPM: Depending on your cloud service provider, for stateful vTPM implementation, the UEFI firmware might perform a remote attestation to decrypt the persistent state of vTPM.
- The vTPM also measures facts about the boot process such as Secure Boot state, certificates used for signing boot artifacts, UEFI binary hashes, and so on.
Shim: When the UEFI firmware finishes the initialization process, it searches for the Extensible Firmware Interface (EFI) system partition. Then, the UEFI firmware verifies and executes the first stage boot loader from there. For RHEL, this is
shim. The shim program allows non-Microsoft operating systems to load the second stage boot loader from the EFI system partition.
- shim uses a Red Hat certificate to verify the second stage boot loader (grub) or the Red Hat Unified Kernel Image (UKI).
- grub or UKI unpacks, verifies, and executes the Linux kernel, the initial RAM filesystem (initramfs), and the kernel command line. This process ensures that the Linux kernel is loaded in a trusted and secured environment.
-
Initramfs: In
initramfs, vTPM information automatically unlocks the encrypted root partition when full disk encryption is used.
When the root volume becomes available,
initramfs transfers the execution flow to the root volume.
- Attestation: The VM tenant gets access to the system and can perform a remote attestation to ensure that the accessed VM is an untampered Confidential Virtual Machine (CVM). Attestation is performed based on information from AMD SP and vTPM. This process confirms the authenticity and reliability of the initial CPU and memory state of the RHEL instance and AMD processor.
- TEE: This process creates a Trusted Execution Environment (TEE) to ensure that booting of the VM is in a trusted and secured environment.
Chapter 9. Configuring a RHEL instance on Google Cloud with AMD SEV SNP Copy linkLink copied to clipboard!
To create a trusted boot environment, you can configure AMD Secure Encrypted Virtualization with Secure Nested Paging (SEV-SNP) on Red Hat Enterprise Linux (RHEL) instances on Google Cloud. SEV-SNP aims to protect your data from access by the hypervisor and cloud service provider.
Prerequisites
-
You have installed the
openssh and openssh-clients packages.
You have installed the
google-cloud-cli utility. For details, see Installing gcloud CLI on RHEL.
Procedure
Log in to Google Cloud by using the
google-cloud-cli utility:
$ gcloud auth login
Create a new Google Cloud project:
$ gcloud projects create <example_sev_snp_project> --name="RHEL SEV SNP Project"
Configure the Google Cloud project:
$ gcloud config set project <example_sev_snp_project>
Create a RHEL compute instance:
$ gcloud compute instances create <example-rhel-10-sev-snp-instance> \
--confidential-compute-type=SEVSNP \
--machine-type=n2d-standard-2 \
--min-cpu-platform="AMD Milan" \
--maintenance-policy="TERMINATE" \
--image=<rhel-guest-image-10-0-20251016-6-x86-64> \
--image-project="rhel-cloud" \
--subnet=<example_subnet>
Connect to the RHEL instance by using a public and private RSA key pair:
To connect by using a new key pair:
$ gcloud compute ssh <cloud_user>@<example-rhel-10-sev-snp-instance>
To connect by using an existing key pair:
$ ssh -i <example_private_key> <cloud_user>@<instance_ip>
Verification
Check the metadata of the VM configuration on the RHEL instance:
$ gcloud compute instances describe <example-rhel-10-sev-snp-instance> --format="get(metadata)"
Check the kernel logs to verify the status of SEV-SNP:
$ sudo dmesg | grep -i sev
[ 0.302688] Memory Encryption Features active: AMD SEV SEV-ES SEV-SNP
[ 0.303645] SEV: Status: SEV SEV-ES SEV-SNP
[ 0.626662] SEV: APIC: wakeup_secondary_cpu() replaced with wakeup_cpu_via_vmgexit()
[ 0.686825] SEV: Using SNP CPUID table, 57 entries present.
[ 0.687645] SEV: SNP running at VMPL0.
[ 1.477045] SEV: SNP guest platform devices initialized.
[ 19.921893] systemd[1]: Detected confidential virtualization sev-snp.
[ 25.604801] systemd[1]: Detected confidential virtualization sev-snp.
[ 29.089032] sev-guest sev-guest: Initialized SEV guest driver (using VMPCK0 communication key)
Chapter 10. Configuring RHEL on Public Cloud Platforms with UKI Copy linkLink copied to clipboard!
To ensure that a Red Hat Enterprise Linux (RHEL) instance has a secured boot process from untrusted storage, such as a confidential virtual machine (CVM) on a public cloud platform, use Unified Kernel Image (UKI).
10.1. Introduction to Unified Kernel Image Copy linkLink copied to clipboard!
To extend the secure boot protection throughout the entire boot chain, use Unified Kernel Image (UKI).
- Components of UKI
Unified Kernel Image (UKI) is a Unified Extensible Firmware Interface (UEFI) Portable Executable (PE) binary for the UEFI environment that bundles together the essential components of an operating system. UKI binary components extend the Secure Boot coverage by including the
initramfs and the kernel command line. Initramfs is a part of the Linux startup process, while the kernel command line gives you limited access to define parameters. The primary components contained within the UKI binary are:
The
.linux section stores the Linux kernel image. -
The
.initrd section stores the initial RAM filesystem initramfs. -
The
.cmdline section stores the kernel command line. -
Additional sections, such as
.sbat.
- The Red Hat signature.
-
The
- Features of RHEL UKI with pre-built
initramfs
- Prohibits any malicious agent or component from altering any objects in the boot chain.
-
Due to pre-built
initramfs, the user does not need to build a custom initramfs, which results in a faster kernel installation.
Provides support for the pre-built
initramfssystems as it is similar in all installations such as virtual machine (VMs), containers, or cloud instances. -
Provides support for the
x86_64architecture. -
Includes the
kernel-uki-virtpackage. - Built for virtual machines and cloud instances.
- Limitation of UKI because of the reduced flexibility of the boot process
-
When building the UKI, the operating system vendor creates
initramfs. As a consequence, the listed and included kernel modules are static. You can use the systemd system and configuration extensions to address this limitation.
- The kernel command line parameters are static, which limits the use of parameters for different instance sizes or debugging options.
-
When building the UKI, the operating system vendor creates
You can use the UKI command line extensions to overcome this limitation.
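To see which of these sections a particular UKI carries, one illustrative option (the UKI file name is a placeholder and varies by deployment) is to dump the PE section headers of the installed image:
$ objdump -h /boot/efi/EFI/Linux/<uki_name>.efi | grep -E '\.linux|\.initrd|\.cmdline|\.sbat'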
10.2. Understanding the UKI secure boot process Copy linkLink copied to clipboard!
To protect your system against unauthorized boot-time modifications, use the secure boot mechanism with Unified Kernel Image (UKI).
When using UKI with secure boot, the system verifies each component in the boot chain to ensure system integrity and prevent malicious code execution.
Procedure
- UEFI Firmware: The boot process starts from the Unified Extensible Firmware Interface (UEFI) firmware. For boot, Red Hat Enterprise Linux (RHEL) UKI requires UEFI firmware, because legacy basic input/output system (BIOS) firmware is not supported.
-
Shim boot loader: Use the
shim boot loader for booting rather than directly booting the UKI (a PE binary) from the UEFI firmware. shim includes additional security mechanisms such as Machine Owner Key (MOK) and Secure Boot Advanced Targeting (SBAT).
Signature verification (Secure Boot UEFI mechanism): During boot,
shim reads the UKI binary, and the Secure Boot UEFI mechanism verifies the signature of the UKI against trusted keys stored in the Secure Boot Allowed Signature Database (db) of the system, the MOK database, and the built-in database of the shim binary. If the signature key is valid, the verification passes.
shimboot loader verifies the SBAT rules at startup.During SBAT verification, the system compares generation numbers, for components such as
systemd.rhelorlinux.rhel, embedded in the UKI by using the.sbatsection against values in theshimboot loader. If the generation number for a component in theshimis higher than the generation number in the UKI, the binary gets automatically discarded, even if it was signed by a trusted key.Note that the generation number is a version identifier for UEFI applications, such as
shimandgrub.-
Unpacking and Execution: If verification passes, control passes from
shimto thesystemd-stubcode in the UKI to continue the boot process. systemd-stub add-ons: During execution,
systemd-stubunpacks and extracts the contents of the.cmdlinesection, the plain text kernel command line, and the.initrdsection, the temporary root file system, for the boot process.Note that
systemd-stubreads UKI add-ons, which are also PE binaries, and verifies their signatures to safely extend the kernel command line of UKI by appending the.cmdlinecontent from add-ons.systemd-stubreads add-ons from two locations:-
Global (UKI-independent) add-ons from the
/loader/addons/directory on the Extensible Firmware Interface (EFI) System Partition (ESP). -
Per-UKI add-ons from the
/EFI/Linux/<UKI-name>.extra.d/directory on the ESP.
-
Global (UKI-independent) add-ons from the
Control passes from
systemd-stub to the Linux kernel, and the operating system boot process continues.
From this point, secure boot with UKI mechanisms follows the standard kernel boot process.
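As a closing illustration, a kernel command line add-on of the kind described above can be built and signed with the ukify tool from the systemd-ukify package (a sketch; the key, certificate, and file names are placeholders):
$ ukify build \
--secureboot-private-key mok.key \
--secureboot-certificate mok.pem \
--cmdline "console=ttyS0" \
--output console.addon.efi
systemd-stub then picks up the signed add-on from one of the two ESP locations listed above and appends its .cmdline content during boot.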