Chapter 7. Preparing for disaster recovery and recovering from data loss
Red Hat recommends preparing a disaster recovery plan to ensure the continuity of Satellite services in case of a disruptive event. These guidelines help ensure that you will be able to restore your Satellite deployment to an operational state after an incident.
7.1. Overview of recommended disaster recovery plans
Choose a disaster recovery plan that best helps ensure the continuity of Satellite services in your deployment.
- Snapshots of virtualized Satellite Server
- How do I back up?
- Virtualize your Satellite Server and use the hypervisor tools to take virtual machine snapshots of the server. This method is suitable if you can run Satellite in a virtual machine.
- How will I recover in case of a disruptive event?
- To recover Satellite services, restore a virtual machine snapshot.
- Disadvantages and expected impact
- Expect some amount of data inconsistency after recovery, based on how old your last snapshot is. You will lose data changes that have occurred since the snapshot you are using to recover was taken.
- Active and passive Satellite Server, with external storage
- How do I back up?
- Store the following critical data on network attached storage: content in /var/lib/pulp and database in /var/lib/pgsql. Replicate this storage into a different data center. Attach the storage to a Satellite Server that is a clone of the primary Satellite Server but runs passively.
- How will I recover in case of a disruptive event?
- To recover Satellite services, switch DNS records of the active Satellite Server with the passive Satellite Server. This ensures that the passive server becomes the active server. All hosts remain connected without configuration updates.
- Disadvantages and expected impact
- If the network attached storage is replicated to another location, expect some amount of data inconsistency after recovery based on the synchronization interval.
- Active and passive Satellite Server, with backup and restore
- How do I back up?
- Ensure periodic backups of your Satellite Server. Copy this backup to a passive Satellite Server and restore it on the passive server.
- How will I recover in case of a disruptive event?
- To recover Satellite services, switch DNS records of the active Satellite Server with the passive Satellite Server. This ensures that the passive server becomes the active server. All hosts remain connected without configuration updates.
- Disadvantages and expected impact
- Expect some amount of data inconsistency after recovery, based on how often you took and restored backups and on how long it takes to complete the restore process.
- Dual active Satellite Server
- How do I back up?
- Operate an active, independent Satellite Server per data center. Hosts from each data center are registered to the Satellite Server in that data center. Then configure automation to ensure recovery in case of a disruptive event. For example, you can periodically run a health check. If the health check discovers that the hostname of the Satellite Server that a host is registered to does not resolve, the host is re-registered to the other Satellite Server.
To minimize downtime, you can automate the recovery in various ways. For example, you can use the Satellite Ansible collection. For more information, see Managing Satellite with Ansible in Administering Red Hat Satellite.
- How will I recover in case of a disruptive event?
- To recover Satellite services, re-register all hosts to the Satellite Server in the other data center.
- Disadvantages and expected impact
- You must coordinate content synchronization and content view creation across both servers so that each Satellite has the same content views, which prevents content drift. Content drift occurs when available content deviates from the intended state defined by a content view. If you fail to prevent content drift, expect inconsistency in the content that is available to hosts.
7.2. Disaster recovery by virtualizing your Satellite Server
If you virtualize your Satellite Server and ensure that you take regular snapshots of the virtual machine (VM), you can respond to various disaster scenarios by restoring your Satellite deployment from one of your snapshots.
The details for how to implement this scenario depend on your choice of a virtualization platform. Due to the variety of different hypervisors and their capabilities, Red Hat does not provide detailed instructions for any specific virtualization platform.
7.2.1. Prerequisites
- Review Section 7.1, “Overview of recommended disaster recovery plans” to make sure that this disaster recovery plan works for you.
- Your Satellite Server is deployed as a VM.
7.2.2. Preparing for disaster recovery by virtualizing your Satellite Server
Implement a reliable process for regularly taking VM snapshots of your virtualized Satellite Server and for backing up your snapshots for long-term storage.
Procedure
- Define a schedule for taking periodic snapshots of your virtualized Satellite Server. Consider your tolerance for potential data loss: Taking snapshots frequently will result in smaller amounts of data loss in case of a disaster. However, creating a snapshot takes time, and the snapshots also require storage space.
- Define your snapshot retention policy. Consider how many snapshots you want to store: Regularly removing outdated snapshots helps optimize storage usage.
- Using your hypervisor, schedule periodic snapshots of your Satellite Server.
- Schedule periodic backups of the snapshots to prevent data loss in case of hypervisor failure.
Note: While snapshots provide quick recovery points, backing up your snapshots enables long-term storage and provides extra safety in case of a disaster on the side of your hypervisor.
- If you are using an external database that runs on a different machine than your Satellite Server, create snapshots and backups on the same schedule as your Satellite Server.
Verification
- Verify that your hypervisor takes the snapshots according to the schedule that you defined.
- Use the latest snapshot of your Satellite Server and restore it in an isolated environment.
- To verify that you will be able to restore Satellite services in case of a disaster, assess the functionality of the test Satellite Server. See Section 7.2.4, “Retrieving the status of services”.
- Perform these verification checks regularly.
7.2.3. Recovering from disaster by restoring a VM snapshot of Satellite Server
In case of a disaster, use a virtual machine (VM) snapshot of your Satellite Server to restore Satellite services.
Ensure that the hostname of your Satellite Server does not change during recovery. The IP address can change.
Procedure
- Identify the snapshot from which you want to recover.
- Use hypervisor tools to restore from the selected snapshot.
- If you are using an external database that runs on a different machine than your Satellite Server, ensure that you restore the database from a snapshot taken at the same time as or before the Satellite Server snapshot.
- Update DNS records so that the Satellite Server hostname resolves to the new IP address. This redirects traffic from the old server to the new server and you will not need to re-register your hosts.
Verification
- Assess the functionality of your restored Satellite Server. See Section 7.2.4, “Retrieving the status of services”.
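Before assessing services, it can help to confirm that the DNS update has taken effect. The following is a minimal sketch, assuming that `getent` is available on the machine where you run it. The function names and the `RESOLVER` override are hypothetical helpers for illustration and are not part of Satellite.

```shell
#!/bin/sh
# Hypothetical helpers; not part of Satellite.

# Print every address that the system resolver returns for a hostname.
resolved_ips() {
    getent ahosts "$1" | awk '{print $1}' | sort -u
}

# Succeed only if the hostname currently resolves to the expected address.
# RESOLVER can be overridden with another command or function (assumption
# of this sketch, useful for testing).
hostname_points_to() {
    host=$1
    expected_ip=$2
    # -F: fixed string, -x: whole line must match, -q: no output.
    "${RESOLVER:-resolved_ips}" "$host" | grep -Fqx -- "$expected_ip"
}
```

For example, `hostname_points_to satellite.example.com 192.0.2.10 && echo "DNS updated"`, where both the hostname and the address are placeholders. Cached DNS answers on hosts can lag behind the record change, so a passing check on one machine does not guarantee that every host already sees the new address.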
7.2.4. Retrieving the status of services
Satellite uses a set of back-end services. When troubleshooting, you can check the status of Satellite services.
Procedure
In the Satellite web UI, navigate to Administer > About.
- On the Smart Proxies tab, view the status of all Capsules.
- On the Compute Resources tab, view the status of attached compute resource providers.
- In the Backend System Status table, view the status of all back-end services.
CLI procedure
- Get information from the database and Satellite services:
# hammer ping
- Check the status of the services running in systemd:
# satellite-maintain service status
Run satellite-maintain service --help for more information.
- Perform a health check:
# satellite-maintain health check
Run satellite-maintain health --help for more information.
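The three CLI checks can be combined into a single pass, for example when you assess a restored test server. The following is a minimal sketch, assuming that each command exits with a non-zero status when it detects a problem and that you run it as root on Satellite Server. The function name and the environment variable overrides are hypothetical conveniences of this sketch.

```shell
#!/bin/sh
# Run the three status commands in order and report which ones failed.
# The HAMMER_PING, SERVICE_STATUS, and HEALTH_CHECK overrides are
# assumptions of this sketch, not Satellite settings.
check_satellite_status() {
    failed=0
    for cmd in \
        "${HAMMER_PING:-hammer ping}" \
        "${SERVICE_STATUS:-satellite-maintain service status}" \
        "${HEALTH_CHECK:-satellite-maintain health check}"
    do
        # $cmd is left unquoted on purpose so that it splits into words.
        if $cmd >/dev/null 2>&1; then
            echo "OK:     $cmd"
        else
            echo "FAILED: $cmd"
            failed=1
        fi
    done
    return "$failed"
}
```

The sketch hides command output to keep the summary short. If a line reports FAILED, rerun that command directly to see the details.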
7.3. Disaster recovery for active and passive Satellite Server with external storage
To prepare for disaster recovery, you can configure two Satellite Servers and store critical data externally on shared storage. The primary server is active while the secondary server remains passive. If the primary server fails, the shared storage is attached to your secondary server, which turns the secondary server into your new primary server.
7.3.1. Prerequisites
- Review Section 7.1, “Overview of recommended disaster recovery plans” to ensure that this disaster recovery plan works for you.
- Review Storage requirements and Storage guidelines in Installing Satellite Server in a connected network environment. Ensure that your shared storage meets the requirements of holding the contents of /var/lib/pulp and /var/lib/pgsql.
- You have configured your Satellite Server to use external databases. For more information, see Using external databases in Installing Satellite Server in a connected network environment.
7.3.2. Preparing for disaster recovery with active and passive Satellite Server with external storage
Create your passive Satellite Server as a clone of your active Satellite Server. Ensure that the /var/lib/pulp and /var/lib/pgsql directories on your shared storage are available to both servers.
Procedure
- Replicate the /var/lib/pulp and /var/lib/pgsql directories from the active Satellite Server to your shared storage.
- Clone your active Satellite Server. For more information, see Chapter 3, Cloning Satellite Server.
- Keep the source server powered on. Power off the new server.
The source server remains your active primary server, while the new server becomes the passive secondary server.
- Determine how you want to attach the database content on the shared storage to your passive server:
- If you mount the storage directly on both your active and passive server, the servers will always see the same, up-to-date content.
- If you mount the storage only on your active server, the passive server will access the data only if it takes over as the active server.
Verification
Perform this test in an isolated staging environment:
- Mimic a full outage on the active server. To make sure the active server is inaccessible, you can turn the machine off, halt the virtual machine (VM) if your server runs on a VM, or isolate the machine by using a firewall.
- Switch DNS records of the active server with the DNS records of the passive server.
- Verify that your passive server can access the data stored on your shared storage.
- Assess the functionality of the test Satellite Server. For more information, see Section 7.3.4, “Retrieving the status of services”.
- Perform these verification checks regularly.
Additional resources
- For more information on mounting directories, see Mounting file systems on demand in Red Hat Enterprise Linux 9 Managing file systems.
7.3.3. Recovering from disaster with active and passive server with external storage
If your active Satellite Server fails, detach it from the shared storage and make sure your passive server can access the data stored on the shared storage. This turns the passive server into your new active server.
Procedure
- Verify that the failed active server is powered off or fully detached from the shared storage. This ensures that the failed server cannot keep writing to the shared storage.
- Switch DNS records of the active server with the DNS records of the passive server. This ensures that hosts remain connected and you do not need to re-register them.
- If your shared storage was mounted on both your active and passive servers, your passive server can already access the data.
- If your shared storage was mounted only on your active server, re-mount it on your passive server.
- Assess the functionality of your new active Satellite Server. For more information, see Section 7.3.4, “Retrieving the status of services”.
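Before you assess the new active server, you might want to confirm that both shared directories are actually mounted on it. The following is a minimal sketch, assuming that the `mountpoint` utility from util-linux is installed. The function name and the `MOUNT_CHECK` override are hypothetical.

```shell
#!/bin/sh
# Report whether each shared-storage directory is an active mount point.
# MOUNT_CHECK defaults to mountpoint(1); overriding it is a convenience
# of this sketch only.
shared_storage_ready() {
    missing=0
    for dir in /var/lib/pulp /var/lib/pgsql; do
        # Left unquoted on purpose so that "mountpoint -q" splits into words.
        if ${MOUNT_CHECK:-mountpoint -q} "$dir"; then
            echo "mounted: $dir"
        else
            echo "NOT mounted: $dir" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

The function returns a non-zero status if either directory is not a mount point, so it can gate the remaining recovery steps in a script.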
7.3.4. Retrieving the status of services
Satellite uses a set of back-end services. When troubleshooting, you can check the status of Satellite services.
Procedure
In the Satellite web UI, navigate to Administer > About.
- On the Smart Proxies tab, view the status of all Capsules.
- On the Compute Resources tab, view the status of attached compute resource providers.
- In the Backend System Status table, view the status of all back-end services.
CLI procedure
- Get information from the database and Satellite services:
# hammer ping
- Check the status of the services running in systemd:
# satellite-maintain service status
Run satellite-maintain service --help for more information.
- Perform a health check:
# satellite-maintain health check
Run satellite-maintain health --help for more information.
7.4. Disaster recovery for active and passive Satellite Server with backup and restore
To prepare for disaster recovery, you can configure two Satellite Servers: an active primary server and a passive secondary server. You configure periodic backups of the primary server. If the primary server fails, you can restore a backup on the secondary server to turn it into your new primary server.
7.4.1. Prerequisites
- Review Section 7.1, “Overview of recommended disaster recovery plans” to ensure that this disaster recovery plan works for you.
- You have a Satellite Server installed.
7.4.2. Preparing for disaster recovery with active and passive Satellite Server and backup and restore
Create your passive Satellite Server by restoring a backup of your active Satellite Server. Configure periodic backups of the active server.
Procedure
- Define a schedule for periodic offline backups of your active Satellite Server. Consider your tolerance for potential data loss and your storage options: Taking backups frequently will result in smaller amounts of data loss in case of a disaster, but backups require a significant amount of storage space. For information about the size of Satellite backups, see Section 12.1, “Estimating the size of a backup”.
You can combine full backups with incremental backups. For an example of a cron job that ensures regular backups, see Section 7.4.5, “Example of a weekly full backup followed by daily incremental backups”.
- Schedule periodic offline backups of your active Satellite Server to be taken according to the schedule you defined. For information about performing backups, see Chapter 12, Backing up Satellite Server and Capsule Server.
- Ensure that the backup directories are encrypted and regularly synchronized to a secure location. By default, Satellite stores the backups in the /var/satellite-backup directory.
Important: Satellite Server backups contain sensitive information from the /root/ssl-build directory. For example, they can contain hostnames, SSH keys, request files, and SSL certificates. Encrypting or moving the backups to a secure location helps minimize the risk of damage or unauthorized access to the hosts.
- Restore the most recent backup on a system that will serve as your passive Satellite Server. For information about restoring backups, see Chapter 13, Restoring Satellite Server or Capsule Server from a backup.
- Optional: Automate backup restoration to keep the passive server periodically updated with the latest backup. A regularly restored passive server helps reduce switchover time if the active server fails.
Consider how often you want the backups to be restored: More frequent updates reduce potential data loss but increase infrastructure and automation costs.
- Power off the passive server. Keep the active server powered on.
- Define your backup retention policy. Consider how many backups you want to store: Regularly removing outdated backups helps optimize storage usage.
Verification
- Verify that Satellite takes backups according to the schedule you defined.
Perform further testing steps in an isolated staging environment:
- Mimic a full outage on the active server. To make sure the active server is inaccessible, you can turn the machine off, halt the virtual machine (VM) if your server runs on a VM, or isolate the machine by using a firewall.
- Switch DNS records of the active server with the DNS records of the passive server.
- Assess the functionality of the test Satellite Server. For more information, see Section 7.4.4, “Retrieving the status of services”.
- Perform these verification checks regularly.
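A backup retention policy can be partly automated by listing the backup subdirectories that fall outside the retention window. The following is a minimal sketch, assuming that each backup is stored in its own subdirectory and that modification times reflect backup age. The function name is hypothetical. The sketch only prints candidates and deletes nothing. It also does not track which incremental backups depend on which full backup, so review the list before you remove anything.

```shell
#!/bin/sh
# Print the backup subdirectories that fall outside the retention window.
# Usage: backups_to_prune BACKUP_DIR KEEP
# Directories are ordered by modification time, newest first. Everything
# after the newest KEEP entries is printed, one path per line.
backups_to_prune() {
    backup_dir=$1
    keep=$2
    # ls -t sorts newest first; the trailing slash limits the glob to
    # directories. tail -n +N starts printing at line N.
    ls -1dt -- "$backup_dir"/*/ 2>/dev/null | tail -n "+$((keep + 1))"
}
```

For example, `backups_to_prune /var/satellite-backup 7` lists every backup subdirectory except the seven most recent ones.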
7.4.3. Recovering from disaster with active and passive server and backup and restore
If your active Satellite Server fails, activate your passive secondary server.
Procedure
- Verify that the failed active server is powered off and that backups are no longer being synchronized to your passive server.
- Switch DNS records of the active server with the DNS records of the passive server. This ensures that hosts remain connected and you do not need to re-register them.
- Assess the functionality of your new active Satellite Server. For more information, see Section 7.4.4, “Retrieving the status of services”.
7.4.4. Retrieving the status of services
Satellite uses a set of back-end services. When troubleshooting, you can check the status of Satellite services.
Procedure
In the Satellite web UI, navigate to Administer > About.
- On the Smart Proxies tab, view the status of all Capsules.
- On the Compute Resources tab, view the status of attached compute resource providers.
- In the Backend System Status table, view the status of all back-end services.
CLI procedure
- Get information from the database and Satellite services:
# hammer ping
- Check the status of the services running in systemd:
# satellite-maintain service status
Run satellite-maintain service --help for more information.
- Perform a health check:
# satellite-maintain health check
Run satellite-maintain health --help for more information.
7.4.5. Example of a weekly full backup followed by daily incremental backups
The following script performs a full backup on a Sunday followed by incremental backups for each of the following days. A new subdirectory is created for each day that an incremental backup is performed. The script requires a daily cron job.
Note that the satellite-maintain backup command requires the /sbin and /usr/sbin directories to be in PATH, and the --assumeyes option is used to skip the confirmation prompt.
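A minimal sketch of a script that matches this description might look as follows. It assumes that `satellite-maintain backup offline` accepts the `--assumeyes` and `--incremental` options as documented for your Satellite version; check `satellite-maintain backup offline --help` to confirm. The `run_backup` function, the `DESTINATION` and `SAT_MAINTAIN` overrides, and the fallback to a full backup when no earlier backup exists are choices of this sketch, not of the product.

```shell
#!/bin/sh
# Weekly full backup on Sunday, incremental backups on the other days.
PATH=/sbin:/usr/sbin:$PATH
export PATH
# Backup directory named in Section 7.4.2; adjust to your environment.
DESTINATION=${DESTINATION:-/var/satellite-backup}
# Overridable command name (assumption of this sketch, useful for dry runs).
SAT_MAINTAIN=${SAT_MAINTAIN:-satellite-maintain}

run_backup() {
    day_of_week=$1   # 0 = Sunday, as printed by `date +%w`
    # The most recently modified backup subdirectory is the incremental base.
    last=$(ls -td -- "$DESTINATION"/*/ 2>/dev/null | head -n 1)
    if [ "$day_of_week" -eq 0 ] || [ -z "$last" ]; then
        # Sunday, or no earlier backup to build on: take a full backup.
        "$SAT_MAINTAIN" backup offline --assumeyes "$DESTINATION"
    else
        "$SAT_MAINTAIN" backup offline --assumeyes --incremental "$last" "$DESTINATION"
    fi
}

# Daily invocation, for example from a cron job:
# run_backup "$(date +%w)"
```

To use the sketch, uncomment the last line, save the file under a path of your choice, and call it from a daily cron job as the root user. Setting `SAT_MAINTAIN=echo` prints the command that would run without taking a backup.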
7.5. Disaster recovery with two active Satellite Servers
To prepare for disaster recovery, you can configure two Satellite Servers and operate each server in a different data center. If one of the servers fails, you can re-register all hosts from the failed server to the other server.
7.5.1. Prerequisites
- Review Section 7.1, “Overview of recommended disaster recovery plans” to ensure that this disaster recovery plan works for you.
- You have a Satellite Server installed.
7.5.2. Preparing for disaster recovery with two active Satellite Servers
Create a second Satellite Server by restoring a backup of your first Satellite Server. Configure both servers to operate independently in their respective data centers, but ensure that their content does not drift apart over time.
Procedure
- Back up your Satellite Server. For more information, see Chapter 12, Backing up Satellite Server and Capsule Server.
- Restore the backup on a system that will serve as your other Satellite Server. For more information, see Chapter 13, Restoring Satellite Server or Capsule Server from a backup.
Note: Each server must have a distinct hostname and IP address. This enables you to re-register hosts if one of the servers fails.
- Ensure that content on your servers remains consistent:
- If you want both servers to manage content synchronization and content view creation, follow these guidelines to prevent content drift:
- Regularly synchronize repositories on both servers. You can use the following Ansible modules to automate repository synchronization: redhat.satellite.repository_sync and redhat.satellite.sync_plan.
- Ensure that content views on both servers match.
- If you want one server to manage content synchronization and content view creation, use one of these features to prevent content drift:
- If your disaster recovery site has network access to your primary site, use Inter-Satellite Synchronization (ISS) to ensure your disaster recovery server synchronizes its content from the primary server.
- If your disaster recovery site does not have network access to your primary site, synchronize content by using export and import.
- If you want one server to manage only content view creation but not content synchronization, you can configure the other server or multiple other servers to import content views from the first server but synchronize content from repositories.
- Register hosts to your servers so that each server manages hosts in its respective data center. For example, register all hosts in My_Data_Center_1 to one Satellite Server and all hosts in My_Data_Center_2 to the other Satellite Server.
- Automate running the satellite-maintain health check command on both servers. The health check verifies whether the servers remain fully operational.
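One way to check that content views on both servers match is to compare listings exported from each server. The following is a minimal sketch, assuming that you have already written one listing per server to a text file with one "name version" entry per line. How you export the listings and the file format are assumptions of this sketch, and the function name is hypothetical.

```shell
#!/bin/sh
# Compare two exported content view listings and print the entries that
# appear on only one server.
# Usage: content_view_drift LIST_FROM_SERVER_1 LIST_FROM_SERVER_2
# Returns 0 if the listings match, 1 if they differ.
content_view_drift() {
    sorted_a=$(mktemp) || return 2
    sorted_b=$(mktemp) || return 2
    sort -u "$1" >"$sorted_a"
    sort -u "$2" >"$sorted_b"
    # comm -3 drops the lines that are common to both files.
    drift=$(comm -3 "$sorted_a" "$sorted_b")
    rm -f "$sorted_a" "$sorted_b"
    if [ -n "$drift" ]; then
        printf '%s\n' "$drift"
        return 1
    fi
    return 0
}
```

Entries that exist only in the first listing are printed without indentation, and entries that exist only in the second listing are printed with a leading tab. A matching name and version does not prove that the underlying repositories contain identical packages, so treat this as a first-level check only.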
Verification
Perform this test in an isolated staging environment:
- Mimic a full outage on one of your servers. To verify that the server is inaccessible, you can turn the machine off, halt the virtual machine (VM) if your server runs on a VM, or isolate the machine by using a firewall.
- Verify that your satellite-maintain health check automation reported an error.
- Re-register all hosts from the inaccessible server to the accessible server.
- Verify that hosts have been properly re-registered to the accessible server.
- Perform these verification checks regularly.
Additional resources
- Ansible playbooks can help you automate failover, re-registration, and synchronization. For more information, see Managing Satellite with Ansible in Administering Red Hat Satellite.
- For more information on synchronizing repositories, see Synchronizing repositories in Managing content.
- For more information on synchronizing content between Satellite Servers, including ISS, export, and import, see Synchronizing content between Satellite Servers in Managing content.
7.5.3. Recovering from disaster with two active Satellite Servers
If the health checks implemented in Section 7.5.2, “Preparing for disaster recovery with two active Satellite Servers” report an issue on one of your Satellite Servers, it might mean that the server has failed. If the server is down, you must re-register hosts to the other server.
Procedure
- Verify the status of the server:
# satellite-maintain health check
- If satellite-maintain health check reported a problem, ensure that the server is powered off.
- Re-register all hosts from the data center managed by the failed server to the other, functional server.
Verification
- Verify that hosts have been properly re-registered.
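Host-side automation for this scenario needs to decide which Satellite Server a host should use. The following is a minimal sketch of that decision, assuming that a hostname that resolves is a sufficient signal, as in the health check example in Section 7.1, and that `getent` is available on the host. The function name and the `RESOLVE_CHECK` override are hypothetical. The re-registration step itself is left out because it depends on the registration method that you use.

```shell
#!/bin/sh
# Print the first Satellite hostname from the argument list that resolves.
# Returns non-zero if none of them resolve.
# RESOLVE_CHECK defaults to `getent hosts`; overriding it is a convenience
# of this sketch only.
first_resolvable_server() {
    for server in "$@"; do
        # Left unquoted on purpose so that "getent hosts" splits into words.
        if ${RESOLVE_CHECK:-getent hosts} "$server" >/dev/null 2>&1; then
            echo "$server"
            return 0
        fi
    done
    return 1
}
```

For example, `first_resolvable_server satellite1.example.com satellite2.example.com` prints the preferred server if its hostname resolves and the other server otherwise. Both hostnames are placeholders. If the printed server differs from the one that the host is registered to, your automation can trigger re-registration.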
Additional resources
- Ansible playbooks can help you automate failover, re-registration, and synchronization. For more information, see Managing Satellite with Ansible in Administering Red Hat Satellite.