
Chapter 7. Preparing for disaster recovery and recovering from data loss


Red Hat recommends preparing a disaster recovery plan to ensure the continuity of Satellite services in case of a disruptive event. These guidelines help ensure that you will be able to restore your Satellite deployment to an operational state after an incident.

7.2. Disaster recovery by virtualizing your Satellite Server

If you virtualize your Satellite Server and ensure that you take regular snapshots of the virtual machine (VM), you can respond to various disaster scenarios by restoring your Satellite deployment from one of your snapshots.

Note

The details of how to implement this scenario depend on your virtualization platform. Because hypervisors and their capabilities vary widely, Red Hat does not provide detailed instructions for any specific virtualization platform.

7.2.1. Prerequisites

Implement a reliable process for regularly taking VM snapshots of your virtualized Satellite Server and for backing up your snapshots for long-term storage.

Procedure

  1. Define a schedule for taking periodic snapshots of your virtualized Satellite Server. Consider your tolerance for potential data loss: Taking snapshots frequently will result in smaller amounts of data loss in case of a disaster. However, creating a snapshot takes time, and the snapshots also require storage space.
  2. Define your snapshot retention policy. Consider how many snapshots you want to store: Regularly removing outdated snapshots helps optimize storage usage.
  3. Using your hypervisor, schedule periodic snapshots of your Satellite Server.
  4. Schedule periodic backups of the snapshots to prevent data loss in case of hypervisor failure.

    Note

    While snapshots provide quick recovery points, backing up your snapshots enables long-term storage and provides extra protection if a disaster affects your hypervisor.

  5. If you are using an external database that runs on a different machine than your Satellite Server, create snapshots and backups of the database machine on the same schedule as for your Satellite Server.
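
The retention policy from step 2 can be sketched as a small shell helper. This is only a minimal sketch, assuming that your exported snapshot backups are stored as one entry per snapshot in a single directory. The function name list_expired_snapshots, the directory path, and the retention count are hypothetical and are not part of Satellite or of any hypervisor tooling.

```shell
#!/bin/bash
# Hypothetical helper: print the entries in a snapshot backup directory
# that fall outside the retention window (all but the newest KEEP entries).
# Assumes that entry names contain no newline characters.
list_expired_snapshots() {
  local dir="$1" keep="$2"
  # ls -t sorts newest first; tail skips the first KEEP entries.
  ls -1t -- "$dir" | tail -n +"$((keep + 1))"
}

# Example (commented out; the path and count are assumptions):
# list_expired_snapshots /srv/snapshot-backups 7 | while IFS= read -r name; do
#   rm -rf -- "/srv/snapshot-backups/$name"
# done
```

The helper only lists candidates, so you can review the output before removing anything.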

Verification

  1. Verify that your hypervisor takes the snapshots according to the schedule that you defined.
  2. Use the latest snapshot of your Satellite Server and restore it in an isolated environment.
  3. To verify that you will be able to restore Satellite services in case of a disaster, assess the functionality of the test Satellite Server. See Section 7.2.4, “Retrieving the status of services”.
  4. Perform these verification checks regularly.

In case of a disaster, use a virtual machine (VM) snapshot of your Satellite Server to restore Satellite services.

Important

Ensure that the hostname of your Satellite Server does not change during recovery. The IP address can change.

Procedure

  1. Identify the snapshot from which you want to recover.
  2. Use hypervisor tools to restore from the selected snapshot.
  3. If you are using an external database that runs on a different machine than your Satellite Server, ensure that you restore the database from a snapshot taken at the same time as or before the Satellite Server snapshot.
  4. Update DNS records so that the Satellite Server hostname resolves to the new IP address. This redirects traffic from the old server to the new server, so you do not need to re-register your hosts.
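
You can check the DNS change from step 4 on any host that uses the updated DNS records. The following is a minimal sketch, assuming that the getent utility is available on that host. The function name resolves_to, the hostname satellite.example.com, and the address 192.0.2.10 are placeholders.

```shell
#!/bin/bash
# Hypothetical helper: succeed if EXPECTED_IP appears as the first field
# of any line read from standard input (the format that getent prints).
resolves_to() {
  local expected_ip="$1"
  awk -v ip="$expected_ip" '$1 == ip { found = 1 } END { exit !found }'
}

# Example (commented out; hostname and address are placeholders):
# getent ahosts satellite.example.com | resolves_to 192.0.2.10 \
#   && echo "DNS points to the restored server"
```

Because the helper reads resolver output from standard input, you can use it with any lookup tool that prints the address in the first column.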

Verification

7.2.4. Retrieving the status of services

Satellite uses a set of back-end services. When troubleshooting, you can check the status of Satellite services.

Procedure

  • In the Satellite web UI, navigate to Administer > About.

    • On the Smart Proxies tab, view the status of all Capsules.
    • On the Compute Resources tab, view the status of attached compute resource providers.
    • In the Backend System Status table, view the status of all back-end services.

CLI procedure

  • Get information from the database and Satellite services:

    # hammer ping
  • Check the status of the services running in systemd:

    # satellite-maintain service status

    Run satellite-maintain service --help for more information.

  • Perform a health check:

    # satellite-maintain health check

    Run satellite-maintain health --help for more information.

To prepare for disaster recovery, you can configure two Satellite Servers and store critical data externally on shared storage. The primary server is active while the secondary server remains passive. If the primary server fails, the shared storage is attached to your secondary server, which turns the secondary server into your new primary server.

7.3.1. Prerequisites

Create your passive Satellite Server as a clone of your active Satellite Server. Ensure that the /var/lib/pulp and /var/lib/pgsql directories on your shared storage are available to both servers.

Procedure

  1. Replicate the /var/lib/pulp and /var/lib/pgsql directories from the active Satellite Server to your shared storage.
  2. Clone your active Satellite Server. For more information, see Chapter 3, Cloning Satellite Server.
  3. Keep the source server powered on. Power off the new server.

    The source server remains your active primary server, while the new server becomes the passive secondary server.

  4. Determine how you want to attach the database content on the shared storage to your passive server:

    • If you mount the storage directly on both your active and passive server, the servers will always see the same, up-to-date content.
    • If you mount the storage only on your active server, the passive server will access the data only if it takes over as the active server.

Verification

Perform this test in an isolated staging environment:

  1. Mimic a full outage on the active server. To make sure the active server is inaccessible, you can turn the machine off, halt the virtual machine (VM) if your server runs on a VM, or isolate the machine by using a firewall.
  2. Switch DNS records of the active server with the DNS records of the passive server.
  3. Verify that your passive server can access the data stored on your shared storage.
  4. Assess the functionality of the test Satellite Server. For more information, see Section 7.3.4, “Retrieving the status of services”.
  5. Perform these verification checks regularly.

Additional resources

If your active Satellite Server fails, detach it from the shared storage and make sure your passive server can access the data stored on the shared storage. This turns the passive server into your new active server.

Procedure

  1. Verify that the failed active server is powered off or fully detached from the shared storage. This ensures that the failed server cannot keep writing to the shared storage.
  2. Switch DNS records of the active server with the DNS records of the passive server. This ensures that hosts remain connected and you do not need to re-register them.
  3. If your shared storage was mounted on both your active and passive servers, your passive server can already access the data.
  4. If your shared storage was mounted only on your active server, re-mount it on your passive server.
  5. Assess the functionality of your new active Satellite Server. For more information, see Section 7.3.4, “Retrieving the status of services”.
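
Before you assess the new active server, you can confirm that the shared storage is actually mounted. The following is a minimal sketch, assuming that the shared storage is mounted directly at /var/lib/pulp and /var/lib/pgsql and that the mountpoint utility from util-linux is installed. The function name report_unmounted is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print every given directory that is not a mount point
# and return non-zero if at least one directory is not mounted.
report_unmounted() {
  local missing=0 dir
  for dir in "$@"; do
    if ! mountpoint -q "$dir"; then
      echo "not mounted: $dir"
      missing=1
    fi
  done
  return "$missing"
}

# Example (commented out): the two directories named in this procedure.
# report_unmounted /var/lib/pulp /var/lib/pgsql
```

If your storage is mounted at a parent directory instead, pass that directory to the helper.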

7.3.4. Retrieving the status of services

Satellite uses a set of back-end services. When troubleshooting, you can check the status of Satellite services.

Procedure

  • In the Satellite web UI, navigate to Administer > About.

    • On the Smart Proxies tab, view the status of all Capsules.
    • On the Compute Resources tab, view the status of attached compute resource providers.
    • In the Backend System Status table, view the status of all back-end services.

CLI procedure

  • Get information from the database and Satellite services:

    # hammer ping
  • Check the status of the services running in systemd:

    # satellite-maintain service status

    Run satellite-maintain service --help for more information.

  • Perform a health check:

    # satellite-maintain health check

    Run satellite-maintain health --help for more information.

To prepare for disaster recovery, you can configure two Satellite Servers: an active primary server and a passive secondary server. You configure periodic backups of the primary server. If the primary server fails, you can restore a backup on the secondary server to turn it into your new primary server.

7.4.1. Prerequisites

Create your passive Satellite Server by restoring a backup of your active Satellite Server. Configure periodic backups of the active server.

Procedure

  1. Define a schedule for periodic offline backups of your active Satellite Server. Consider your tolerance for potential data loss and your storage options: Taking backups frequently will result in smaller amounts of data loss in case of a disaster, but backups require a significant amount of storage space. For information about the size of Satellite backups, see Section 12.1, “Estimating the size of a backup”.

    You can combine full backups with incremental backups. For an example of a cron job that ensures regular backups, see Section 7.4.5, “Example of a weekly full backup followed by daily incremental backups”.

  2. Schedule periodic offline backups of your active Satellite Server to be taken according to the schedule you defined. For information about performing backups, see Chapter 12, Backing up Satellite Server and Capsule Server.
  3. Ensure that the backup directories are encrypted and regularly synchronized to a secure location. By default, Satellite stores the backups in the /var/satellite-backup directory.

    Important

    Satellite Server backups contain sensitive information from the /root/ssl-build directory. For example, they can contain hostnames, SSH keys, request files, and SSL certificates. Encrypting or moving the backups to a secure location helps minimize the risk of damage or unauthorized access to the hosts.

  4. Restore the most recent backup on a system that will serve as your passive Satellite Server. For information about restoring backups, see Chapter 13, Restoring Satellite Server or Capsule Server from a backup.
  5. Optional: Automate backup restoration to keep the passive server periodically updated with the latest backup. A regularly restored passive server helps reduce switchover time if the active server fails.

    Consider how often you want the backups to be restored: More frequent updates reduce potential data loss but increase infrastructure and automation costs.

  6. Power off the passive server. Keep the active server powered on.
  7. Define your backup retention policy. Consider how many backups you want to store: Regularly removing outdated backups helps optimize storage usage.

Verification

  1. Verify that Satellite takes backups according to the schedule you defined.
  2. Perform further testing steps in an isolated staging environment:

    1. Mimic a full outage on the active server. To make sure the active server is inaccessible, you can turn the machine off, halt the virtual machine (VM) if your server runs on a VM, or isolate the machine by using a firewall.
    2. Switch DNS records of the active server with the DNS records of the passive server.
    3. Assess the functionality of the test Satellite Server. For more information, see Section 7.4.4, “Retrieving the status of services”.
    4. Perform these verification checks regularly.
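
The first verification step can be partly automated by checking the age of the newest backup. The following is a minimal sketch, assuming that each backup is written to its own subdirectory of the backup directory and that your find implementation supports the -mindepth, -maxdepth, and -mmin options. The function name has_recent_backup and the 25-hour threshold are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed if BACKUP_DIR contains at least one
# subdirectory modified within the last MAX_AGE_MINUTES minutes.
has_recent_backup() {
  local backup_dir="$1" max_age_minutes="$2"
  # -mmin -N matches entries modified less than N minutes ago.
  [ -n "$(find "$backup_dir" -mindepth 1 -maxdepth 1 -type d \
            -mmin "-$max_age_minutes" | head -n 1)" ]
}

# Example (commented out): expect a backup from the last 25 hours (1500 minutes)
# in the default backup directory.
# has_recent_backup /var/satellite-backup 1500 || echo "No recent backup found"
```

A check like this confirms only that a backup directory was written recently. It does not replace restoring the backup in a staging environment.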

If your active Satellite Server fails, activate your passive secondary server.

Procedure

  1. Verify that the failed active server is powered off and that backups are no longer being synchronized to your passive server.
  2. Switch DNS records of the active server with the DNS records of the passive server. This ensures that hosts remain connected and you do not need to re-register them.
  3. Assess the functionality of your new active Satellite Server. For more information, see Section 7.4.4, “Retrieving the status of services”.

7.4.4. Retrieving the status of services

Satellite uses a set of back-end services. When troubleshooting, you can check the status of Satellite services.

Procedure

  • In the Satellite web UI, navigate to Administer > About.

    • On the Smart Proxies tab, view the status of all Capsules.
    • On the Compute Resources tab, view the status of attached compute resource providers.
    • In the Backend System Status table, view the status of all back-end services.

CLI procedure

  • Get information from the database and Satellite services:

    # hammer ping
  • Check the status of the services running in systemd:

    # satellite-maintain service status

    Run satellite-maintain service --help for more information.

  • Perform a health check:

    # satellite-maintain health check

    Run satellite-maintain health --help for more information.

7.4.5. Example of a weekly full backup followed by daily incremental backups

The following script performs a full backup on a Sunday followed by incremental backups for each of the following days. A new subdirectory is created for each day that an incremental backup is performed. The script requires a daily cron job.

#!/bin/bash -e
# cron provides a minimal PATH; satellite-maintain needs /sbin and /usr/sbin.
PATH=/sbin:/bin:/usr/sbin:/usr/bin
DESTINATION=/var/backup_directory
if [[ $(date +%w) == 0 ]]; then
  # Sunday: take a full backup.
  satellite-maintain backup offline --assumeyes "$DESTINATION"
else
  # Other days: take an incremental backup based on the newest backup subdirectory.
  LAST=$(ls -td -- "$DESTINATION"/*/ | head -n 1)
  satellite-maintain backup offline --assumeyes --incremental "$LAST" "$DESTINATION"
fi
exit 0

Note that the satellite-maintain backup command requires the /sbin and /usr/sbin directories to be in PATH. The --assumeyes option skips the confirmation prompt.

7.5. Disaster recovery with two active Satellite Servers

To prepare for disaster recovery, you can configure two Satellite Servers and operate each server in a different data center. If one of the servers fails, you can re-register all hosts from the failed server to the other server.

7.5.1. Prerequisites

Create a second Satellite Server by restoring a backup of your first Satellite Server. Configure both servers to operate independently in their respective data centers, but ensure that their content does not drift apart over time.

Procedure

  1. Back up your Satellite Server. For more information, see Chapter 12, Backing up Satellite Server and Capsule Server.
  2. Restore the backup on a system that will serve as your other Satellite Server. For more information, see Chapter 13, Restoring Satellite Server or Capsule Server from a backup.

    Note

    Each server must have a distinct hostname and IP address. This enables you to re-register hosts if one of the servers fails.

  3. Ensure that content on your servers remains consistent:

    • If you want both servers to manage content synchronization and content view creation, follow these guidelines to prevent content drift:

      • Regularly synchronize repositories on both servers. You can use the following Ansible modules to automate repository synchronization: redhat.satellite.repository_sync and redhat.satellite.sync_plan.
      • Ensure that content views on both servers match.
    • If you want one server to manage content synchronization and content view creation, use one of these features to prevent content drift:

      • If your disaster recovery site has network access to your primary site, use Inter-Satellite Synchronization (ISS) to ensure your disaster recovery server synchronizes its content from the primary server.
      • If your disaster recovery site does not have network access to your primary site, synchronize content by using export and import.
    • If you want one server to manage only content view creation but not content synchronization, you can configure the other server or multiple other servers to import content views from the first server but synchronize content from repositories.
  4. Register hosts to your servers so that each server manages hosts in its respective data center. For example, register all hosts in My_Data_Center_1 to one Satellite Server and all hosts in My_Data_Center_2 to the other Satellite Server.
  5. Automate running the satellite-maintain health check command on both servers. The health check verifies whether the servers remain fully operational.
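
The automation in step 5 can be sketched as a wrapper that you run from cron on each server. This is only a minimal sketch, assuming that satellite-maintain health check runs non-interactively and exits with a non-zero status when a check fails. The function name run_and_log and the log file path are hypothetical.

```shell
#!/bin/bash
# Hypothetical wrapper: run a check command, append a timestamped result
# line to LOG_FILE, and pass the exit status of the command through.
run_and_log() {
  local log_file="$1" status=0
  shift
  "$@" >/dev/null 2>&1 || status=$?
  if [ "$status" -eq 0 ]; then
    echo "$(date '+%F %T') OK: $*" >> "$log_file"
  else
    echo "$(date '+%F %T') FAILED (exit $status): $*" >> "$log_file"
  fi
  return "$status"
}

# Example (commented out; the log path is an assumption). Run as root:
# run_and_log /var/log/satellite-health.log satellite-maintain health check
```

Because the wrapper returns the exit status of the check, your monitoring or alerting tool can react to a failure without parsing the log.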

Verification

Perform this test in an isolated staging environment:

  1. Mimic a full outage on one of your servers. To verify that the server is inaccessible, you can turn the machine off, halt the virtual machine (VM) if your server runs on a VM, or isolate the machine by using a firewall.
  2. Verify that your satellite-maintain health check automation reported an error.
  3. Re-register all hosts from the inaccessible server to the accessible server.
  4. Verify that hosts have been properly re-registered to the accessible server.
  5. Perform these verification checks regularly.

Additional resources

7.5.3. Recovering from disaster with two active Satellite Servers

If the health checks implemented in Section 7.5.2, “Preparing for disaster recovery with two active Satellite Servers” report an issue on one of your Satellite Servers, it might mean that the server has failed. If the server is down, you must re-register hosts to the other server.

Procedure

  1. Verify the status of the server:

    # satellite-maintain health check
  2. If satellite-maintain health check reported a problem, ensure that the server is powered off.
  3. Re-register all hosts from the data center managed by the failed server to the other, functional server.

Verification

  • Verify that hosts have been properly re-registered.

Additional resources

  • Ansible playbooks can help you automate failover, re-registration, and synchronization. For more information, see Managing Satellite with Ansible in Administering Red Hat Satellite.