3.8. Backing up and restoring PostgreSQL data with continuous archiving
You can use continuous archiving to create robust PostgreSQL backups by combining WAL files with base backups for point-in-time recovery and high availability.
PostgreSQL records every change to the database’s data files to a write-ahead log (WAL) file that is available in the pg_wal/ subdirectory of the cluster’s data directory. This log is intended primarily for a crash recovery. After a crash, you can use log entries made since the last checkpoint to restore the database to a consistent state.
The continuous archiving method, also known as an online backup, combines the WAL files with a copy of the database cluster in the form of a base backup performed on a running server or a file system level backup.
If a database recovery is needed, you can restore the database from the copy of the database cluster and then replay the log from the backed up WAL files to bring the system to the current state.
With the continuous archiving method, you must keep a continuous sequence of all archived WAL files that extends at minimum back to the start time of your last base backup. Therefore the ideal frequency of base backups depends on:
- The storage volume available for archived WAL files.
- The maximum possible duration of data recovery in situations when recovery is necessary. In cases with a long period since the last backup, the system replays more WAL segments, and the recovery therefore takes more time.
You cannot use pg_dump and pg_dumpall SQL dumps as a part of a continuous archiving backup solution. SQL dumps produce logical backups and do not contain enough information to be used by a WAL replay.
3.8.1. Advantages and disadvantages of continuous archiving リンクのコピーリンクがクリップボードにコピーされました!
Continuous archiving is a feature that provides a robust strategy for data backup, high availability, and point-in-time recovery (PITR) by continuously saving the database’s transaction log files.
Advantages:
- With the continuous backup method, it is possible to use a base backup that is not entirely consistent because any internal inconsistency in the backup is corrected by the log replay. Therefore you can perform a base backup on a running PostgreSQL server.
-
A file system snapshot is not needed;
taror a similar archiving utility is sufficient. - Continuous backup can be achieved by continuing to archive the WAL files because the sequence of WAL files for the log replay can be indefinitely long. This is particularly valuable for large databases.
- Continuous backup supports point-in-time recovery. It is not necessary to replay the WAL entries to the end. The replay can be stopped at any point and the database can be restored to its state at any time since the base backup was taken.
- If the series of WAL files are continuously available to another machine that has been loaded with the same base backup file, it is possible to restore the other machine with a nearly-current copy of the database at any point.
Disadvantages:
- Continuous backup method supports only restoration of an entire database cluster, not a subset.
- Continuous backup requires extensive archival storage.
3.8.2. Setting up WAL archiving リンクのコピーリンクがクリップボードにコピーされました!
You can enable write ahead log (WAL) archiving on your PostgreSQL server to capture and save WAL segment files for backup and point-in-time recovery purposes.
A running PostgreSQL server produces a sequence of WAL records. The server physically divides this sequence into WAL segment files, which are given numeric names that reflect their position in the WAL sequence. Without WAL archiving, the segment files are reused and renamed to higher segment numbers.
When archiving WAL data, the contents of each segment file are captured and saved at a new location before the segment file is reused. You have multiple options where to save the content, such as an NFS-mounted directory on another machine, a tape drive, or a CD.
Note that WAL records do not include changes to configuration files.
Procedure
In the
/var/lib/pgsql/data/postgresql.conffile:-
Set the
wal_levelconfiguration parameter toreplicaor higher. -
Set the
archive_modeparameter toon. Specify the shell command in the
archive_commandconfiguration parameter. You can use thecpcommand, another command, or a shell script.For example:
archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'where the
%pparameter is replaced by the relative path to the file to archive and the%fparameter is replaced by the file name.This command copies archivable WAL segments to the
/mnt/server/archivedir/directory. After replacing the%pand%fparameters, the executed command looks as follows:test ! -f /mnt/server/archivedir/00000001000000A900000065 && cp pg_wal/00000001000000A900000065 /mnt/server/archivedir/00000001000000A900000065A similar command is generated for each new file that is archived.
注記The archive command is executed only on completed WAL segments. A server that generates little WAL traffic can have a substantial delay between the completion of a transaction and its safe recording in archive storage. To limit how old unarchived data can be, you can:
-
Set the
archive_timeoutparameter to force the server to switch to a new WAL segment file with a given frequency. -
Use the
pg_switch_walparameter to force a segment switch to ensure that a transaction is archived immediately after it finishes.
-
Set the
-
Set the
Restart the
postgresqlservice to enable the changes:# systemctl restart postgresql.service- Test your archive command and ensure it does not overwrite an existing file and that it returns a nonzero exit status if it fails.
- To protect your data, ensure that the segment files are archived into a directory that does not have group or world read access.
3.8.3. Making a base backup リンクのコピーリンクがクリップボードにコピーされました!
You can create a PostgreSQL base backup by using the pg_basebackup utility to capture a consistent snapshot of your database for backup and recovery purposes.
The base backup process creates a backup history file that is stored into the WAL archive area and is named after the first WAL segment file that you need for the base backup.
The backup history file is a small text file containing the starting and ending times, and WAL segments of the backup. If you used the label string to identify the associated dump file, you can use the backup history file to determine which dump file to restore.
Consider keeping several backup sets to be certain that you can recover your data.
Prerequisites
-
You are logged in as the
postgressuperuser, a user with database administrator privileges, or another user with at leastREPLICATIONpermissions. - You must keep all the WAL segment files generated during and after the base backup.
Procedure
Use the
pg_basebackuputility to perform the base backup.To create a base backup as individual files (plain format):
$ pg_basebackup -D <backup_directory> -FpReplace backup_directory with your chosen backup location.
If you use tablespaces and perform the base backup on the same host as the server, you must also use the
--tablespace-mappingoption, otherwise the backup will fail upon an attempt to write the backup to the same location.To create a base backup as a
tararchive (tarand compressed format):$ pg_basebackup -D <backup_directory> -Ft -zReplace backup_directory with your chosen backup location.
To restore such data, you must manually extract the files in the correct locations.
To specify which database server pg_basebackup will contact, use the following command-line options:
The
-hoption to define the host.The default host is either the local host or a host specified by the
PGHOSTenvironment variable.The
-poption to define the port.The default port is indicated by the
PGPORTenvironment variable or the compiled-in default.
- After the base backup process is complete, safely archive the copy of the database cluster and the WAL segment files used during the backup, which are specified in the backup history file.
- Delete WAL segments numerically lower than the WAL segment files used in the base backup because these are older than the base backup and no longer needed for a restore.
3.8.4. Restoring the database by using a continuous archive backup リンクのコピーリンクがクリップボードにコピーされました!
You can restore a PostgreSQL database by restoring the base backup and applying archived WAL files for point-in-time recovery.
Procedure
Stop the server:
# systemctl stop postgresql.serviceCopy the necessary data to a temporary location.
Preferably, copy the whole cluster data directory and any tablespaces. Note that this requires enough free space on your system to hold two copies of your existing database.
If you do not have enough space, save the contents of the cluster’s
pg_waldirectory, which can contain logs that were not archived before the system went down.- Remove all existing files and subdirectories under the cluster data directory and under the root directories of any tablespaces you are using.
Restore the database files from your base backup.
Ensure that:
-
The files are restored with the correct ownership (the database system user, not
root). - The files are restored with the correct permissions.
-
The symbolic links in the
pg_tblspc/subdirectory are restored correctly.
-
The files are restored with the correct ownership (the database system user, not
Remove any files present in the
pg_wal/subdirectory.These files resulted from the base backup and are therefore obsolete. If you did not archive
pg_wal/, re-create it with proper permissions.-
Copy any unarchived WAL segment files that you saved in step 2 into
pg_wal/. Create the
recovery.confrecovery command file in the cluster data directory and specify the shell command in therestore_commandconfiguration parameter. You can use thecpcommand, another command, or a shell script. For example:restore_command = 'cp /mnt/server/archivedir/%f "%p"'Start the server:
# systemctl start postgresql.serviceThe server enters the recovery mode and proceeds to read through the archived WAL files that it needs.
If the recovery is terminated due to an external error, the server can be restarted, and it will continue the recovery. When the recovery process is completed, the server renames
recovery.conftorecovery.done. This prevents the server from accidentally re-entering the recovery mode after it starts normal database operations.Check the contents of the database to verify that the database has recovered into the required state.
If the database has not recovered into the required state, return to step 1. If the database has recovered into the required state, allow the users to connect by restoring the client authentication configuration in the
pg_hba.conffile.