2.6. Recovery
Overview
KahaDB supports a variety of mechanisms that enable it to recover and restart after a disorderly shutdown (system failure). This includes features to detect missing data files and to restore corrupted metadata. These features on their own, however, are not sufficient to guard completely against loss of data in the event of a system failure. If your broker is expected to mediate critical data, it is recommended that you deploy a disaster recovery system, such as a RAID disk array, to protect your data.
Clean shutdown
When the broker shuts down normally, the KahaDB message store flushes its cached data (representing the final state of the broker) to the file system. Specifically, the following information is written to the file system:
- All of the outstanding journal entries.
- All of the cached metadata.
Because this data represents the final state of the broker, the metadata store and the journal's data logs are consistent with each other after shutdown is complete. That is, the stored metadata takes into account all the commands recorded in the journal.
Recovery from disorderly shutdown
Normally, the journal tends to run ahead of the metadata store, because the journal is constantly being updated, whereas the metadata store is written only periodically (for example, whenever there is a checkpoint). Consequently, whenever there is a disorderly shutdown (which prevents the final state of the broker from being saved), it is likely that the stored metadata will be inconsistent with the journal, with the journal containing additional events not reflected in the metadata store.
When the broker restarts after a disorderly shutdown, the KahaDB message store recovers by reading the stored metadata into the cache and then reading the journal events that the stored metadata does not yet reflect. KahaDB can easily locate these additional events, because the metadata store always holds a reference to the last consistent location in the journal. KahaDB then replays the additional journal events in order to recreate the original metadata.
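How many journal events must be replayed depends on how far the journal has run ahead of the last checkpoint. KahaDB's checkpointInterval attribute sets how often, in milliseconds, the cached metadata is checkpointed to the metadata store. The fragment below is a sketch only. The directory name and the value 2000 are illustrative assumptions, not recommended settings. A shorter interval should leave fewer events to replay after a disorderly shutdown, at the cost of more frequent metadata writes during normal operation.

```xml
<persistenceAdapter>
  <!-- checkpointInterval (ms): how often cached metadata is written to the
       metadata store; 2000 is an illustrative value, not a recommendation -->
  <kahaDB directory="activemq-data" checkpointInterval="2000" />
</persistenceAdapter>
```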
Note
The KahaDB message store also uses a redo log, db.redo, to reduce the risk posed by a system failure that occurs while the metadata store is being updated. Before updating the metadata store, KahaDB always saves the redo log, which summarizes the changes that are about to be made to the store. Because the redo log is a small file, it can be written relatively rapidly and is thus less likely to be affected by a system failure. During recovery, KahaDB checks whether the changes recorded in the redo log need to be applied to the metadata.
Forcing recovery by deleting the metadata store
If the metadata store somehow becomes irretrievably corrupted, you can force recovery as follows (assuming the journal's data logs are clean):
- While the broker is shut down, delete the metadata store, db.data.
- Start the broker.
- The broker now recovers by re-reading the entire journal and replaying all of the events in the journal in order to recreate the missing metadata.
While this is an effective means of recovering, you should bear in mind that it could take a considerable length of time if the journal is large.
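To find the metadata store, it may help to know that, by default, KahaDB keeps its metadata files and its journal data logs together in the directory named by the directory attribute of the kahaDB element. The fragment below is a sketch that assumes a directory named activemq-data. The comment lists the files you would typically expect to find there. In the procedure above, only db.data is deleted, and the journal data logs must be kept because they are replayed to rebuild the metadata.

```xml
<persistenceAdapter>
  <!-- With this setting, the store directory typically contains:
         db.data                  metadata store (deleted to force a full journal replay)
         db.redo                  redo log for metadata updates
         db-1.log, db-2.log, ...  journal data logs (keep these) -->
  <kahaDB directory="activemq-data" />
</persistenceAdapter>
```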
Missing journal files
KahaDB has the ability to detect when journal files are missing. If one or more journal files are detected to be missing, the default behavior is for the broker to raise an exception and shut down. This gives an administrator the opportunity to investigate what happened to the missing journal files and to restore them manually, if necessary.
If you want the broker to ignore any missing journal files and continue processing regardless, you can set the ignoreMissingJournalfiles property to true.
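A minimal configuration fragment for this setting might look as follows, assuming the same store directory as in Example 2.3. Bear in mind that with this setting enabled, the broker starts without the data held in the missing files, so any messages recorded only in those files cannot be recovered.

```xml
<persistenceAdapter>
  <!-- Continue startup even if journal data files are missing,
       instead of raising an exception and shutting down -->
  <kahaDB directory="activemq-data" ignoreMissingJournalfiles="true" />
</persistenceAdapter>
```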
Checking for corrupted journal files
KahaDB has a feature that checks for corrupted journal files, but this feature must be explicitly enabled. Example 2.3, “Configuration for Journal Validation” shows how to configure a KahaDB message store to detect corrupted journal files.
Example 2.3. Configuration for Journal Validation
<persistenceAdapter>
  <kahaDB directory="activemq-data"
          journalMaxFileLength="32mb"
          checksumJournalFiles="true"
          checkForCorruptJournalFiles="true" />
</persistenceAdapter>