Index Checksums
Executive Summary
- We have had occasional cyrus.index and cyrus.cache corruption at FastMail over the years. While most cases have been traced back to bugs in delayed expunge or kernel bugs (yes, really), occasional corruption from other causes is always possible. Since adding SHA1 integrity checks on message files, we have averaged about one file per terabyte of data per year with a single-block corruption somewhere in its contents. That rate is low, but it is worth detecting!
- Checksums on individual index records, index headers, cache records and mailbox headers allow corruption to be detected immediately and the admin to be warned via a syslog entry.
- These checksums will also come in handy for Low Bandwidth Replication later.
Implementation
The algorithm chosen is CRC32: it is small enough not to be a massive overhead, widely available, and well understood. For this sort of application it provides a good set of tradeoffs.
The CRC32s are stored in the index header and individual index records. An additional XOR of all record CRC32s is stored in the index header for both additional integrity checks and as a hook for a single "are these files identical" test during replication.
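As a rough illustration of the XOR digest, the sketch below computes a CRC32 per record and XORs the results together. It is not the Cyrus code: `crc32_bytes` is a plain bitwise CRC-32 standing in for whatever CRC routine the server uses, and `record_crc_xor` with its fixed-size record buffer is a hypothetical helper invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). Assumption: this
 * stands in for the server's own CRC helper. */
static uint32_t crc32_bytes(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical helper: XOR together the CRC32 of each fixed-size
 * record in a contiguous buffer. */
static uint32_t record_crc_xor(const unsigned char *records,
                               size_t record_size, size_t count)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < count; i++)
        acc ^= crc32_bytes(records + i * record_size, record_size);
    return acc;
}
```

Because XOR is commutative, the digest does not depend on record order, and a single record's contribution can be removed and re-added without rescanning the file. The flip side is that two identical records cancel each other out, so the digest works as a cheap "probably identical" test and not as a strong hash.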
Additional fields in the cyrus.index header:
- MAILBOX_HEADER_CRC - the crc32 of the mailbox header file.
- RECORD_CRC_XOR - the XOR of all record crc32s
- HEADER_CRC - the crc32 of the rest of the bytes in the raw header. HEADER_CRC must always be the last item in the buffer, and the CRC is calculated over the bytes before it, as shown below:
mailbox->header_crc = ntohl(*((bit32 *)(mailbox->index_base+OFFSET_HEADER_CRC)));
crc = crc32_buf(mailbox->index_base, OFFSET_HEADER_CRC);
if (crc != mailbox->header_crc) {
    /* complain */
}
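The header check above can be sketched as a standalone pair of functions. This is an illustration under assumptions, not the actual implementation: the header is modelled as a byte buffer whose final four bytes hold the CRC in network byte order, and `header_seal`, `header_is_valid`, `load_be32`, `store_be32` and `crc32_bytes` are all names made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32; assumed stand-in for crc32_buf(). */
static uint32_t crc32_bytes(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Read and write 32-bit big-endian values byte by byte, so no
 * alignment of the buffer is assumed. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static void store_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Hypothetical layout: the last 4 bytes hold the CRC of all bytes
 * before them. Writes that CRC into place. */
static void header_seal(unsigned char *hdr, size_t hdr_len)
{
    if (hdr_len < 4)
        return;
    store_be32(hdr + hdr_len - 4, crc32_bytes(hdr, hdr_len - 4));
}

/* Returns 1 when the stored CRC matches the recomputed one, else 0. */
static int header_is_valid(const unsigned char *hdr, size_t hdr_len)
{
    if (hdr_len < 4)
        return 0;
    return load_be32(hdr + hdr_len - 4) == crc32_bytes(hdr, hdr_len - 4);
}
```

Keeping the CRC as the last field means the covered range is simply "everything up to the CRC offset", so new header fields can be added in front of it without changing the calculation.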
Additional fields in the cyrus.index records:
- CACHE_CRC - the crc32 of all the bytes in the entire cache record pointed to by CACHE_OFFSET.
- RECORD_CRC - the last field of the record. As with HEADER_CRC, it is the CRC32 of all the bytes before it.
When updating any record, therefore, the full process is:
- mailbox->record_crc_xor ^= record->record_crc
- make the changes to fields in the record
- record->modseq = mailbox->highestmodseq + 1
- mailbox_write_index_record(mailbox, msgno, &record);
- mailbox->record_crc_xor ^= record->record_crc /* was updated by the write_index_record */
Loop over other records as required, then finally update the index header:
- mailbox->highestmodseq++
- update other fields as required (answered, deleted, flagged, etc.)
- mailbox_write_index_header(mailbox)
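The update sequence can be modelled in memory to show that the running XOR stays consistent. Everything here is a simplified assumption: the `struct record` and `struct mailbox` layouts, the zero-based `idx`, and the helpers `write_index_record`, `set_flags`, `commit_header` and `recompute_xor` are hypothetical stand-ins, not the real mailbox API.

```c
#include <stddef.h>
#include <stdint.h>

#define NRECORDS 3

struct record { uint32_t uid, system_flags; uint64_t modseq; uint32_t record_crc; };
struct mailbox {
    struct record records[NRECORDS];
    uint64_t highestmodseq;
    uint32_t record_crc_xor;
};

static uint32_t crc32_bytes(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24); p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);  p[3] = (unsigned char)v;
}

/* CRC over a serialized form of every field that precedes record_crc. */
static uint32_t record_compute_crc(const struct record *r)
{
    unsigned char buf[16];
    put_be32(buf, r->uid);
    put_be32(buf + 4, r->system_flags);
    put_be32(buf + 8, (uint32_t)(r->modseq >> 32));
    put_be32(buf + 12, (uint32_t)r->modseq);
    return crc32_bytes(buf, sizeof buf);
}

/* Stand-in for mailbox_write_index_record(): refreshes the CRC, then stores. */
static void write_index_record(struct mailbox *mb, size_t idx, struct record *r)
{
    r->record_crc = record_compute_crc(r);
    mb->records[idx] = *r;
}

/* One record update, following the listed steps in order. */
static void set_flags(struct mailbox *mb, size_t idx, uint32_t flags)
{
    struct record r = mb->records[idx];
    mb->record_crc_xor ^= r.record_crc;   /* XOR the old CRC out */
    r.system_flags = flags;               /* change the record */
    r.modseq = mb->highestmodseq + 1;
    write_index_record(mb, idx, &r);
    mb->record_crc_xor ^= r.record_crc;   /* XOR the new CRC in */
}

/* Stand-in for the final header update after all records are written. */
static void commit_header(struct mailbox *mb) { mb->highestmodseq++; }

/* Full rescan, used only to cross-check the incrementally kept value. */
static uint32_t recompute_xor(const struct mailbox *mb)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < NRECORDS; i++)
        acc ^= record_compute_crc(&mb->records[i]);
    return acc;
}

static void mailbox_init(struct mailbox *mb)
{
    mb->highestmodseq = 0;
    mb->record_crc_xor = 0;
    for (size_t i = 0; i < NRECORDS; i++) {
        struct record r = { (uint32_t)(i + 1), 0, 0, 0 };
        write_index_record(mb, i, &r);
        mb->record_crc_xor ^= r.record_crc;
    }
}
```

Every record changed in one pass gets the same modseq (`highestmodseq + 1`), and the header counter is bumped once at the end, so the cost of maintaining the XOR is two XOR operations per changed record rather than a rescan of the whole index.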
mailbox_read_index_header and mailbox_read_index_record can check the CRC32s as they parse the record and make sure they match. mailbox_cacherecord_index can check the crc32 of the cache record, and return an error if needed.
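A read-side check might look roughly like the following sketch. The 12-byte record layout, the error codes, the explicit `cache_rec_len` parameter and the helper names are all assumptions made for illustration; the real record format has many more fields and determines the cache record length by parsing it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical big-endian record layout for this sketch only. */
#define REC_CACHE_OFFSET 0   /* offset of the cache record in the cache file */
#define REC_CACHE_CRC    4   /* CRC32 of the cache record bytes */
#define REC_RECORD_CRC   8   /* CRC32 of bytes 0..7 of this record */
#define REC_SIZE         12

enum { CHECK_OK = 0, CHECK_BAD_RECORD = -1, CHECK_BAD_CACHE = -2 };

static uint32_t crc32_bytes(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static void store_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24); p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);  p[3] = (unsigned char)v;
}

/* Build a record that points at cache[off .. off+len) and seal it. */
static void build_record(unsigned char *rec, uint32_t off,
                         const unsigned char *cache, size_t len)
{
    store_be32(rec + REC_CACHE_OFFSET, off);
    store_be32(rec + REC_CACHE_CRC, crc32_bytes(cache + off, len));
    store_be32(rec + REC_RECORD_CRC, crc32_bytes(rec, REC_RECORD_CRC));
}

/* Check the record CRC first: CACHE_OFFSET can only be trusted once
 * the record itself is known to be intact. */
static int check_record_and_cache(const unsigned char *rec,
                                  const unsigned char *cache,
                                  size_t cache_size, size_t cache_rec_len)
{
    if (load_be32(rec + REC_RECORD_CRC) != crc32_bytes(rec, REC_RECORD_CRC))
        return CHECK_BAD_RECORD;

    uint32_t off = load_be32(rec + REC_CACHE_OFFSET);
    if (off > cache_size || cache_rec_len > cache_size - off)
        return CHECK_BAD_CACHE;   /* would read past the end of the cache */

    if (load_be32(rec + REC_CACHE_CRC) != crc32_bytes(cache + off, cache_rec_len))
        return CHECK_BAD_CACHE;
    return CHECK_OK;
}
```

Distinguishing the two failure codes matters because the remedies differ: a bad cache record can in principle be regenerated from the message file, while a bad index record cannot be repaired from the cache.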
Later, mailbox_cacherecord_index will accept a minimum required cache version (e.g. for a search) and will recreate the cache record if it is corrupted or too old, automatically fixing cache issues!
Also, mailbox_read_index_header can check mailbox->header_base to ensure the header itself hasn't been corrupted.