A001 CREATE INBOX.new
A002 SELECT INBOX.new
Also imagine such a sequence of commands happening in two different connections.
internal consistency versus external consistency. Internal consistency ensures the database is not corrupted. This is easy to verify for flat text files, harder for B-tree databases. External consistency ensures that the database accurately reflects the information on the spool disks or on the backend servers.
what sort of database? flat text: simple, currently works, and is always internally consistent. Plus, readers can keep operating on an older version while a writer updates a new master version and swaps it in.
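The "write a new version and swap it in" idea for the flat text file can be sketched as follows. This is only an illustration, not the actual Cyrus code: the function names and the one-line-per-mailbox "name TAB server" format are assumptions made for the example. It relies on a rename within one directory being atomic on POSIX filesystems, so a reader sees either the old file or the new one, never a partial write.

```python
import os
import tempfile


def write_mailboxes_atomically(path, entries):
    """Write a new flat-text mailboxes file and swap it in with one rename.

    `entries` maps mailbox name -> backend server (hypothetical format).
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".mailboxes.")
    try:
        with os.fdopen(fd, "w") as tmp:
            for name in sorted(entries):
                tmp.write(f"{name}\t{entries[name]}\n")
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the data is on disk first
        # Readers that already opened the old file keep reading it.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def read_mailboxes(path):
    """Read the whole file; the result is a consistent snapshot."""
    entries = {}
    with open(path) as f:
        for line in f:
            name, server = line.rstrip("\n").split("\t")
            entries[name] = server
    return entries
```

The cost of this approach is that every update rewrites the whole file, which is the performance trade-off weighed against the Berkeley DB options.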
one writer/many readers Berkeley DB: offers better performance for lookups, but may actually hurt overall performance since the writer needs to guarantee that no readers are present before it starts writing. If we crash, there are no guarantees of internal consistency.
transactional Berkeley DB: incurs heavy locking and logging overheads, but possibly offers the best performance. Allows multiple readers and multiple writers, and allows lock upgrading. However, applications need to deal with deadlock, and crashing can be disastrous. Ensures internal consistency.
what's true? The theoretical list of mailboxes we're trying to distribute is the set of mailboxes that actually exist on the spool disks across all of the backend servers. As we get farther away from that, the consistency of the data will degrade. What sort of degradation can we afford? Is the game to let the data get as inconsistent as possible while maintaining client operation? Or to keep the data as consistent as possible while maintaining acceptable performance?
The proxyd processes handle the IMAP sessions with clients. They rely on a consistent and complete mailboxes database that reflects the state of the world. They never write to the mailboxes database.
What happens if the mailboxes database on the frontend is out of date? Can proxyd act proactively? When a client SELECTs a mailbox that doesn't appear in the database, proxyd sends a signal to murder-front to make sure all updates have occurred.
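A minimal sketch of that lookup-then-resync behaviour, assuming an in-memory dictionary stands in for the frontend mailboxes database. The `MurderFront` class, `flush_updates`, and `locate_for_select` are hypothetical names invented for this example; the real signalling mechanism between proxyd and murder-front is not specified here.

```python
class MurderFront:
    """Hypothetical stand-in for the murder-front process.

    It is the only writer of the local mailboxes database.
    """

    def __init__(self, local_db, pending_updates):
        self.local_db = local_db
        # Updates received from ACAP but not yet applied:
        # (mailbox name, server), where server None means "deleted".
        self.pending = list(pending_updates)

    def flush_updates(self):
        """Apply every outstanding update to the local database."""
        for name, server in self.pending:
            if server is None:
                self.local_db.pop(name, None)
            else:
                self.local_db[name] = server
        self.pending.clear()


def locate_for_select(name, local_db, murder_front):
    """proxyd side: read-only lookup, with one resync on a miss."""
    server = local_db.get(name)
    if server is None:
        # Mailbox unknown locally: ask murder-front to catch up, then retry.
        murder_front.flush_updates()
        server = local_db.get(name)
    return server  # still None means the mailbox really does not exist


db = {"INBOX": "backend1"}
front = MurderFront(db, [("INBOX.new", "backend2")])
print(locate_for_select("INBOX.new", db, front))
```

Only lookups that miss pay for the resync, so a SELECT of an existing mailbox stays a plain local read.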
The murder-front process (one per frontend) holds open an ACAP context and listens for updates from the ACAP server. These updates originate from the backend servers modifying the mailboxes dataset, and murder-front applies them to the local copy of the mailboxes database. Should the murder-front process totally rewrite the mailboxes database when it starts up?
All mailbox operations get forwarded to the appropriate backend server, so the only one that's tricky is CREATE. To CREATE foo.bar (all danger of inconsistency rests in the hands of the backend server):
To SELECT foo.bar:
This makes SELECTs on mailboxes that don't exist much more expensive. I don't think this is a problem.
To RENAME foo.bar aaa.bbb: Do we allow cross-server renames?
Each backend server maintains a local mailboxes database, listing what mailboxes are available on that server.
The imapd processes on the backend server stand by themselves, so that each backend IMAP server can be used in isolation without an ACAP server or any frontend servers. However, they may be configured so that they won't process any mailbox operations unless the master ACAP server can be contacted (allows for namespace consistency).
The imapd processes update the local mailboxes database themselves. However, on a CREATE they need to reserve a place with the ACAP server before proceeding with the creation. Thus a flag in the mailboxes dataset needs to be reserved for "in progress".
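The reservation idea can be sketched as below. This is an assumption-laden illustration rather than the actual protocol: `AcapMailboxes` is an in-memory stand-in for the ACAP mailboxes dataset, the state strings and the ordering of the steps are guesses made for the example, and `create_on_disk` is a hypothetical callback for whatever creates the mailbox on the spool.

```python
class AcapMailboxes:
    """In-memory stand-in for the ACAP mailboxes dataset (hypothetical)."""

    def __init__(self):
        self.entries = {}  # mailbox name -> (server, state)

    def reserve(self, name, server):
        """Claim the name; fails if any server already holds it."""
        if name in self.entries:
            return False
        self.entries[name] = (server, "in progress")
        return True

    def activate(self, name, server):
        self.entries[name] = (server, "active")

    def release(self, name):
        self.entries.pop(name, None)


def create_mailbox(name, server, acap, local_db, create_on_disk):
    """Backend imapd side of CREATE (assumed ordering of steps)."""
    if name in local_db:
        return False
    # 1. Reserve the name globally before touching local state.
    if not acap.reserve(name, server):
        return False
    try:
        # 2. Create the mailbox locally and record it.
        create_on_disk(name)
        local_db[name] = server
    except Exception:
        # Local creation failed: give the reservation back.
        acap.release(name)
        raise
    # 3. Clear the "in progress" flag so frontends can see the mailbox.
    acap.activate(name, server)
    return True
```

A crash between steps leaves either a stale "in progress" entry or a local mailbox that ACAP does not list as active, which is where the failure modes discussed for CREATE come from.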
To CREATE foo.bar:
Failure modes: Above, all backend inconsistencies result in the next CREATE attempt failing. The earlier ACAP inconsistency results in any attempts to CREATE the mailbox on another backend failing. The latter one makes the mailbox unreachable and uncreatable.
To RENAME foo.bar aaa.bbb:
urg. sieve sure does complicate things. here are some proposals.
early sieve: The sooner that mail is run through Sieve scripts, the better. We want to reduce the total number of hops mail takes before it lands in the appropriate mailbox. This translates into the MX (remote submission) and SMTP (local submission) servers running the Sieve scripts themselves. This requires every submission host to contact the ACAP servers for the user's Sieve script, and to consult the mailboxes dataset to determine where to route the mail (via LMTP) when a Sieve script calls "fileinto" or "keep".
frontend sieve: Since the frontend servers already keep a copy of the global mailboxes database, they can easily process Sieve scripts efficiently. They still need to use LMTP to transfer messages to the final backend destination.
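As a rough sketch of why a global mailboxes database makes this easy, the function below maps the delivery actions produced by a Sieve script to (backend server, mailbox) pairs, one per LMTP delivery. The names are hypothetical, the `user.<name>` INBOX naming is assumed for the example, and falling back to the INBOX when a fileinto target is unknown is one possible policy, not something these notes prescribe.

```python
def route_sieve_actions(user, actions, mailboxes):
    """Turn Sieve delivery actions into (backend server, mailbox) pairs.

    `actions` is a list of ("keep", None) or ("fileinto", mailbox) tuples;
    `mailboxes` maps mailbox name -> backend server (the global database).
    """
    inbox = f"user.{user}"  # assumed INBOX naming
    deliveries = []
    for action, arg in actions:
        if action == "keep":
            target = inbox
        elif action == "fileinto":
            target = arg
        else:
            continue  # non-delivery actions are ignored in this sketch
        if target not in mailboxes:
            # Assumed policy: an unknown target falls back to the INBOX.
            target = inbox
        delivery = (mailboxes[target], target)
        if delivery not in deliveries:  # never deliver twice to one mailbox
            deliveries.append(delivery)
    return deliveries


mailboxes = {
    "user.alice": "backend1",
    "user.alice.lists": "backend2",
}
actions = [("fileinto", "user.alice.lists"), ("keep", None)]
print(route_sieve_actions("alice", actions, mailboxes))
```

Because the lookup covers mailboxes on every backend, a fileinto target on a different server than the INBOX routes correctly, which is the case that fails when the script runs on a backend.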
backend sieve: Since different backend servers are unaware of each other, running Sieve scripts on the backend has several disadvantages. Messages have to be routed to the backend server that holds the user's INBOX, and then the Sieve processing happens. Any fileinto actions that refer to non-local mailboxes fail. This breaks backend server transparency.