The question of whether Cyrus can be run as a high availability and/or load balancing cluster has often been discussed on the
CyrusMailingLists.
The goal of a load balancing cluster is to distribute the usage of resources (memory, CPU time and I/O traffic) across several servers and to scale by adding further servers. The goal of a high availability cluster is to have no downtime of the service if one system fails or has to go down for maintenance.
The following setups for Cyrus in a HA and/or LB Cluster have been discussed.
- DNS / Perdition load balancing: The users are split across several Cyrus servers, and DNS or Perdition is used to direct each user to the correct server. This setup is only a static LB cluster, and it is not possible to share mailboxes between users on different servers. However, it is very easy to set up, and because there are no shared filesystems it avoids the problems they bring. If one server crashes, some of the users (those on that server) will have no access to their mail. Mail may be lost, depending on the kind of crash, the storage and the backup.
- CyrusMurder setup: In this setup the users are split across several backends that are reached through frontend servers. This setup allows shared folders between users on different backends. Like the DNS / Perdition setup, this is only a LB cluster, but users can easily be moved from one backend server to another. In case of a crash, this setup does not differ from the setup above.
- CyrusReplication: Replication can be added to the two setups above. You need either additional servers, or you have to take care to set up both the replication server and the replication client on one Cyrus server/backend. Replication is only available in Cyrus 2.3.x, and this setup needs manual intervention in case of failure: the MUPDATE master / DNS / Perdition has to be set to point to the new server. With scripts it should be possible to do this automatically. The replication is asynchronous, so you might lose some mail.
- SAN or Shared filesystem (NAS) Active/Passive: This setup can add HA to a normal Cyrus server by storing the mail and databases on a remote disk and monitoring the server with heartbeat. Cyrus depends on the file locking of the filesystem; SAN systems should have no problems, but shared filesystems may.
- Shared filesystem (NAS) Active/Active: In this setup all users are on all servers. The mailboxes and databases are on a shared filesystem, so that changes made on one server are visible on all other servers. Cyrus depends on the file locking of the filesystem. NFSv4, GFS, Lustre and some other shared filesystems claim to provide file locking across cluster nodes. The sockets, lock files and pid files have to stay on a local filesystem or have to be made unique across cluster nodes. BDB seems to have problems in this setup, because lock changes in the mmapped files are not instantly visible on all clients/nodes, and because of its use of shared memory. You have to compile Cyrus without BDB support to get rid of the errors. This setup needs no extra servers, and all servers use the same configuration. If a server crashes, its users use one of the other servers. The server can be replaced with a clone of one of the other servers. The storage should also be on a HA cluster to make the whole mail system HA. The Active/Active shared filesystem setup is controversial. It is not widely used and therefore not well tested, but Dave McMurtrie and Scott Adkins have reported successful installations.
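The static split described in the DNS / Perdition item above can be illustrated with a small sketch. It is not how Perdition or DNS actually choose a server. The host names, the function names and the use of a hash of the user name are all assumptions made for illustration. The sketch only shows the two properties mentioned above: every user always lands on the same server, and a user whose server is down has no access.

```python
import hashlib

# Hypothetical backend host names, for illustration only.
BACKENDS = ["cyrus1.example.org", "cyrus2.example.org", "cyrus3.example.org"]


def backend_for(user, backends=BACKENDS):
    """Statically map a user to one backend via a stable hash of the name.

    Assumption: a real site might use a lookup table instead of a hash.
    """
    digest = hashlib.sha1(user.encode("utf-8")).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]


def reachable_backend(user, down=frozenset(), backends=BACKENDS):
    """Return the user's backend, or None if that backend is down.

    There is deliberately no fallback: in a purely static split, a user
    whose server has crashed has no access until it is restored.
    """
    host = backend_for(user, backends)
    return None if host in down else host
```

For example, `reachable_backend("alice", down={backend_for("alice")})` returns `None`, while users mapped to the other servers are unaffected.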
The following shared filesystems seem to support the file locking
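Before trusting a shared filesystem with Cyrus, it may help to check by hand whether a lock taken on one node is seen on another. The following is a minimal sketch, assuming POSIX fcntl-style locks. Whether this matches the locking a particular Cyrus build uses is an assumption, and the function names are hypothetical.

```python
import fcntl
import os


def try_exclusive_lock(path):
    """Try to take a non-blocking exclusive fcntl lock on path.

    Returns the open file descriptor if the lock was acquired, or None
    if another process already holds a conflicting lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # The lock is held elsewhere (or locking is not supported).
        os.close(fd)
        return None
    return fd


def release_lock(fd):
    """Release the lock taken by try_exclusive_lock and close the file."""
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)
```

To use it, call `try_exclusive_lock` on a file in the shared filesystem from one node and keep the process running, then call it on the same file from a second node. If the second call also returns a descriptor, locks are not being enforced across nodes. A passing check is only a first indication and does not prove that the filesystem is safe for Cyrus.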
Detailed information can be found in the
CyrusMailingList archives, in the following threads:
- Cyrus IMAP and MySQL mailboxes (Building load-balancing cluster)
- NFSv4, anyone?
- Cyrus, clusters, GFS - HA yet again
- High availability email server.
- Cyrus & Lustre
Some older threads might be of interest, but I think they are no longer up to date:
- Cyrus 2.3 on shared filesystems
- Playing with replicated murder
- Using a SAN/GPFS with cyrus
- high-availability again
- Two Cyrus servers
- Cyrus, NFS and mail spools
- multiple cyruses via SAN
- Cyrus/NFS/SMB
- No NFS? Ok, how about GFS/GPFS
--
MichaelMenge - 17 May 2007
Topic revision: r7 - 07 Jan 2008 - 04:08:24 - BryanHill