Data Center

Maximizing the Availability of Your Microsoft Exchange Environment With Continuous Replication

The final step in creating a fully redundant Microsoft e-mail environment has been completed in Exchange 2007.

When Microsoft Exchange 4.0 was released in the mid-1990s, it was not a vast improvement on paper mail. Users were happy getting mail a couple of times per day and only at their desks. But today’s users are much more savvy and better connected, and so the dependence on e-mail has grown to the point where many of us carry devices that alert us the moment a new e-mail arrives.
Thus, the availability of an e-mail system has become just as important a topic as recovering it in the event of a disaster. If you don’t believe me, stop the Microsoft Exchange Information Store service and time how long it takes before the first call comes into the help desk.

Through Exchange 2003, we’ve seen the following improvements in the availability space from Microsoft:

Multiple front-end servers — the ones that provide communication with clients — can be load balanced.
Multiple e-mail routes can be created into and out of an organization, so that if one Internet service provider fails, mail is simply routed through another.
Public folders, which are the collaboration tools of Exchange, are replicated across the network.
The back-end servers, which manage the mailbox data store, can be clustered using Microsoft Cluster Services so that if one experiences a system failure, another server takes its place.

Clearly, many aspects of the Microsoft Exchange system are highly available, but one very glaring aspect is not. While the services on the back-end servers can be clustered, they still point to a shared mailbox data store, which is a single point of failure. Enter Exchange 2007.

Exchange 2007 introduces two new forms of redundancy long awaited by administrators: cluster continuous replication (CCR) and local continuous replication (LCR). Both of these technologies allow the e-mail system to keep multiple copies of the data stored on separate disks (even in separate locations). CCR is for those customers who already have a clustering environment in place or need to build one, and these tend to be larger businesses. LCR does not use clustering and is targeted more for small or medium-size businesses. A summary of four different highly available solutions with Exchange 2007 is shown in table 1.

Table 1: A Comparison of Highly Available Exchange Environments

Solution	Redundancy Provided	Single Point of Failure	Clustered	Data Copied	Possible Site Resilience
Single	None	Data, Service	No	No	No
Clustered	Service	Data	Yes	No	No
LCR	Data	Service	No	Yes	No
CCR	Data, Service	None	Yes	Yes	Yes
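
For readers who prefer to see the table as logic, here is a minimal Python sketch of it. It only restates table 1: the dictionary contents come from the table, while the names `COMPONENTS`, `REDUNDANCY`, and `single_points_of_failure` are my own illustrative choices and not anything Exchange exposes.

```python
# Toy model of table 1: a component that is not duplicated
# is a single point of failure.
COMPONENTS = {"data", "service"}

# Redundancy provided by each solution, copied from table 1.
REDUNDANCY = {
    "Single": set(),
    "Clustered": {"service"},
    "LCR": {"data"},
    "CCR": {"data", "service"},
}


def single_points_of_failure(solution):
    """Return the components that exist in only one copy."""
    return COMPONENTS - REDUNDANCY[solution]


for name in REDUNDANCY:
    remaining = sorted(single_points_of_failure(name)) or ["none"]
    print(f"{name}: {', '.join(remaining)}")
```

Only CCR leaves nothing behind in the "single point of failure" column, which is the point of the comparison.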

Cluster Continuous Replication

In the Exchange 2003 clustered environment shown in figure 1, multiple Exchange servers with redundant services share the mailbox data resource. This resource can be a network drive on a file server but is commonly a logical unit number (LUN) on a storage area network (SAN). The two servers are configured in active-passive mode and are called nodes of the cluster. This means that only one of them is handling traffic and processing requests (the active node). If the active node is missing for any reason, including being down for scheduled maintenance, the passive node becomes the active node and starts processing requests. As you can see, if the shared storage is not available, the cluster fails.

Fig.1

A cluster continuous replication example in Exchange 2007 is shown in figure 2. Here, the two servers are still in an active-passive cluster formation with node 1 being active. However, node 1 stores its data in a locally attached storage device. (This could just as easily be a LUN on the SAN, but for clarity I’ve chosen two different storage media.) Node 2, the passive node of the cluster, has a copy of the database files (figure 3) and pulls the transaction logs from node 1 as they are generated and deposits them on its own storage device, which in this case is a LUN on the SAN. If any component of node 1 fails, including its storage device, the cluster automatically shifts to node 2, which has a complete replica (data and all) of the Exchange system.
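
The pull-and-replay idea described above can be sketched in a few lines of Python. This is a toy model under my own assumptions, not how Exchange implements it: the `Node` class, its methods, and the in-memory "database" are all hypothetical, and real transaction logs are files rather than list entries.

```python
class Node:
    """One cluster node with its own storage (hypothetical toy model)."""

    def __init__(self, name):
        self.name = name
        self.database = {}  # mailbox name -> list of messages
        self.logs = []      # ordered transaction log records
        self.replayed = 0   # number of log records applied to the database

    def _apply(self, mailbox, message):
        self.database.setdefault(mailbox, []).append(message)

    def deliver(self, mailbox, message):
        """Active-node write: record the change in the log, then apply it."""
        self.logs.append((mailbox, message))
        self._apply(mailbox, message)
        self.replayed = len(self.logs)

    def pull_logs(self, active):
        """Passive-node step: copy logs not yet held, then replay them."""
        new_records = active.logs[len(self.logs):]
        self.logs.extend(new_records)
        for mailbox, message in self.logs[self.replayed:]:
            self._apply(mailbox, message)
        self.replayed = len(self.logs)
        return len(new_records)


node1, node2 = Node("node 1"), Node("node 2")
node1.deliver("alice", "Q3 budget")
node1.deliver("bob", "Lunch?")
shipped = node2.pull_logs(node1)
print(shipped, node2.database == node1.database)
```

In this model, node 2 can take over with the same mailbox contents as node 1 once it has pulled and replayed every log; anything delivered after the last pull would not yet be in its copy.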

Fig.2

Besides having the benefit of a ready second copy of your data, you also can back up the passive node’s data without affecting performance on the active node. After a full backup, the transaction logs are deleted on the passive node (like any other Exchange server). The transaction logs are then also deleted on the active node.

Fig.3

The cluster nodes must be on the same subnet, but that subnet can span multiple locations. Therefore, given an appropriate connection, the passive node could easily be in a remote location. This differs quite a bit from most currently deployed solutions, where the SAN’s storage might be mirrored (at the block level) to another site. With CCR, data is shipped to the other site over TCP/IP in the form of transaction log files. This may or may not be appropriate for your environment, but it is always good to have options.

Fig.4

Local Continuous Replication

Local continuous replication is a derivative of CCR for nonclustered Exchange systems, appropriate for businesses that do not want to invest in a full set of redundant servers or are willing to rebuild a server in the event of a failure. In the past, those companies also had to deal with the recovery of their Exchange data, which could extend the recovery by many hours.

Fig.5

With LCR, Exchange 2007 databases and their transaction log files are replicated to another location on the local server, as shown in figure 4. These locations can be any form of attached storage that simply looks like another drive to Exchange. If the primary database becomes corrupt or a disk fails, the Information Store service can be pointed to the second location, and service continues as normal.

Fig.6

Thus, LCR provides a quick recovery solution at the moderate expense of additional storage. As any Exchange administrator will tell you, the most difficult and time-consuming part of an Exchange recovery is the database itself; reinstalling the operating system and the application shouldn’t take more than an hour or two. LCR certainly does not replace a full backup solution, because any mistakes users make (such as deleting e-mails) are replicated to the additional storage space. Offsite backups are still recommended for a complete disaster recovery solution, but LCR can provide increased availability at a relatively low cost.
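
To see why a replicated copy is not a backup, consider this small Python sketch. It is only an illustration with made-up data: `apply_changes`, the mailbox contents, and the variable names are hypothetical, and the "replication" here is simply applying the same change to both copies.

```python
import copy


def apply_changes(store, changes):
    """Apply (operation, mailbox, item) records to a mailbox store in place."""
    for operation, mailbox, item in changes:
        if operation == "add":
            store.setdefault(mailbox, []).append(item)
        elif operation == "delete":
            store[mailbox].remove(item)


primary = {"alice": ["budget.xls", "meeting notes"]}
lcr_copy = copy.deepcopy(primary)        # kept in step with the primary
offsite_backup = copy.deepcopy(primary)  # point-in-time copy, never updated

# A user deletes a message by mistake; replication faithfully repeats it.
mistake = [("delete", "alice", "budget.xls")]
apply_changes(primary, mistake)
apply_changes(lcr_copy, mistake)

print("budget.xls" in lcr_copy["alice"])        # the copy lost it too
print("budget.xls" in offsite_backup["alice"])  # the backup still has it
```

The replicated copy protects against a failed disk or a corrupt database, while only the point-in-time backup still holds the deleted item.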

Fig.7


Step-by-Step LCR

OK, so you’ve got a single Exchange 2003 server, you’re sold on getting to Exchange 2007 and LCR, and you want to do it on your own. The first thing you need to understand is that Microsoft decided that Exchange 2007 would be the first application to require 64-bit processors. The good news is that many servers sold over the past 18 months or more have 64-bit processors; they just come with a 32-bit operating system installed. The bad news is that it is not possible to upgrade in place from a 32-bit operating system to a 64-bit one. Once you have a 64-bit server up and running, you can easily move your mailboxes from the 32-bit server to the 64-bit one, then decommission the 32-bit server and reuse it somewhere else. (Technically, Exchange 2007 will run in 32-bit mode for testing, but Microsoft won’t support it, and I don’t recommend it in production.)
So let’s assume you have your Exchange 2007 server up and ready to rock. How do you take advantage of the LCR feature? Here’s how in seven easy steps:

Fig.8

1. Open the Exchange Management Console.
2. Select the Storage Group for which you want LCR, as shown in figure 5. (Because you mirror both the database and the log files, and the log files are per Storage Group, you cannot select LCR for just a single database.)
3. Right-click on the Storage Group and select Enable Local Continuous Replication. A wizard will open, announcing its intentions (figure 6). Click Next.
4. Enter the replication system files and log files paths. For simplicity, I’ve chosen a spot on the C: drive, but obviously you would not want this location to be on the same set of drives as the original copy (figure 7). Click Next.
5. For each mailbox database in the storage group, a new dialog box will appear where you indicate the replication database file paths. A best practice would be to have them on a separate disk from the log files. Again, for simplicity of this example, I’ve chosen a location on the C: drive (figure 8). Click Next.
6. The wizard now displays a summary of the tasks it is about to undertake. This is your last chance to turn back. It is also helpful (as the page indicates) to copy the contents of the page for future documentation (figure 9). Click Next.
7. A progress bar will be displayed at each task as the wizard completes them. If there are any errors, they are shown on this page (figure 10). Upon completion, click Finish. Congratulations: you’ve just made your Exchange database redundant.
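
Before running the wizard for real, it can help to sanity-check the paths you plan to enter in the path steps above. The following Python sketch is my own helper, not part of Exchange: `shares_drive`, `check_lcr_paths`, and every path shown are hypothetical. It only compares drive letters, so it cannot tell whether two different letters sit on the same physical disks.

```python
from pathlib import PureWindowsPath


def shares_drive(original, replica):
    """True if both paths use the same drive letter (case-insensitive)."""
    return (PureWindowsPath(original).drive.lower()
            == PureWindowsPath(replica).drive.lower())


def check_lcr_paths(original_paths, replica_paths):
    """Return a warning for each replica path that shares a drive with an original."""
    warnings = []
    for replica in replica_paths:
        for original in original_paths:
            if shares_drive(original, replica):
                warnings.append(f"{replica} is on the same drive as {original}")
                break
    return warnings


# Hypothetical layout: originals on C:, one replica path left on C: by mistake.
originals = [r"C:\ExchangeData\SG1\Logs", r"C:\ExchangeData\SG1\Mailbox.edb"]
replicas = [r"D:\LCR\SG1\Logs", r"C:\LCR\SG1\Mailbox.edb"]

for warning in check_lcr_paths(originals, replicas):
    print(warning)
```

A clean run (no warnings) does not prove the layout is sound, but a warning is a quick hint that the copy would be lost along with the original if that drive failed.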