For almost my entire career I have been an application developer. I got into SharePoint roughly six years ago and continued that philosophy however a few years ago I found out I could no longer get away with that. I had to start learning the architecture of SharePoint and how to design / configure it to support the major “ilities”. The one on everyone mind was how to scale SharePoint. I like to think I became very knowledgeable of this topic for SharePoint 2007 and now SharePoint 2010. But again, I was lacking in an area – SQL Server.
I actually like to again think I am very knowledgeable on SQL Server from an application development perspective. I am strong at normalizing databases, views, creating indexes, stored procedures, optimistic/pessimistic locking, partitioning data, creating data access layers (DALs), object relational mapping (ORM), etc. but when it comes to the administration of SQL I needed to brush up on the topic. As you can see my focus has been all about getting data in and out.
Understanding SQL Server architectures for high availability are critical to the overall SharePoint architecture because of the strong relationship between the two. I have usually been able to get away with a client already having a highly available SQL Server environment ready for me but now I am finding out that is not always the case. The purpose of this blog posting is to provide information on:
- What is SQL Server High Availability?
- How does SQL Server High Availability relate to SharePoint?
I am not going to give the definition of all the SQL Server high availability solutions. This section will mostly be comprised of high recommended readings that will make you smart enough to have to have an educated discussion on SharePoint and SQL Server.
When you start reading about SQL Server High Availability, you will find the following solutions. I am going to give my very quick definitions of what they are:
- Clustering – An approach where multiple SQL Server resources (that share the same disks) are presented as a single machine to calling applications. Applications, like SharePoint, do not know how machines there are; all it knows is that there is SQL Server. If there is a failure, clustering services will bring online the other SQL Machine without affecting the application. Clustering does not provide protection against disk failure as it is a software level solution.
- Mirroring – This is an approach where a primary and a mirror database server are set up. There are two modes that you have to understand. There is high-safety mode (synchronous) which will ensure that any transaction that is executed against the primary is completed on the mirror. One drawback of running in high-safety is that it takes more time. The second is high-performance (asynchronous) which commits transactions on both the primary and mirror databases; however the primary will not wait on the mirror to complete the transaction. This provides better performance for highly transactional systems but there is the potential for data loss. So at best this solution provides a “warm” stand-by with potential data loss if asynchronous is used. There is configuration of mirroring called high-safety with automatic failover where a third server is used (called the witness) which will evaluate communication between the two machines and potentially make the mirror server the primary server. This effectively makes the mirror server a “hot” standby but the calling applications must be notified and there are solutions for this. Another limitation of database mirroring is that it can only be configured between two machines.
- Log Shipping – They usually start off by saying this solution operates that the database level. This solution allows you to create one-to-many “warm” standbys where there is a primary database and secondary databases monitor the logs of the primary and bring themselves up to date with the primary. As you can see, there is a delay between the secondary being completely up to date with the primary (which is not a bad thing in some scenarios). For a secondary database to become the primary, all the logs must be applied. This can be used as a solution to supplement mirroring.
- Replication – I joke the tried and true methodology which uses a publish-subscribe model where the primary will distribute data to one or more secondary servers. We are not going to dive into this for SharePoint.
- High Availability Solutions Overview - http://msdn.microsoft.com/en-us/library/ms190202.aspx - I highly recommend starting here. My overview aligns to this.
- Selecting a High Availability Solution - http://msdn.microsoft.com/en-us/library/bb510414.aspx - this is a great read that summaries the pro and cons of the approaches I have above. Highly recommend reading this.
- High Availability: Interoperability and Coexistence - http://msdn.microsoft.com/en-us/library/bb500117.aspx - This is a very solid series written about how the high availability solutions I discussed above can be used together. Specifically there are two articles in this section that show how Mirroring and Clustering are implemented together. Then another discussion on how Mirroring and Log Shipping work together. These enlightening to read.
- High Availability with SQL Server 2008 - http://msdn.microsoft.com/en-us/library/ee523927(SQL.100).aspx - This is a great article that says “wait” understand your requirements before you start talking high availability solutions. Plus it goes into backup and recovery which is another major piece of the puzzle. I would read this after you have read my other recommendations as this will bring the entire discussion into perspective.
- Achieve High Availability for SQL Server - http://technet.microsoft.com/en-us/magazine/2007.03.highavailability.aspx - This is an optional recommendation on the basic of availability. It is on SQL 2005 but it is well written and supplements the first two I referenced.
SharePoint and SQL Server for High Availability
So now, you may be wondering how this may translate into SharePoint 2010 and how I create my architecture. Hopefully I will be able to provide you some references on how you should configure SharePoint 2010 for high availability because on your requirements.
SharePoint 2010 Databases
First it is important to understand all of the databases of SharePoint 2010. I wrote a blog here based on this Technical Diagram - http://go.microsoft.com/fwlink/?LinkId=187969. Highly recommend reading these and understanding the nature of each SharePoint database. You will need to understand this in the context of the configuration of SharePoint you are using.
SharePoint 2010 Capacity Testing Results
Yes, Microsoft has done a significant amount of testing to understand the boundaries of SharePoint 2010 and have made the results from these tests available. I have written a summary about these test results on my blog here. At the top of this blog I provide references to detailed whitepapers published by Microsoft. These are very in-depth and I would not recommend reading this out right. However there is a ton of good information here which will point you in the right direction of how you may want to design for high availability.
SQL Server 2008 R2 and SharePoint 2010 Products: Better Together
This is a whitepaper (http://technet.microsoft.com/en-us/library/cc990273.aspx) that has been written on the topic of how SharePoint 2010 and SQL Server 2008 together make a great solution. However you will find it to be no more of an overview and will not give you information how to configure for a highly available environment. I would recommend this as supplemental only or provide this to folks who have very limited understanding of SQL Server (so this whitepaper could be useful).
Storage and SQL Server capacity planning and configuration (SharePoint Server 2010)
This whitepaper (http://technet.microsoft.com/en-us/library/cc298801.aspx) is on the completely other side of the spectrum that just referenced. I would just give this article immediately to any high end SQL administrator who is evaluating or designing a solution to support SharePoint 2010. It is full of best practices and official recommendations. There are gems of information in here like:
- Making sure I/O operations per second be the fastest it can be.
- Equations for estimating database sizing for all the types of databases.
- Direct Attached Storage (DAS), Storage Area Network (SAN), and Network Attached Storage (NAS).
- Disk types
- Memory
- Server architecture. Interesting facts I read where additional database server is recommended when there are more than four web servers running at full capacity. Plus add additional SQL Server when content databases exceed 5 terabytes.
- SQL Server configuration options
- Disk prioritization
- Performance Counters to watch.
Now this is the whitepaper (http://technet.microsoft.com/en-us/library/cc748824.aspx) that really brings the whole thing together and I highly recommend this article. I purposely brought this up LAST because if you read this far you should be rewarded J
When you start reading this you will see this has little to do with the SharePoint 2010 Service architecture and more to do with SQL Server high availability solutions. Specifically it recommends:
- Failover clustering
- Mirroring – “For mirroring within a SharePoint Server farm, you must use high-availability mirroring, also known as high-safety mode with automatic failover”. This means synchronous.
A few other interesting notes are:
- SharePoint 2010 is now mirroring aware meaning that SharePoint 2010 can be configured to know which SQL server is the primary and which is the mirror. This reduces the need for clustering.
- There is a good chart that compares clustering against mirroring.
- There is no mention of Log Shipping however you may very well want to implement it. I can see scenarios where you can do some creative things like have a log ship configuration refreshing a testing environment with production data on an interval, etc.
Plan for disaster recovery (SharePoint Server 2010)
Disaster Recovery is another piece of the puzzle when designing a highly available SharePoint 2010 architecture. Please read this whitepaper (http://technet.microsoft.com/en-us/library/ff628971.aspx). On the topic of SQL Server there is specific mention to using either Log Shipping or Asynchronous Mirroring to create a “hot” disaster recovery environment. There are specific discussions to which databases need to be part of the configuration. This is a good starting place to start understanding this topic.
Conclusions
There are a couple conclusions I came to while performing this research:
- The solutions for scaling SQL Server itself are not really all that much different in concept to the solutions we have at our hands for SharePoint 2010 services.
- Much of what I talked about in this blog applies to both SharePoint 2007 and 2010.
- I have always understood there has been a direct correlation between SharePoint and SQL Server when it comes to performance. I have seen when SharePoint flat out perform better in these situations.
- High Availability does not equate to better performance in all circumstances. Typically SQL machines that are in a high available environment will be more “beefy”. Point is if you expect good performance and you put SharePoint on an already highly utilized SQL server there will be performance issues. In those situations you may want to have dedicated SQL Servers for SharePoint. I wish I could still give a one size fits all recommendation but it really comes down to how your environment is configured and how SharePoint is ultimately used.
- Back Up and Recovery are not addressed as part of this discussion. That is a completely separate topic but is absolutely critical to the overall high availability architecture.