Wednesday, November 16, 2011

SharePoint 2010 Search Architecture

Introduction
Every major release of SharePoint brings a significant improvement in Search. Users need to be able to search for content, and they demand the same sort of experience they get from their everyday search engine. Microsoft really stepped up to the plate this time with SharePoint 2010.
SharePoint 2007 introduced tons of features that were not available in SharePoint 2003. For instance, searches could span site collections, searches actually returned correct results, and there were scopes, best bets, search analytics, search federation, the Business Data Catalog (searching external line-of-business systems), a search API, and more. Still, there were some challenges. Scale became an issue because SharePoint 2007 deployments grew exponentially thanks to its ease of use, and the architecture of the Shared Services Provider (SSP) was a contributing factor. For instance, an SSP could have only one crawler, which made it hard to handle large volumes of content. As well, users were demanding the same sort of experience they have with Bing, Google, etc.
SharePoint 2010 brings a lot of improvements:
  • The new service application architecture of SharePoint 2010 replaces the SSP (read this blog for details). As you will see, this is what enables SharePoint 2010 Search to scale, and that is the focus of this blog.
  • The ability to index 100 million items.
  • Continued support for indexing file shares, external web sites, line-of-business systems, Exchange public folders, etc.
  • Boolean operators (AND, OR, NOT) are supported.
  • Range symbols such as =, <, >, <=, and >= can be used.
  • Wildcard searches are now supported out of the box.
  • Property-based searches on metadata (for example, title:“XXX YYY”).
  • An improved relevance model that uses signals such as phrase matching and click-through counts.
  • Refiners, which let users narrow the search results using the returned metadata without having to re-run the search.
  • A "Did you mean?" feature that offers suggestions, for example when the user misspells a word.
  • Search suggestions, which provide auto-complete based on what users commonly search for.
  • Search Alerts and RSS Feeds
  • Improved query federation
  • More extensible search Web Parts
  • Several new administration features
  • Mobile search 
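Refiners are the easiest of these features to make concrete. Below is a minimal Python sketch. It is not SharePoint code or the SharePoint API. It assumes a hypothetical list of result dictionaries whose metadata (`filetype`, `author`) came back with the search. Refiner counts are computed from that returned metadata, and choosing a refiner value filters the results already in hand instead of re-running the query.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical result record: the metadata returned with each search hit.
Result = Dict[str, str]


def refiner_counts(results: List[Result], prop: str) -> Counter:
    """Count how many of the returned results carry each value of `prop`."""
    return Counter(r[prop] for r in results if prop in r)


def refine(results: List[Result], prop: str, value: str) -> List[Result]:
    """Narrow the existing result set in memory; no new query is issued."""
    return [r for r in results if r.get(prop) == value]


if __name__ == "__main__":
    results = [
        {"title": "Budget 2011", "filetype": "xlsx", "author": "Ann"},
        {"title": "Search Plan", "filetype": "docx", "author": "Bob"},
        {"title": "Search Topology", "filetype": "docx", "author": "Ann"},
    ]
    print(refiner_counts(results, "filetype"))  # Counter({'docx': 2, 'xlsx': 1})
    print([r["title"] for r in refine(results, "author", "Ann")])
```

The counts shown next to each refiner come entirely from the current result set. That is why clicking one is cheap compared with issuing a new search.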

Another major search improvement comes from Microsoft’s acquisition of FAST Search & Transfer (FAST). Microsoft spent roughly $1.2 billion to acquire one of the leading high-end enterprise search engines on the market and incorporate it into the SharePoint platform. This blog is not meant to be a feature and architecture comparison with FAST; I may write one in the near future. At a high level, you should know that FAST has:
  • Virtually limitless scalability: it can search and return results in sub-second times over petabytes of data.
  • High-scale search refiners.
  • Extremely powerful and tunable search relevancy model.
  • Search results and relevancy based on user context.
  • Entity extraction.
  • Ability to index almost any type of content imaginable.
  • Similar search result suggestions.
  • Thumbnails and document previewing in search results.
  • Visual best bets.
Still, getting your arms around the out-of-the-box (OOB) search can be a daunting task, even if you are familiar with SharePoint 2007 search. I was able to pull together a lot of information, and I am going to consolidate it for you:
  1. I will cover the new components of SharePoint 2010 Search.
  2. I will then discuss how each component can be scaled.
  3. I will then discuss scenarios for how SharePoint 2010 Search is scaled.
  4. I will show you how simple it is to scale SharePoint 2010 Search.
SharePoint 2010 Search Components
Let’s first talk about all the new components and architecture you need to know about right off the bat.
Crawler – You will hear about this a lot; it is commonly referred to as the crawl component or indexer, and it is responsible for building the index. Unlike in the previous version of SharePoint, the crawl component is stateless, meaning the index it creates is not stored in the crawl component. Instead, the index is pushed to the appropriate query server.
Crawl Database – As you just learned, the crawl component itself is stateless. State is managed in the crawl database, which tracks what needs to be crawled and what has already been crawled.
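The split between a stateless crawler and a stateful crawl database can be sketched in a few lines of Python. This is a toy model, not how SharePoint is implemented. `CrawlDatabase`, `QueryServer`, and `crawl` are hypothetical names, and the "index" is just a map from terms to URLs. The point is that the crawl loop keeps nothing of its own:

- it reads its work from the crawl database;
- it pushes index entries to the query server;
- it records its progress back in the database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class CrawlDatabase:
    """All crawl state lives here: the queue and the set of finished items."""
    pending: List[str] = field(default_factory=list)
    crawled: Set[str] = field(default_factory=set)

    def enqueue(self, url: str) -> None:
        # Skip items that are already queued or already crawled.
        if url not in self.crawled and url not in self.pending:
            self.pending.append(url)

    def next_item(self) -> Optional[str]:
        return self.pending.pop(0) if self.pending else None

    def mark_crawled(self, url: str) -> None:
        self.crawled.add(url)


@dataclass
class QueryServer:
    """Holds the index that the crawler pushes to it (term -> URLs)."""
    index: Dict[str, Set[str]] = field(default_factory=dict)

    def receive(self, url: str, terms: Set[str]) -> None:
        for term in terms:
            self.index.setdefault(term, set()).add(url)


def crawl(db: CrawlDatabase, content: Dict[str, str], server: QueryServer) -> None:
    """Stateless crawl loop: reads work from `db`, pushes index entries to
    `server`, and records progress back in `db`; it keeps nothing itself."""
    while True:
        url = db.next_item()
        if url is None:
            break
        terms = set(content.get(url, "").lower().split())
        server.receive(url, terms)
        db.mark_crawled(url)
```

In this model, all progress lives in the database. Any crawler instance could therefore pick up where another left off.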
Query Component – This component performs searches against the index created by the crawl component; it is also commonly referred to as the query server. The query component applies security trimming, best bets, relevance ranking, duplicate removal, and so on.
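Here is a hypothetical Python sketch of the kind of post-processing the query component performs. Several pieces are assumptions for illustration:

- the order of the steps;
- the `Hit` fields, including the `content_hash` used to spot duplicates and the `allowed` set standing in for an item's permissions;
- the best-bet lookup.

SharePoint's actual pipeline and ranking are internal to the product.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Hit:
    url: str
    score: float               # relevance score from the index
    content_hash: str          # hypothetical near-duplicate fingerprint
    allowed: FrozenSet[str]    # users/groups allowed to read the item


def process_results(hits: List[Hit], user_groups: FrozenSet[str],
                    best_bets: Dict[str, str], query: str) -> List[str]:
    """Hypothetical query-component post-processing of raw index hits."""
    # 1. Security trimming: drop items the user has no right to see.
    visible = [h for h in hits if h.allowed & user_groups]
    # 2. Relevance: order by score, highest first.
    visible.sort(key=lambda h: h.score, reverse=True)
    # 3. Duplicate removal: keep the best-scoring copy of each fingerprint.
    seen: Set[str] = set()
    unique: List[str] = []
    for h in visible:
        if h.content_hash not in seen:
            seen.add(h.content_hash)
            unique.append(h.url)
    # 4. Best bets: pin an admin-defined URL for this keyword to the top.
    bet = best_bets.get(query.lower())
    if bet is not None:
        unique = [bet] + [u for u in unique if u != bet]
    return unique
```

Trimming before de-duplication matters in this sketch. If duplicates were removed first, a user could lose a copy they are allowed to see because the higher-scoring duplicate was one they are not.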
Index Partition – Index partitions are new in SharePoint 2010 and are directly tied to the query component. We can now break the index into multiple partitions, which reduces the time it takes the query component to perform a search. Each query component serves exactly one index partition. Another way of putting it: every time a new index partition is created, a query component is created to serve it.
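The idea behind index partitions can be illustrated with a small Python model:

- documents are spread across partitions;
- each `QueryComponent` serves exactly one partition;
- a query is sent to every partition, and the partial results are merged by score.

The MD5-based `partition_for` function and the simple term-to-score index are assumptions for illustration, not SharePoint's documented distribution algorithm.

```python
import hashlib
import heapq
from typing import Dict, List, Tuple


def partition_for(doc_id: str, num_partitions: int) -> int:
    """Assign a document to a partition by hashing its ID (assumed scheme)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


class QueryComponent:
    """Serves exactly one index partition, stored as term -> {doc_id: score}."""

    def __init__(self) -> None:
        self.partition: Dict[str, Dict[str, float]] = {}

    def add(self, doc_id: str, term: str, score: float) -> None:
        self.partition.setdefault(term, {})[doc_id] = score

    def search(self, term: str) -> List[Tuple[float, str]]:
        return [(score, doc) for doc, score in self.partition.get(term, {}).items()]


def query_all_partitions(components: List[QueryComponent], term: str,
                         top_n: int) -> List[str]:
    """Send the query to every partition and merge partial results by score."""
    partials: List[Tuple[float, str]] = []
    for qc in components:
        partials.extend(qc.search(term))
    return [doc for _, doc in heapq.nlargest(top_n, partials)]
```

Each partition holds only a fraction of the index, so each query component searches less data. On real hardware the partitions can be searched in parallel on separate servers; this sketch queries them one after another for simplicity.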
Index Partition Mirror – Index partitions can now be mirrored. A mirror is an additional query component serving the same partition. It provides redundancy and can improve search performance by spreading the query load.
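A mirror gives a partition more than one query component that can answer for it. The Python sketch below is a hypothetical router. Round-robin spreading across healthy replicas and skipping failed ones are assumptions chosen to show the redundancy and load-sharing benefits; they are not SharePoint's actual query-routing logic. Replica names such as `qc0-mirror` are made up.

```python
import itertools
from typing import Dict, List


class PartitionReplicas:
    """One index partition served by a primary and zero or more mirrors."""

    def __init__(self, names: List[str]) -> None:
        self.names = names
        self.healthy = {n: True for n in names}
        self._rr = itertools.cycle(names)  # round-robin order over replicas

    def pick(self) -> str:
        """Return the next healthy replica, skipping any that are down."""
        for _ in range(len(self.names)):
            name = next(self._rr)
            if self.healthy[name]:
                return name
        raise RuntimeError("no healthy replica for this partition")


def route(partitions: Dict[str, PartitionReplicas]) -> Dict[str, str]:
    """Choose one replica per partition for a single query."""
    return {pid: reps.pick() for pid, reps in partitions.items()}
```

With a primary and a mirror both healthy, consecutive queries alternate between them (load sharing). If the primary goes down, every query goes to the mirror (redundancy).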
Property Database – Stores metadata and security information for the items in the index. A property database is associated with one or more query components and is used as part of the query process. Its properties are populated during the crawl that builds the index.
Search Admin Component – The component that manages the configuration of the Search Service Application instance.
Search Admin Database – The search administration database is mostly responsible for storing the configuration and topology of the SharePoint Search service. There is only ever one instance of this database for each Search Service Application.
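To tie the components together, here is a hypothetical Python model of a single Search Service Application's topology. The class names, field names, and server and database names are made up. The checks encode only the rules described above, plus one assumed rule (the last in this list):

- one search admin database;
- crawl components backed by a crawl database;
- every index partition served by a query component;
- every query component associated with a property database;
- a partition that has mirrors also needs a primary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QueryComponentSpec:
    server: str          # hypothetical server name
    partition: int       # the one index partition this component serves
    property_db: str     # property database this component reads from
    is_mirror: bool = False


@dataclass
class SearchTopology:
    """Hypothetical model of one Search Service Application's topology."""
    admin_db: str                  # exactly one search admin database
    crawl_dbs: List[str]
    crawl_components: List[str]    # stateless; their state lives in crawl_dbs
    property_dbs: List[str]
    num_partitions: int = 1
    query_components: List[QueryComponentSpec] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means no rule is broken."""
        problems: List[str] = []
        if not self.admin_db:
            problems.append("missing search admin database")
        if self.crawl_components and not self.crawl_dbs:
            problems.append("crawl components need a crawl database")
        for qc in self.query_components:
            if not 0 <= qc.partition < self.num_partitions:
                problems.append(
                    f"{qc.server}: partition {qc.partition} is outside "
                    f"0..{self.num_partitions - 1}")
            if qc.property_db not in self.property_dbs:
                problems.append(f"{qc.server}: unknown property database {qc.property_db}")
        for p in range(self.num_partitions):
            members = [qc for qc in self.query_components if qc.partition == p]
            if not members:
                problems.append(f"partition {p} has no query component")
            elif all(qc.is_mirror for qc in members):
                problems.append(f"partition {p} has mirrors but no primary")
        return problems
```

Consider a topology with two partitions, one of them mirrored, where every query component points at a declared property database. Calling `validate()` on it returns an empty list. Pointing a query component at an undeclared property database produces an error message instead.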
Now that I have introduced you to the major components of SharePoint 2010 Search, I will dive into details about the components and how they can be used together to create search solutions.