How to configure your Sitecore content search indexes

Posted on June 25, 2015 in Sitecore

It’s common in a Sitecore implementation to leave the content search index configuration as it comes out of the box, or to change only the storage type of a specific field. However, you can benefit from tuning your indexes further:

  • Full index rebuilds take less time
  • Your Sitecore instance spends less effort on indexing tasks
  • Your index becomes smaller

What is to be found where?

To give you a bit more context on how indexes are defined, and how these definitions are linked to a specific index configuration, we will look at the following files in the Sitecore include folder (/App_Config/Include).

  • Sitecore.ContentSearch.Lucene.Index.[Indexname].config (index definitions)
  • Sitecore.ContentSearch.Lucene.DefaultConfigurations.config (index configuration)

Looking at Sitecore.ContentSearch.Lucene.Index.Master.config, you will see something similar to the following. Please note these nodes:

  • configuration: a reference to a location in the merged web.config that stores the index configuration (the what and how of indexing)
  • locations: specifies which parts of your content will be indexed; additional crawlers can be added if necessary

<index id="sitecore_master_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
  <param desc="name">$(id)</param>
  <param desc="folder">$(id)</param>
  <!-- This initializes index property store. Id has to be set to the index id -->
  <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
  <configuration ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration" />
  <strategies hint="list:AddStrategy">
    <!-- NOTE: order of these is controls the execution order -->
    <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/syncMaster" />
  </strategies>
  <commitPolicyExecutor type="Sitecore.ContentSearch.CommitPolicyExecutor, Sitecore.ContentSearch">
    <policies hint="list:AddCommitPolicy">
      <policy type="Sitecore.ContentSearch.TimeIntervalCommitPolicy, Sitecore.ContentSearch" />
    </policies>
  </commitPolicyExecutor>
  <locations hint="list:AddCrawler">
    <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
      <Database>master</Database>
      <Root>/sitecore</Root>
    </crawler>
  </locations>
</index>
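
To illustrate the remark about additional crawlers, a locations node with two crawlers might look roughly like the sketch below. The two root paths are assumptions chosen for illustration, not values from the default configuration; use whichever subtrees your index actually needs.

```xml
<locations hint="list:AddCrawler">
  <!-- First crawler: the content subtree (assumed root for this sketch) -->
  <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
    <Database>master</Database>
    <Root>/sitecore/content</Root>
  </crawler>
  <!-- Second crawler: another subtree of the same database (assumed root) -->
  <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
    <Database>master</Database>
    <Root>/sitecore/media library</Root>
  </crawler>
</locations>
```

Each crawler only picks up items below its own Root, so narrowing the roots is one of the simplest ways to keep unneeded items out of an index.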

Looking at Sitecore.ContentSearch.Lucene.DefaultConfigurations.config, you will see how items are indexed. By default, every out-of-the-box content search index points to this configuration.

Please note the following nodes, which we will focus on in this blog post:

  • indexAllFields: the master setting that decides whether every field is indexed by default. The default value is true.
  • fieldNames hint="raw:AddFieldByFieldName": lets you specify how fields are indexed based on field name.
  • fieldTypes hint="raw:AddFieldByFieldTypeName": lets you specify how fields are indexed based on field type.
  • exclude hint="list:ExcludeTemplate": lets you exclude a specific template from the index; use this when indexAllFields is set to true.
  • exclude hint="list:ExcludeField": lets you exclude a specific field from the index; use this when indexAllFields is set to true.
  • include hint="list:IncludeTemplate": lets you include a specific template in the index; use this when indexAllFields is set to false.
  • include hint="list:IncludeField": lets you include a specific field in the index; use this when indexAllFields is set to false.

<defaultLuceneIndexConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
  <indexAllFields>true</indexAllFields>
  <fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
    <fieldNames hint="raw:AddFieldByFieldName">
      <field fieldName="title" storageType="NO" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />
      <field fieldName="text" storageType="NO" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />
    </fieldNames>
    <fieldTypes hint="raw:AddFieldByFieldTypeName">
      <fieldType fieldTypeName="attachment" storageType="NO" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />
      <fieldType fieldTypeName="checkbox" storageType="NO" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />
      <fieldType fieldTypeName="checklist" storageType="NO" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />
    </fieldTypes>
  </fieldMap>
  <exclude hint="list:ExcludeTemplate">
    <BucketFolderTemplateId>{ADB6CA4F-03EF-4F47-B9AC-9CE2BA53FF97}</BucketFolderTemplateId>
  </exclude>
  <include hint="list:IncludeTemplate">
    <BucketFolderTemplateId>{ADB6CA4F-03EF-4F47-B9AC-9CE2BA53FF97}</BucketFolderTemplateId>
  </include>
  <include hint="list:IncludeField">
    <fieldId>{8CDC337E-A112-42FB-BBB4-4143751E123F}</fieldId>
  </include>
  <exclude hint="list:ExcludeField">
    <__display_name>{B5E02AD9-D56F-4C41-A065-A133DB87BDEB}</__display_name>
    <__Base_template>{12C33F3F-86C5-43A5-AEB4-5598CEC45116}</__Base_template>
  </exclude>
</defaultLuceneIndexConfiguration>

Tuning your indexes

Minimal Index configuration

The absolute minimal index configuration is the following. Please note that the indexAllFields setting is set to false.

<mySearchConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
  <indexAllFields>false</indexAllFields>
  <initializeOnAdd>true</initializeOnAdd>
  <analyzer ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/analyzer" />
  <fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
    <fieldNames hint="raw:AddFieldByFieldName">
      <field fieldName="_uniqueid" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
        <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
      </field>
    </fieldNames>
  </fieldMap>
  <fieldReaders ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/fieldReaders"/>
  <indexFieldStorageValueFormatter ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexFieldStorageValueFormatter"/>
  <indexDocumentPropertyMapper ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexDocumentPropertyMapper"/>
</mySearchConfiguration>

Sitecore adds the following fields by default, regardless of your configuration:

  • _content
  • _created
  • _creator
  • _database
  • _datasource
  • _displayname
  • _editor
  • _fullpath
  • _group
  • _indexname
  • _language
  • _latestversion
  • _name
  • _parent
  • _path
  • _template
  • _templatename
  • _uniqueid
  • _updated
  • _version

Web Index

The best way to configure your web index is to strip it down to the absolute minimum configuration.

As a first step, define what needs to be searchable from your site’s front end.

Let’s say I have a template called Page with Title, Summary, Text, Show in menu and some meta information fields. From the front end I can search for pages on my website, and in my result set I want to show Title and Summary.

Knowing this, I can now create my index configuration include file based on the minimal index configuration. I will call it Sitecore.ContentSearch.Lucene.WebIndexConfiguration.config, with the following content.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <WebSearchConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
          <indexAllFields>false</indexAllFields>
          <initializeOnAdd>true</initializeOnAdd>
          <analyzer ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/analyzer" />
          <fieldMap type="Sitecore.ContentSearch.FieldMap, Sitecore.ContentSearch">
            <fieldNames hint="raw:AddFieldByFieldName">
              <field fieldName="_uniqueid" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
                <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
              </field>
              <field fieldName="Title" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
                <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
              </field>
              <field fieldName="Summary" storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
                <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
              </field>
              <field fieldName="Text" storageType="NO" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider">
                <analyzer type="Sitecore.ContentSearch.LuceneProvider.Analyzers.LowerCaseKeywordAnalyzer, Sitecore.ContentSearch.LuceneProvider" />
              </field>
            </fieldNames>
          </fieldMap>
          <include hint="list:IncludeTemplate">
            <Page>{6528BC8A-C708-4D93-991D-DAF99B9B22E0}</Page>
          </include>
          <include hint="list:IncludeField">
            <fieldId>{2C78011B-A163-40D5-9F31-E7AFC5F442F8}</fieldId> <!-- Title -->
            <fieldId>{B81D0BD1-6181-41E9-BC34-844848D357E0}</fieldId> <!-- Summary -->
            <fieldId>{731F96F3-3200-43A2-8FA9-7CCA86C50FCD}</fieldId> <!-- Text -->
          </include>
          <fieldReaders ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/fieldReaders"/>
          <indexFieldStorageValueFormatter ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexFieldStorageValueFormatter"/>
          <indexDocumentPropertyMapper ref="contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/indexDocumentPropertyMapper"/>
        </WebSearchConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>

The only modifications I’ve made to the minimal index configuration are:

  • Included my Page template, since this template needs to be searchable. Removing the include template node results in every template being indexed.
  • Included my Title, Summary and Text fields, since these fields need to be searchable.
  • Specified how Title, Summary and Text are to be indexed.
    • For the Title and Summary fields, storageType is set to YES. This means the actual field value is also stored in the index, so we can retrieve it to display our result set without looking up the item in the Sitecore content tree.
    • For the Text field, storageType is set to NO, so the field is still searchable but Sitecore does not store the field value in the index. This reduces the size of the index.

Now we need to make our index definition aware of these settings. This can be done by modifying the index definition include file as follows (or do it the proper way and create an include file that overrides the necessary values 😉). Please note I have also restricted the crawler to include only items under /sitecore/content/home, which is my site’s root node.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes hint="list:AddIndex">
          <index id="sitecore_web_index" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
            <param desc="name">$(id)</param>
            <param desc="folder">$(id)</param>
            <!-- This initializes index property store. Id has to be set to the index id -->
            <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
            <configuration ref="contentSearch/indexConfigurations/WebSearchConfiguration" /> <!-- Points to my custom SearchConfiguration node -->
            <strategies hint="list:AddStrategy">
              <!-- NOTE: order of these is controls the execution order -->
              <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
            </strategies>
            <commitPolicyExecutor type="Sitecore.ContentSearch.CommitPolicyExecutor, Sitecore.ContentSearch">
              <policies hint="list:AddCommitPolicy">
                <policy type="Sitecore.ContentSearch.TimeIntervalCommitPolicy, Sitecore.ContentSearch" />
              </policies>
            </commitPolicyExecutor>
            <locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
                <Database>web</Database>
                <Root>/sitecore/content/home</Root>
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
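
For the "proper way" mentioned earlier, a separate include file that patches only the two changed values might look roughly like the sketch below. The file name zz.WebIndex.Override.config is hypothetical; it assumes that include files are merged in alphabetical order, so a name sorting after Sitecore.ContentSearch.Lucene.Index.Web.config is applied later. The sketch also assumes the stock sitecore_web_index definition is otherwise untouched.

```xml
<!-- Hypothetical file: /App_Config/Include/zz.WebIndex.Override.config -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes hint="list:AddIndex">
          <!-- Matches the existing index definition by its id -->
          <index id="sitecore_web_index">
            <!-- Point the index at the custom configuration node -->
            <configuration>
              <patch:attribute name="ref">contentSearch/indexConfigurations/WebSearchConfiguration</patch:attribute>
            </configuration>
            <locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
                <!-- Restrict the crawler to the site root -->
                <Root>/sitecore/content/home</Root>
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
```

After deploying a patch like this, it is worth checking the merged result (for example via /sitecore/admin/showconfig.aspx) to confirm that the ref attribute and the Root value were actually replaced rather than duplicated.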

As a result, the index on my local machine shrank from 96 MB to 63 MB (10,000 items). The difference is modest because my content tree didn’t carry much overhead; on larger content trees with a bigger variety of item types the effect will be larger. Real-world examples exist where an index shrank from roughly 1 GB to 100-200 MB.

Master Index

For the master index you are more restricted in stripping down your search index, since you don’t want to break the backend search functionality used by content authors.

The strategy here is to exclude any field and/or template that you don’t want to be searchable, using the exclude nodes in your index configuration. If you don’t care about backend search, you can go for the web index solution instead.

<exclude hint="list:ExcludeTemplate">
  <BucketFolderTemplateId>{ADB6CA4F-03EF-4F47-B9AC-9CE2BA53FF97}</BucketFolderTemplateId>
</exclude>
<exclude hint="list:ExcludeField">
  <__display_name>{B5E02AD9-D56F-4C41-A065-A133DB87BDEB}</__display_name>
  <__Base_template>{12C33F3F-86C5-43A5-AEB4-5598CEC45116}</__Base_template>
</exclude>
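
As a rough sketch, your own exclusions could be added through an include file instead of editing the default configuration in place. The element names MyImportLogTemplateId and MyInternalNotesFieldId and the all-zero GUIDs are placeholders, not real Sitecore IDs; replace them with the IDs of your own templates and fields.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <defaultLuceneIndexConfiguration>
          <exclude hint="list:ExcludeTemplate">
            <!-- Placeholder: ID of a template that authors never need to search -->
            <MyImportLogTemplateId>{00000000-0000-0000-0000-000000000000}</MyImportLogTemplateId>
          </exclude>
          <exclude hint="list:ExcludeField">
            <!-- Placeholder: ID of a field that should stay out of the index -->
            <MyInternalNotesFieldId>{00000000-0000-0000-0000-000000000000}</MyInternalNotesFieldId>
          </exclude>
        </defaultLuceneIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>
```

Keep in mind that every index pointing to defaultLuceneIndexConfiguration picks up these exclusions, not only the master index.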

More indexes

Sometimes you can also benefit from moving parts of your content tree into a separate index, for instance a product repository that is imported regularly from an external system. The way to do this is:

  1. Create an additional index definition; examples can be found in Sitecore.ContentSearch.Lucene.Index.[Indexname].config. Make sure to use an appropriate index update strategy (more info can be found at http://www.sitecore.net/learn/blogs/technical-blogs/john-west-sitecore-blog/posts/2013/04/sitecore-7-index-update-strategies.aspx or in the Sitecore documentation).
  2. Create an additional index configuration that fits your needs, like we did for the web index.
  3. Make sure the index definition with the most specific location in your content tree appears first in the merged web.config.
     E.g.: I have a sitecore_webproducts_index which indexes everything below /sitecore/content/repository/products, and a sitecore_web_index which indexes everything below /sitecore/content/ on the same database. In this case sitecore_webproducts_index needs to be defined before sitecore_web_index in the merged web.config to allow proper index switching by Sitecore.
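
The ordering requirement from step 3 could be met with the patch:before attribute in an include file, as in the sketch below. The configuration node name ProductSearchConfiguration is hypothetical and would have to be defined separately, in the same way as WebSearchConfiguration; the update strategy is simply copied from the web index example and may not suit an imported repository.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration type="Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
        <indexes hint="list:AddIndex">
          <!-- patch:before places this definition ahead of sitecore_web_index in the merged config -->
          <index id="sitecore_webproducts_index" patch:before="index[@id='sitecore_web_index']" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
            <param desc="name">$(id)</param>
            <param desc="folder">$(id)</param>
            <param desc="propertyStore" ref="contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
            <!-- Hypothetical configuration node, to be defined like WebSearchConfiguration -->
            <configuration ref="contentSearch/indexConfigurations/ProductSearchConfiguration" />
            <strategies hint="list:AddStrategy">
              <strategy ref="contentSearch/indexConfigurations/indexUpdateStrategies/onPublishEndAsync" />
            </strategies>
            <commitPolicyExecutor type="Sitecore.ContentSearch.CommitPolicyExecutor, Sitecore.ContentSearch">
              <policies hint="list:AddCommitPolicy">
                <policy type="Sitecore.ContentSearch.TimeIntervalCommitPolicy, Sitecore.ContentSearch" />
              </policies>
            </commitPolicyExecutor>
            <locations hint="list:AddCrawler">
              <crawler type="Sitecore.ContentSearch.SitecoreItemCrawler, Sitecore.ContentSearch">
                <Database>web</Database>
                <!-- The more specific location from the example above -->
                <Root>/sitecore/content/repository/products</Root>
              </crawler>
            </locations>
          </index>
        </indexes>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>
```

Using patch:before makes the position explicit, so the result does not depend on how the include file names happen to sort.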

References

http://www.mikkelhm.dk/blog/defining-a-custom-index-in-sitecore-7-the-absolute-minimum

http://www.sitecore.net/learn/blogs/technical-blogs/john-west-sitecore-blog/posts/2013/04/sitecore-7-index-update-strategies.aspx

There Are 5 Brilliant Comments

  1. Mark Stiles says:

    Nice work. Trying to figure out how to minimize the index config file has to be one of the first things you hit when you see the other examples. They’re absolutely mammoth. It’s really helpful to have a reference to go back to.

  2. Jim Schram says:

    #3 under More Indexes has been a life saver. We’ve been fighting some inbound filtering problems across multiple indexes. Although we are using Coveo and not Lucene, the re-ordering of the indexes in the config file from most specific crawler root location to most general has solved a major problem we were experiencing. Thanks!

  3. Afshin says:

    Sorry I didn’t get.
    you first created Sitecore.ContentSearch.Lucene.WebIndexConfiguration.config and included field names that you want to be indexed, and finally created new file to restrict it to home node.

    What is the name of second config file ?!
