Monday 9 September 2019

Unknown Shape Definition

When running Sitecore 9.1.1 / Solr 7.1.0, we noticed that the number of items in an index changed every time we re-indexed the site.

Looking in the logs I saw Solr errors relating to invalid coordinate data:

Exception: SolrNet.Exceptions.SolrConnectionException
Message: <?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">400</int>
  <int name="QTime">545</int>
</lst>
<lst name="error">
  <lst name="metadata">
    <str name="error-class">org.apache.solr.common.SolrException</str>
    <str name="root-error-class">java.text.ParseException</str>
  </lst>
  <str name="msg">ERROR: [doc=sitecore://master/{a885844b-34cd-4eb1-a9d0-2ab1bcc8587a}?lang=en&amp;ver=1&amp;ndx=sitecore_master_index] Error adding field 'coordinate_rpt'='-121.444851,37.712654' msg=Unable to parse shape given formats "lat,lon", "x y" or as WKT because java.text.ParseException: Unknown Shape definition [-121.444851,37.712654]</str>
  <int name="code">400</int>
</lst>
</response>

Source: SolrNet
   at SolrNet.Impl.SolrConnection.PostStream(String relativeUrl, String contentType, Stream content, IEnumerable`1 parameters)
   at SolrNet.Impl.SolrConnection.Post(String relativeUrl, String s)
   at SolrNet.Impl.LowLevelSolrServer.SendAndParseHeader(ISolrCommand cmd)
   at Sitecore.ContentSearch.SolrProvider.SolrBatchUpdateContext.AddRange(IEnumerable`1 group, Int32 groupSize)
   at Sitecore.ContentSearch.SolrProvider.SolrBatchUpdateContext.AddDocument(Object itemToAdd, IExecutionContext[] executionContexts)
   at Sitecore.ContentSearch.SolrProvider.SolrIndexOperations.ApplyPermissionsThenIndex(IProviderUpdateContext context, IIndexable version)
   at Sitecore.ContentSearch.SitecoreItemCrawler.DoAdd(IProviderUpdateContext context, SitecoreIndexableItem indexable)
   at Sitecore.ContentSearch.HierarchicalDataCrawler`1.CrawlItem(T indexable, IProviderUpdateContext context, CrawlState`1 state)

Nested Exception

Exception: System.Net.WebException
Message: The remote server returned an error: (400) Bad Request.
Source: System
   at System.Net.HttpWebRequest.GetResponse()
   at HttpWebAdapters.Adapters.HttpWebRequestAdapter.GetResponse()
   at SolrNet.Impl.SolrConnection.GetResponse(IHttpWebRequest request)
   at SolrNet.Impl.SolrConnection.PostStream(String relativeUrl, String contentType, Stream content, IEnumerable`1 parameters)



It looks like when Solr rejected a document with this error, the other items in the same batch were dropped as well, which would explain the inconsistent item counts.

After some investigation, I realized that Sitecore uses a computed field to index the coordinate data, so overriding it with additional validation should be fairly straightforward.

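// Namespaces below are the usual Sitecore 9.x locations; adjust them if your references differ.
using System.Globalization;
using log4net;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.ContentSearch.Data;
using Sitecore.ContentSearch.Diagnostics;
using Sitecore.Data;
using Sitecore.Data.Items;
using Sitecore.Diagnostics;

public class CoordinateValidationComputedIndex : AbstractComputedIndexField
{
    // Write to the crawling log so warnings appear next to the indexing progress messages.
    private static readonly ILog Logger = LogManager.GetLogger("Sitecore.Diagnostics.Crawling") ?? LoggerFactory.GetLogger(typeof(CrawlingLog));

    public override object ComputeFieldValue(IIndexable indexable)
    {
        // Constants.Latitude and Constants.Longitude are the project's own field ID constants.
        Item obj = indexable as SitecoreIndexableItem;
        if (obj == null || !obj.Fields.Contains(new ID(Constants.Latitude)) ||
            !obj.Fields.Contains(new ID(Constants.Longitude)))
        {
            return null;
        }

        // Skip items whose latitude or longitude is not a parsable number.
        if (!double.TryParse(obj[new ID(Constants.Latitude)], NumberStyles.Any, CultureInfo.InvariantCulture,
                out var lat) || !double.TryParse(obj[new ID(Constants.Longitude)], NumberStyles.Any,
                CultureInfo.InvariantCulture, out var lon))
        {
            return null;
        }

        // Latitude must be in the range -90 to +90.
        // Longitude must be in the range -180 to +180.
        if (lat < -90 || lat > 90 || lon < -180 || lon > 180)
        {
            // Log the offending item and index no coordinate, so Solr never sees the bad value.
            Logger.Warn(
                $"Coordinate validation failed for {obj.ID.Guid:B}:{obj.Paths.FullPath}\r\nWith value of latitude: {lat}, longitude: {lon}\r\n   Latitude should be in range -90 to +90\r\n   Longitude should be in range -180 to +180");
            return null;
        }

        return new Coordinate(lat, lon).ToString();
    }
}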
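
As a minimal sketch, the same range check can be tried outside Sitecore. The names CoordinateCheck and IsValidLatLon are made up for illustration and are not part of Sitecore or of the class above. It assumes, as the "lat,lon" format in the Solr message suggests, that the first number in the rejected value is read as the latitude.

```csharp
using System.Globalization;

// Hypothetical helper for illustration only; not part of Sitecore.
public static class CoordinateCheck
{
    // Returns true when both strings parse as numbers and fall within
    // the valid ranges: latitude -90 to +90, longitude -180 to +180.
    public static bool IsValidLatLon(string latitude, string longitude)
    {
        if (!double.TryParse(latitude, NumberStyles.Any, CultureInfo.InvariantCulture, out var lat) ||
            !double.TryParse(longitude, NumberStyles.Any, CultureInfo.InvariantCulture, out var lon))
        {
            return false;
        }

        return lat >= -90 && lat <= 90 && lon >= -180 && lon <= 180;
    }
}
```

With the value from the Solr error, IsValidLatLon("-121.444851", "37.712654") returns false because -121.444851 is outside the latitude range, while the swapped pair IsValidLatLon("37.712654", "-121.444851") returns true.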

I then created the following patch file to override the out-of-the-box (OOTB) configuration.

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:role="http://www.sitecore.net/xmlconfig/role/" xmlns:env="http://www.sitecore.net/xmlconfig/env/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <defaultSolrIndexConfiguration type="Sitecore.ContentSearch.SolrProvider.SolrIndexConfiguration, Sitecore.ContentSearch.SolrProvider">
          <documentOptions type="Sitecore.ContentSearch.SolrProvider.SolrDocumentBuilderOptions, Sitecore.ContentSearch.SolrProvider">
            <fields hint="raw:AddComputedIndexField">
              <field patch:instead="*[@fieldName='coordinate']" fieldName="coordinate" returnType="coordinate" >zzz.Feature.Geolocation.ComputedIndex.CoordinateValidationComputedIndex, zzz.Feature</field>
            </fields>
          </documentOptions>
        </defaultSolrIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>

The logs now show the validation warning instead of Solr errors, and the correct number of items is indexed.

4176 16:17:47 INFO  [Index=sitecore_master_index] Crawler: Processed 5000 items
7988 16:17:47 WARN  Coordinate validation failed for {570326e8-7ae6-4f7a-9354-62670d99c199} : ***Item Path Removed*** with value of latitude:-95.67611, longitude:-95.67611 - 
   Latitude should be in range -90 to +90
   Longitude should be in range -180 to +180
6564 16:17:51 INFO  [Index=sitecore_master_index] Crawler: Processed 6000 items