Looking in the logs I saw Solr errors relating to invalid coordinate data:
Exception: SolrNet.Exceptions.SolrConnectionException
Message: <?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">400</int>
<int name="QTime">545</int>
</lst>
<lst name="error">
<lst name="metadata">
<str name="error-class">org.apache.solr.common.SolrException</str>
<str name="root-error-class">java.text.ParseException</str>
</lst>
<str name="msg">ERROR: [doc=sitecore://master/{a885844b-34cd-4eb1-a9d0-2ab1bcc8587a}?lang=en&ver=1&ndx=sitecore_master_index] Error adding field 'coordinate_rpt'='-121.444851,37.712654' msg=Unable to parse shape given formats "lat,lon", "x y" or as WKT because java.text.ParseException: Unknown Shape definition [-121.444851,37.712654]</str>
<int name="code">400</int>
</lst>
</response>
Source: SolrNet
at SolrNet.Impl.SolrConnection.PostStream(String relativeUrl, String contentType, Stream content, IEnumerable`1 parameters)
at SolrNet.Impl.SolrConnection.Post(String relativeUrl, String s)
at SolrNet.Impl.LowLevelSolrServer.SendAndParseHeader(ISolrCommand cmd)
at Sitecore.ContentSearch.SolrProvider.SolrBatchUpdateContext.AddRange(IEnumerable`1 group, Int32 groupSize)
at Sitecore.ContentSearch.SolrProvider.SolrBatchUpdateContext.AddDocument(Object itemToAdd, IExecutionContext[] executionContexts)
at Sitecore.ContentSearch.SolrProvider.SolrIndexOperations.ApplyPermissionsThenIndex(IProviderUpdateContext context, IIndexable version)
at Sitecore.ContentSearch.SitecoreItemCrawler.DoAdd(IProviderUpdateContext context, SitecoreIndexableItem indexable)
at Sitecore.ContentSearch.HierarchicalDataCrawler`1.CrawlItem(T indexable, IProviderUpdateContext context, CrawlState`1 state)
Nested Exception
Exception: System.Net.WebException
Message: The remote server returned an error: (400) Bad Request.
Source: System
at System.Net.HttpWebRequest.GetResponse()
at HttpWebAdapters.Adapters.HttpWebRequestAdapter.GetResponse()
at SolrNet.Impl.SolrConnection.GetResponse(IHttpWebRequest request)
at SolrNet.Impl.SolrConnection.PostStream(String relativeUrl, String contentType, Stream content, IEnumerable`1 parameters)
It looks like when Solr threw the error, any other items in the same batch were ignored.
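The rejected value in the message above is worth reading closely. Solr names "lat,lon" as the expected format, so -121.444851 is being read as a latitude, which falls outside -90 to +90. A minimal sketch of that check follows, which may be handy when triaging values copied out of the log. The class and method names are hypothetical and not part of Sitecore or SolrNet, and the "valid if swapped" hint is only a guess at how the data was entered.

```csharp
using System;
using System.Globalization;

public static class LatLonTriage
{
    // Describes how a "lat,lon" string (the format named in the Solr error) would be judged.
    public static string Describe(string value)
    {
        string[] parts = (value ?? string.Empty).Split(',');
        if (parts.Length != 2
            || !double.TryParse(parts[0], NumberStyles.Float, CultureInfo.InvariantCulture, out double lat)
            || !double.TryParse(parts[1], NumberStyles.Float, CultureInfo.InvariantCulture, out double lon))
        {
            return "unparseable";
        }

        if (InRange(lat, lon))
        {
            return "valid";
        }

        // If swapping the two numbers makes the pair valid, the latitude and
        // longitude fields were possibly filled in the wrong way round.
        return InRange(lon, lat) ? "out of range (valid if swapped)" : "out of range";
    }

    private static bool InRange(double lat, double lon)
    {
        return lat >= -90 && lat <= 90 && lon >= -180 && lon <= 180;
    }
}
```

Calling LatLonTriage.Describe("-121.444851,37.712654") with the value from the error reports it as out of range but valid if swapped, which suggests checking the content item's fields before blaming the index.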
After some investigation, I realized that Sitecore uses a computed field to index the coordinate data, so overriding it with additional validation should be fairly straightforward.
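// Namespaces these types normally live in for Sitecore 9.x; Constants is this solution's own class.
using System.Globalization;
using log4net;
using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.ContentSearch.Data;
using Sitecore.ContentSearch.Diagnostics;
using Sitecore.Data;
using Sitecore.Data.Items;
using Sitecore.Diagnostics;

public class CoordinateValidationComputedIndex : AbstractComputedIndexField
{
    // Write to the crawling log so warnings appear next to the indexing output.
    private static readonly ILog Logger = LogManager.GetLogger("Sitecore.Diagnostics.Crawling") ?? LoggerFactory.GetLogger(typeof(CrawlingLog));

    public override object ComputeFieldValue(IIndexable indexable)
    {
        Item obj = indexable as SitecoreIndexableItem;

        // Skip items that do not have both coordinate fields.
        if (obj == null || !obj.Fields.Contains(new ID(Constants.Latitude)) ||
            !obj.Fields.Contains(new ID(Constants.Longitude)))
        {
            return null;
        }

        // Skip items whose field values are not numeric.
        if (!double.TryParse(obj[new ID(Constants.Latitude)], NumberStyles.Any, CultureInfo.InvariantCulture,
                out var lat) || !double.TryParse(obj[new ID(Constants.Longitude)], NumberStyles.Any,
                CultureInfo.InvariantCulture, out var lon))
        {
            return null;
        }

        // Latitude must be in the range -90 to +90.
        // Longitude must be in the range -180 to +180.
        if (lat < -90 || lat > 90 || lon < -180 || lon > 180)
        {
            Logger.Warn(
                $"Coordinate validation failed for {obj.ID.Guid:B}:{obj.Paths.FullPath}\r\nWith value of latitude: {lat}, longitude: {lon}\r\n Latitude should be in range -90 to +90\r\n Longitude should be in range -180 to +180");
            return null;
        }

        return new Coordinate(lat, lon).ToString();
    }
}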
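The range check itself does not depend on Sitecore, so it can be pulled into a small pure function and unit tested without an index. The sketch below is one way to do that. CoordinateText and TryFormat are hypothetical names, and the "lat,lon" output is an assumption based on the format named in the Solr error, not on what Coordinate.ToString() emits. It also rejects NaN, which double.TryParse accepts and which slips past plain < and > comparisons.

```csharp
using System;
using System.Globalization;

public static class CoordinateText
{
    // Parses and validates a latitude/longitude pair.
    // Returns false (and a null result) when either value is not numeric or is out of range.
    public static bool TryFormat(string latText, string lonText, out string formatted)
    {
        formatted = null;

        if (!double.TryParse(latText, NumberStyles.Float, CultureInfo.InvariantCulture, out double lat)
            || !double.TryParse(lonText, NumberStyles.Float, CultureInfo.InvariantCulture, out double lon))
        {
            return false;
        }

        // Written as a positive range test so that NaN, for which every
        // comparison is false, is rejected as well.
        bool inRange = lat >= -90 && lat <= 90 && lon >= -180 && lon <= 180;
        if (!inRange)
        {
            return false;
        }

        // Assumed output format: "lat,lon" with invariant-culture decimal points.
        formatted = string.Format(CultureInfo.InvariantCulture, "{0},{1}", lat, lon);
        return true;
    }
}
```

A computed field could then call CoordinateText.TryFormat with the two raw field values and return null whenever it reports false, keeping the Sitecore-specific code down to reading the fields and logging.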
I then created the following patch file to override the out-of-the-box (OOTB) configuration.
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/" xmlns:role="http://www.sitecore.net/xmlconfig/role/" xmlns:env="http://www.sitecore.net/xmlconfig/env/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <defaultSolrIndexConfiguration type="Sitecore.ContentSearch.SolrProvider.SolrIndexConfiguration, Sitecore.ContentSearch.SolrProvider">
          <documentOptions type="Sitecore.ContentSearch.SolrProvider.SolrDocumentBuilderOptions, Sitecore.ContentSearch.SolrProvider">
            <fields hint="raw:AddComputedIndexField">
              <field patch:instead="*[@fieldName='coordinate']" fieldName="coordinate" returnType="coordinate">zzz.Feature.Geolocation.ComputedIndex.CoordinateValidationComputedIndex, zzz.Feature</field>
            </fields>
          </documentOptions>
        </defaultSolrIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>
This is what I now get in the logs: no Solr errors, a warning for each item with invalid coordinates, and the correct number of items being indexed.
4176 16:17:47 INFO [Index=sitecore_master_index] Crawler: Processed 5000 items
7988 16:17:47 WARN Coordinate validation failed for {570326e8-7ae6-4f7a-9354-62670d99c199} : ***Item Path Removed*** with value of latitude:-95.67611, longitude:-95.67611 -
Latitude should be in range -90 to +90
Longitude should be in range -180 to +180
6564 16:17:51 INFO [Index=sitecore_master_index] Crawler: Processed 6000 items