We had many reasons to migrate BMJ Best Practice, one of our biggest Compass-based legacy projects, to Elasticsearch:
- Compass and Elasticsearch were created by the same person, Shay Banon, and he publicly recommends migrating from Compass to Elasticsearch [here].
- On server start-up, Compass occasionally threw errors related to the spellcheck indexes, which we then had to recreate manually.
- We kept a local copy of the indexes on each server, which meant multiple copies to maintain. With Elasticsearch, we have a separate cluster that is independent of the application code and effectively gives us search as a service out of the box.
- The Elasticsearch community is great, with lots of plugins and extensions for when you need something special, such as processing Chinese text, which we do.
- Other services can query our Elasticsearch cluster to search the same information in different ways, with different boosting or query types.
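To illustrate the last point, here is a minimal sketch of two hypothetical consumers querying the same index with different query types and boosting. It is written in Python with only the standard library for brevity, although our own code is Java. It assumes a cluster at a placeholder URL, an index called `topics_en`, and fields named `title` and `body`. None of these names come from our real setup. The request bodies use the standard Elasticsearch query DSL: `multi_match` with a `^3` field boost, and `match_phrase`.

```python
import json
import urllib.request


def clinician_query(text: str) -> dict:
    """Hypothetical consumer: matches in the title count three times as much."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["title^3", "body"],  # "^3" boosts the title field
                "type": "best_fields",
            }
        },
        "size": 10,
    }


def exact_phrase_query(text: str) -> dict:
    """Hypothetical consumer: only documents whose body contains the exact phrase."""
    return {"query": {"match_phrase": {"body": text}}, "size": 10}


def search(base_url: str, index: str, body: dict) -> dict:
    """POST a query body to the index's _search endpoint and return the parsed JSON."""
    req = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Against a running cluster, `search("http://localhost:9200", "topics_en", clinician_query("asthma"))` would return the usual search response, with matching documents under `hits.hits`. Each consumer owns its own query body, so neither one's tuning affects the other.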
The department had previous experience with Elasticsearch 1.x, but not with newer versions. We needed our DevOps team to prepare a Puppet script to provision boxes for the cluster, and another to provision a Java 8 box for our indexer service.
See the final solution below.
Some of the main challenges were:
- Compass has out-of-the-box support for XSEM (XML to Search Engine Mapping), and Elasticsearch does not. All our data is stored as XML in an xDB database. This meant rewriting the mappings we previously had in XML as Java classes, which are serialised into JSON and sent to Elasticsearch for processing.
- We wanted to remove all indexing responsibilities from the application, so we created a separate indexer service. Best Practice calls it, but because it exposes an API to trigger indexing, it can also be called on its own. Our big monolith is now a little smaller.
- At the same time, we removed all the Compass-related code in Best Practice, which handled both index management and search. Searching is now just a remote call to the Elasticsearch cluster, made with the official Elasticsearch Java client, and this let us remove a fair amount of search-related code. Again, our monolith shrinks.
- We wanted both solutions to coexist for a while. During that period, we ran A/B tests with domain experts to make sure our search results were accurate and safe for patients. To achieve this, we used feature toggles, which we removed once they were no longer needed.
- Finding the right type of query to get the desired results can be quite tricky. We spent a considerable amount of time tuning the boosting and experimenting with different query types and parameters.
- Multilingual scenarios are particularly hard. We support many languages and keep one index per language. This means separate custom analysers, separate mappings, and language-specific synonyms for Chinese, Spanish, Portuguese, English and more.
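The mapping and multilingual challenges above can be sketched together. The example uses a small class per field that serialises to JSON, plus a helper that builds the body for creating one per-language index with its own custom analyser and synonym list. This is an illustrative Python sketch, not our Java classes. The field names, index layout and synonym lists are hypothetical. The Chinese entry assumes the `analysis-smartcn` plugin, which provides `smartcn_tokenizer`, is installed on the cluster. The `mappings` shape assumes a recent, typeless Elasticsearch; older versions nest `properties` under a mapping type name.

```python
import json
from dataclasses import dataclass, replace
from typing import List, Optional

# Hypothetical per-language analysis settings: tokenizer, stemmer and synonyms.
LANGUAGES = {
    "en": {"tokenizer": "standard", "stemmer": "english",
           "synonyms": ["heart attack, myocardial infarction"]},
    "es": {"tokenizer": "standard", "stemmer": "light_spanish",
           "synonyms": ["infarto de miocardio, ataque al corazón"]},
    "pt": {"tokenizer": "standard", "stemmer": "light_portuguese",
           "synonyms": ["infarto do miocárdio, ataque cardíaco"]},
    # Needs the analysis-smartcn plugin; no stemmer for Chinese.
    "zh": {"tokenizer": "smartcn_tokenizer", "stemmer": None,
           "synonyms": ["心肌梗死, 心肌梗塞"]},
}


@dataclass(frozen=True)
class FieldMapping:
    """One indexed field, standing in for a hand-written mapping class."""
    name: str
    es_type: str                    # "text" is analysed, "keyword" is exact-match
    analyzer: Optional[str] = None  # only meaningful for "text" fields

    def to_dict(self) -> dict:
        body = {"type": self.es_type}
        if self.analyzer:
            body["analyzer"] = self.analyzer
        return body


def build_index_body(lang: str, fields: List[FieldMapping]) -> dict:
    """Build the JSON body used to create one per-language index."""
    cfg = LANGUAGES[lang]
    analyzer = f"{lang}_text"
    filters = {f"{lang}_synonyms": {"type": "synonym", "synonyms": cfg["synonyms"]}}
    chain = ["lowercase", f"{lang}_synonyms"]
    if cfg["stemmer"]:
        filters[f"{lang}_stemmer"] = {"type": "stemmer", "language": cfg["stemmer"]}
        chain.append(f"{lang}_stemmer")
    properties = {}
    for f in fields:
        # Text fields without an explicit analyzer get this language's analyzer.
        if f.es_type == "text" and f.analyzer is None:
            f = replace(f, analyzer=analyzer)
        properties[f.name] = f.to_dict()
    return {
        "settings": {"analysis": {
            "filter": filters,
            "analyzer": {analyzer: {"type": "custom",
                                    "tokenizer": cfg["tokenizer"],
                                    "filter": chain}},
        }},
        "mappings": {"properties": properties},
    }


# Hypothetical fields for a topic document.
TOPIC_FIELDS = [
    FieldMapping("id", "keyword"),
    FieldMapping("title", "text"),
    FieldMapping("body", "text"),
]
```

`json.dumps(build_index_body("zh", TOPIC_FIELDS), ensure_ascii=False)` would give the body for a `PUT /<index>` request. Keeping the language differences in one table means that adding a language adds an entry, not a new mapping class.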
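The indexer service described above ultimately has to push serialised documents into the cluster. One common way to do this is the `_bulk` endpoint, which accepts newline-delimited JSON. Each document is preceded by an action line naming the target index and document id. The sketch below only builds that payload. The `id` field and the index name are hypothetical, and it is not a description of exactly how our indexer batches its requests.

```python
import json
from typing import Iterable, Mapping


def bulk_body(index: str, docs: Iterable[Mapping]) -> str:
    """Build an NDJSON payload for Elasticsearch's _bulk endpoint.

    Each document becomes two lines: an "index" action with the target
    index and id, then the document source. The payload must end with "\\n".
    """
    lines = []
    for doc in docs:
        # json.dumps without indent keeps each object on a single line.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc, ensure_ascii=False))
    return "\n".join(lines) + "\n"
```

The resulting string would be POSTed to `/_bulk` with the `Content-Type: application/x-ndjson` header. Batching documents this way avoids one HTTP round trip per document during a full reindex.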
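The feature toggles that let Compass and Elasticsearch coexist can be pictured as a small router in front of both engines. This is a hypothetical sketch. The toggle name `elasticsearch_search`, the in-memory toggle map and the reviewer group stand in for whatever toggle mechanism an application actually uses. The two search functions are placeholders for the Compass and Elasticsearch backends.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

SearchFn = Callable[[str], List[str]]


@dataclass
class SearchRouter:
    legacy: SearchFn          # placeholder for the Compass-backed search
    elastic: SearchFn         # placeholder for a call to the Elasticsearch cluster
    toggles: Dict[str, bool]  # hypothetical toggle store: flag name -> on/off
    reviewers: Set[str]       # hypothetical group of domain experts

    def search(self, user: str, text: str) -> List[str]:
        # Reviewers see the new engine even while the toggle is off for everyone else.
        if self.toggles.get("elasticsearch_search", False) or user in self.reviewers:
            return self.elastic(text)
        return self.legacy(text)

    def compare(self, text: str) -> Dict[str, List[str]]:
        """Return both result lists side by side for expert review."""
        return {"compass": self.legacy(text), "elasticsearch": self.elastic(text)}
```

Retiring the toggle then means deleting the router and the legacy branch, leaving only the Elasticsearch call.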
The main benefits were:
- Search results are easier to tune.
- Application code is simpler and easier to maintain.
- Separation of responsibilities, leading to a cleaner design.
- Service-oriented architecture, where each service can scale independently.
- Potential new uses, as other services can now search the same content in different ways.
- New skills acquired within the department.
Check out our new search here, and thanks for reading.