MongoDB and ElasticSearch
MongoDB and ElasticSearch were designed to meet very different needs. Originally, MongoDB was intended to address data storage issues in a big data context, while ElasticSearch (ES) provided indexing and search over existing databases. Over time, the two products evolved toward certain similarities, to the point where they began appearing in comparable use cases. However, their design differences continue to affect many of their characteristics.
MongoDB stores information in the form of JSON objects called documents. No schema is imposed; each document may be structured differently from the previous one. This gives developers great flexibility, both during the application's design phase and in later evolutions.
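A minimal sketch of this flexibility, using plain Python dicts in place of a MongoDB collection (the collection and field names are hypothetical; with pymongo the insert would be `db.products.insert_many(docs)`):

```python
# Two documents with different shapes can live in the same collection.
# A plain list stands in for the collection here, so the sketch runs
# without a MongoDB server.
docs = [
    {"name": "laptop", "price": 999, "specs": {"ram_gb": 16, "cpu": "i7"}},
    {"name": "coffee", "price": 7, "origin": "Colombia"},  # no 'specs' field
]

# MongoDB-style query on a field only some documents have,
# roughly: db.products.find({"specs.ram_gb": {"$gte": 8}})
matches = [d for d in docs if d.get("specs", {}).get("ram_gb", 0) >= 8]
print([d["name"] for d in matches])  # -> ['laptop']
```

Documents lacking the queried field are simply not matched; nothing forces every document into the same shape.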
There is no such flexibility in ElasticSearch. Although data are also stored in the form of JSON objects, ES expects a schema (a mapping) that describes them. If no schema is supplied, one is created automatically (with the inherent risk of error) from the first document received. Subsequent documents must conform to that schema or be rejected. Adding new fields to the JSON object remains possible afterwards, but any other change is much more complicated. In fact, the schema drives how data are indexed, and indexing happens only when documents are inserted into the database. Changing the schema therefore requires regenerating the index in full, which amounts to a massive export/import of all the affected data.
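The principle can be illustrated with a toy validator; this is only a simulation of the behaviour, as the real ES mapping format and error responses differ:

```python
# Toy simulation of ElasticSearch mapping enforcement: a document whose
# field conflicts with the declared type is rejected at insertion time.
PY_TYPES = {"text": str, "integer": int, "float": float}

def check_against_mapping(doc, mapping):
    """Raise if a field's value conflicts with its declared type."""
    for field, value in doc.items():
        declared = mapping.get(field)
        if declared is not None and not isinstance(value, PY_TYPES[declared]):
            raise TypeError(f"field {field!r} is not of type {declared!r}")
    # Fields absent from the mapping would extend it (dynamic mapping):
    # additions are easy, changing an existing field's type is not.

mapping = {"title": "text", "year": "integer"}
check_against_mapping({"title": "Dune", "year": 1965}, mapping)  # accepted
try:
    check_against_mapping({"title": "Dune", "year": "1965"}, mapping)
except TypeError as err:
    print("rejected:", err)
```

The asymmetry shown in the comment is the key point: new fields extend the mapping, but an existing field's type is frozen until the index is rebuilt.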
The speeds of database insertion are comparable, as are speeds for querying modest quantities of data. However, MongoDB response times rise with the volume of data searched, while ElasticSearch times remain stable. This is the flip side of MongoDB's flexibility: because of the schema ElasticSearch imposes on inserted data, indexing is much more effective, an advantage that appears as soon as volumes become significant. This property explains its use in BI-like contexts, as found in the increasingly popular ElasticSearch-Logstash-Kibana stack, or in ElasticSearch-Curiosity developed by the Yellow Pages.
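The stable response times come from the inverted index at the heart of Lucene/ElasticSearch. The real data structures are far more sophisticated, but the core idea fits in a few lines: terms are mapped to document ids at insertion time, so a search becomes a dictionary lookup rather than a scan of every document.

```python
from collections import defaultdict

docs = {
    1: "mongodb stores json documents",
    2: "elasticsearch indexes json at insertion",
    3: "kibana visualizes elasticsearch data",
}

# Indexing happens once, when documents are inserted.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

# Searching is then independent of how many documents were scanned.
print(sorted(index["elasticsearch"]))  # -> [2, 3]
print(sorted(index["json"]))           # -> [1, 2]
```

The work is paid up front at insertion, which is exactly why the schema matters: ES must know how each field should be analysed before it can index it.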
Interaction with ElasticSearch is carried out through a REST API; security is provided by a paid plug-in. MongoDB has built-in access control based on users and roles. Its wire protocol is proprietary, but drivers are available for all the usual languages.
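A sketch of what the two styles of interaction look like. The request is built but not sent (the index and field names are hypothetical); any HTTP client, such as `urllib.request`, could then POST it:

```python
import json

# ElasticSearch: a plain REST call with a JSON query body (query DSL).
url = "http://localhost:9200/articles/_search"
query = {"query": {"match": {"title": "mongodb"}}, "size": 10}
body = json.dumps(query)
print("POST", url)
print(body)

# MongoDB: the same search goes through a language driver instead,
# e.g. with pymongo: db.articles.find({"title": "mongodb"}).limit(10)
```

The REST interface means any tool that can speak HTTP can query ES directly, whereas MongoDB clients depend on a driver for their language.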
Each of the two systems was designed for horizontal scalability, so that an arbitrarily large quantity of data can be stored. In both cases, scalability rests on a sharding mechanism (data storage distributed across several nodes) and a replication mechanism (full copies of the data on several nodes). Building a MongoDB cluster requires three different types of server: config servers holding the cluster configuration, routers (mongos) handling request routing, and the shard servers holding the data. ElasticSearch hides this complexity: each data server (ES node) can take on all of these functions.
In the case of MongoDB, the DBA must choose a shard key, used to split the data into shards. This key, which corresponds to a field shared by all documents, determines the shard on which each new entry is stored. The choice has a direct impact on the cluster's efficiency and balance. ElasticSearch automates these choices: creating an ES cluster boils down to indicating the desired number of shards and replicas for a given index. ES then spreads them across the available nodes, each new node being detected and added to the cluster automatically. Scaling out is therefore particularly simple to set up and maintain.
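Why the shard key matters can be seen with a toy hash-based placement function (a simplification of what both systems actually do): a well-distributed key spreads documents evenly, while a skewed key, such as a near-constant field, would pile everything onto one shard.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 3

def shard_for(key_value):
    """Pick a shard from the hash of the shard-key value."""
    digest = hashlib.md5(str(key_value).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# A high-cardinality key (here a hypothetical user id) balances the cluster.
placement = Counter(shard_for(f"user-{i}") for i in range(3000))
print(dict(placement))  # roughly 1000 documents per shard
```

With a constant key value, `shard_for` would return the same shard for every document, and the cluster would degenerate into a single hot node.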
Lucene, on which ElasticSearch is based, does not compute a checksum when writing data to disk. The data may therefore be corrupted by the time it reaches the storage medium. Moreover, many operations (changing the index, the number of shards, etc.) require reimporting the data, which can be an unwieldy task for big data applications. In the author's view, ElasticSearch is not yet ready for use as primary data storage. It does, however, have clear advantages for searching information within a massive database. You would be better off keeping MongoDB in the datastore role, and using it to populate ElasticSearch with the data you want to mine.
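The recommended division of labour can be sketched as a simple projection step. Plain dicts stand in for both systems and all field names are hypothetical; in practice the copy would run through the respective drivers:

```python
# MongoDB remains the system of record; only the fields worth searching
# are pushed into ElasticSearch.
mongo_collection = [
    {"_id": 1, "title": "NoSQL compared", "body": "long text...", "audit": "x"},
    {"_id": 2, "title": "Scaling out",    "body": "long text...", "audit": "y"},
]

def to_search_doc(doc):
    """Project only the searchable fields; the full record stays in MongoDB."""
    return {"id": doc["_id"], "title": doc["title"], "body": doc["body"]}

es_index = [to_search_doc(d) for d in mongo_collection]
print([d["title"] for d in es_index])  # -> ['NoSQL compared', 'Scaling out']
```

If the ES index is lost or its mapping must change, it can be rebuilt from MongoDB at any time, which neutralises the durability concerns raised above.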