What is Elasticsearch and why is it involved in so many data leaks?

Elasticsearch is never far from the news headlines, and usually for the wrong reasons. Seemingly every week brings a new story about an Elasticsearch server that has been left exposed, often resulting in troves of data being leaked. But why do so many leaks originate from Elasticsearch servers, and how can businesses that use this technology get the most from it while still preventing a data leak?

To answer these questions, one must first understand what Elasticsearch is. Elasticsearch is an open source search and analytics engine, as well as a data store, developed by Elastic.

Whether an organization has a thousand or a billion discrete pieces of information, Elasticsearch gives it the ability to search through huge amounts of data and run calculations in the blink of an eye. Elasticsearch is available as a managed cloud service, but businesses can also run it on their own infrastructure or alongside another cloud offering.

About the author

Anna Russell is EMEA VP at comforte AG

Organizations then use the platform to store their information in repositories (called indices in Elasticsearch, and often loosely referred to as buckets), which can hold emails, spreadsheets, social media posts, files: basically any raw data in the form of text, numbers, or geospatial data. As convenient as this sounds, it can be disastrous when large amounts of data are left unprotected and exposed online. Unfortunately for Elastic, this has resulted in many high-profile breaches involving well-known brands from a variety of industries.

During 2020 alone, cosmetics giant Avon had 19 million records leaked from an Elasticsearch database. Another misconfigured server, involving the online genealogy service Family Tree Maker, left over 25GB of sensitive data exposed. The same happened with sports giant Decathlon, which saw 123 million records leaked. Then, more than five billion records were exposed after another Elasticsearch database was left unprotected. Surprisingly, it contained a massive collection of previously breached user information from 2012 to 2019.

From what has been disclosed so far, it is clear that those who choose to use cloud-based databases must also perform the necessary due diligence to configure and secure every corner of the system. It is equally clear that this necessity is often overlooked or just plain ignored. One security researcher even set out to discover how long it would take hackers to locate, attack, and exploit an unprotected Elasticsearch server that had purposely been left exposed online. Eight hours was all it took.

Digital transformation has definitely changed the mindset of the modern business, with cloud seen as a novel technology that must be adopted. While cloud technologies certainly have their benefits, improper use of them has very negative consequences. Failing or refusing to understand the security ramifications of this technology can have a dangerous impact on business. 

As such, it is important to realize that, in the case of Elasticsearch, just because a product is freely available and highly scalable doesn’t mean you can skip the basic security recommendations and configurations. Furthermore, given that data is widely hailed as the new gold coinage, demand for monetizing up-to-date data has never been greater. Evidently, for some organizations, data privacy and security have played second fiddle to profit as they do their utmost to capitalize on the data gold rush.

Is there only one way for a server to be breached? Not really. In truth, the contents of a server can be leaked in a variety of ways: a password being stolen, hackers infiltrating systems, or even an insider acting from within the protected environment itself. The most common, however, occurs when a database is left online without any security (even lacking a password), leaving it open for anyone to access the data. When that happens, it points to a poor understanding of the Elasticsearch security features and of what is expected from organizations when protecting sensitive customer data.

This may stem from the common misconception that responsibility for security automatically transfers to the cloud service provider. This is a false assumption and often results in misconfigured or under-protected servers. Cloud security is a shared responsibility between the organization’s security team and the cloud service provider; however, as a minimum, the organization itself owns the responsibility to perform the necessary due diligence to configure and secure every corner of the system properly to mitigate any potential risks.
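To make the misconfiguration problem concrete, here is a minimal sketch of the kind of check a security team might run against a node's settings before it goes online. It assumes the settings have already been read from elasticsearch.yml into a flat dictionary. The function name audit_settings is hypothetical, and the three setting names (xpack.security.enabled, xpack.security.http.ssl.enabled, network.host) are the ones we understand Elasticsearch to use. Defaults differ between versions, so the sketch deliberately flags anything that is not explicitly set and should be checked against the documentation for the version in use.

```python
from typing import Dict, List


def audit_settings(settings: Dict[str, str]) -> List[str]:
    """Return human-readable warnings for risky node settings.

    Illustrative only: `settings` is assumed to be a flat mapping of
    elasticsearch.yml keys to string values. A setting that is absent is
    treated as "not explicitly enabled", because defaults vary by version.
    """
    findings: List[str] = []

    def is_true(key: str) -> bool:
        return str(settings.get(key, "")).strip().lower() == "true"

    # Without authentication, anyone who can reach the port can read the data.
    if not is_true("xpack.security.enabled"):
        findings.append("xpack.security.enabled is not explicitly true")

    # Without TLS on the HTTP layer, credentials and data travel in clear text.
    if not is_true("xpack.security.http.ssl.enabled"):
        findings.append("xpack.security.http.ssl.enabled is not explicitly true")

    # Binding to every interface makes the node reachable from outside
    # unless a firewall or private network stands in the way.
    host = str(settings.get("network.host", "")).strip()
    if host in {"0.0.0.0", "::", "_global_"}:
        findings.append(f"network.host is bound to a public-facing value: {host}")

    return findings


if __name__ == "__main__":
    for warning in audit_settings({"network.host": "0.0.0.0"}):
        print("WARNING:", warning)
```

A check like this is no substitute for a proper security review, but it shows how little it takes to catch the "no password, open to the internet" case before a server is deployed.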

To effectively avoid Elasticsearch (or similar) data breaches, a different mindset toward data security is required: one that allows data to be protected a) wherever it may exist, and b) regardless of who may be managing it on the organization’s behalf. This is why a data-centric security model is more appropriate, as it allows a company to secure data and still use it, in protected form, for analytics and data sharing on cloud-based resources.

Standard encryption-based security is one way to do this, but encryption comes with the sometimes-complicated administrative overhead of managing keys. Also, encryption that relies on weak or outdated algorithms, or on poorly protected keys, can be broken. Tokenization, on the other hand, is a data-centric security method that replaces sensitive information with innocuous representational tokens. This means that, even if the data falls into the wrong hands, no clear meaning can be derived from the tokens. Sensitive information remains protected, so threat actors cannot capitalize on the breach and the stolen data.
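The idea behind tokenization can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of any particular product: the TokenVault class and the protect_document helper are hypothetical names, the vault is a plain in-memory dictionary (a real one would be a hardened, access-controlled service), and the tokens are random strings with no mathematical relationship to the original values.

```python
import secrets
from typing import Any, Dict, Set


class TokenVault:
    """Illustrative in-memory vault mapping sensitive values to random tokens."""

    def __init__(self) -> None:
        self._token_by_value: Dict[str, str] = {}
        self._value_by_token: Dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse an existing token so the same input always maps to the same
        # token, which keeps equality searches working on protected data.
        existing = self._token_by_value.get(value)
        if existing is not None:
            return existing
        token = "tok_" + secrets.token_hex(8)
        while token in self._value_by_token:  # regenerate on the rare collision
            token = "tok_" + secrets.token_hex(8)
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Only a caller with access to the vault can recover the original.
        return self._value_by_token[token]


def protect_document(
    doc: Dict[str, Any], sensitive_fields: Set[str], vault: TokenVault
) -> Dict[str, Any]:
    """Return a copy of `doc` with the named fields replaced by tokens."""
    return {
        key: vault.tokenize(str(value)) if key in sensitive_fields else value
        for key, value in doc.items()
    }


if __name__ == "__main__":
    vault = TokenVault()
    record = {"email": "jane@example.com", "country": "DE"}
    print(protect_document(record, {"email"}, vault))
```

If only the protected copy of each record is stored on the search server, an exposed server reveals tokens rather than email addresses, and the originals stay in the separately secured vault.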

With GDPR and the new wave of similar data privacy and security laws, consumers are more aware of what to expect when they hand over their sensitive information to vendors and service providers, which makes protecting data more important than ever before. Had techniques like tokenization been deployed to mask the information in many of these Elasticsearch server leaks, that data would have been indecipherable to criminal threat actors. The information itself would not have been compromised, and the organization at fault would likely have remained compliant and avoided liability-based repercussions.

This is a lesson to all of us in the business of working with data: if anyone is actually daydreaming that their data is safe while “hidden in plain sight” on an “anonymous” cloud resource, the string of lapses around Elasticsearch and other cloud service providers should provide the necessary wake-up call to act now. Nobody wants to deal with the fallout when a real alarm bell goes off!