ElasticSearch scaling and multi-tenancy considerations


Hello, a customer is growing their usage of Elasticsearch and adding more customers, document types, and indexes. Elasticsearch is a core part of their multi-tenant SaaS offering.

TL;DR: They would like to make their Elasticsearch setup multi-tenant to get better isolation and accommodate the expected growth, and they have a few questions.

They are weighing the following points:

1. Currently each Elasticsearch index contains the documents of all of their tenants. For new indexes, they are considering a separate index per tenant. According to the plan, they may have millions of documents per tenant. For example, the index for all emails, currently called ‘emailMessage’, would be split into many ‘emailMessage-TENANTID’ indexes.

  • What does this mean in terms of system resources if they expect to have a few thousand tenants? Each index requires at least one shard of its own, and each shard consumes system resources, so they are not sure whether they will eventually hit a limit that prevents them from adding more tenants.

The customer's own wording for two additional questions:

2. How well does ES handle modifications? In one of our planned indices we expect to store from hundreds of thousands up to a few million documents per tenant. We also expect about 50% of them to change on a daily basis. Since ES basically deletes a document on each update, we are worried that the ES indices and data will get fragmented, which will degrade performance. The question is whether you have experience with ES indices that take this many updates, how they perform, and whether there are any actions we should take when creating and maintaining the indices to keep ES performing well over time.

3. API: we are currently using ES’s RestHighLevelClient and we have trouble keeping up with its latest versions. As it turns out, the developers of this client do not value backward compatibility much, and some upgrades require a few days of development and testing to keep the existing code working as it did before the upgrade. The question is whether you can recommend an alternative (Java) client with which you have had good experience.

*The customer uses Amazon Elasticsearch Service.

AWS
asked 5 years ago, 1157 views
1 Answer
Accepted Answer

Hello, please see my thoughts below:

  1. Elasticsearch is designed to scale horizontally. They'd need to keep an eye on the monitoring, but they can simply add nodes to the cluster should resources start becoming strained. Over time they'll learn how big their indexes and shards are and be able to calculate whether they need additional nodes when adding a new tenant. See this blog post on petabyte scale, which recommends no more than 30,000 shards in a cluster: https://aws.amazon.com/blogs/database/run-a-petabyte-scale-cluster-in-amazon-elasticsearch-service/

  2. ES isn't really designed for heavy modification. There are two things to consider:

     a. The total number of documents being indexed per second. Each modification counts as a reindex of the document. If the cluster is struggling with indexing, Elasticsearch can again scale horizontally, so this shouldn't be an issue.

     b. Performance. Elasticsearch doesn't remove a document from the index when it is deleted; it only marks it as deleted, and the space is reclaimed later when segments are merged. I've previously had indexes of tens of millions of records with tens of millions of deletes and haven't noticed much of a performance issue. We rearchitected the solution to time-based indexes so we had fewer deleted docs per index. This also allowed us to run force merge on the older indices once they were no longer being written to. DO NOT RUN FORCE MERGE ON AN INDEX THAT IS STILL BEING WRITTEN TO.

     So it is probably worth a conversation about the data and whether it fits nicely into time-based or size-based indexes, e.g. emailMessage-TenantId20190527 that rolls weekly, or emailMessageTenantId0001 that rolls when it reaches a certain size.

  3. Unfortunately I don't have a recommendation on this. However, while on the subject of tools, Curator is quite useful for managing indexes: https://www.elastic.co/guide/en/elasticsearch/client/curator/current/index.html
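
For points 1 and 2, the arithmetic can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the tenant count, indexes per tenant, one primary and one replica per index, and one million documents per tenant are illustrative figures, not numbers from the customer. The helper names and the separator format of the index names are hypothetical too. The 30,000-shard figure is the one quoted from the blog post above.

```python
from datetime import date

SHARD_CEILING = 30_000  # recommendation quoted from the linked AWS blog post


def total_shards(tenants, indexes_per_tenant, primaries=1, replicas=1):
    """Shards in the cluster if every tenant gets its own copy of each index.

    primaries=1 and replicas=1 are assumptions, not the customer's settings.
    """
    return tenants * indexes_per_tenant * primaries * (1 + replicas)


def max_tenants(indexes_per_tenant, primaries=1, replicas=1, ceiling=SHARD_CEILING):
    """Largest tenant count that stays at or below the shard ceiling."""
    shards_per_tenant = indexes_per_tenant * primaries * (1 + replicas)
    return ceiling // shards_per_tenant


def updates_per_second(tenants, docs_per_tenant, daily_change_ratio=0.5):
    """Average reindex rate if daily_change_ratio of all docs change each day."""
    return tenants * docs_per_tenant * daily_change_ratio / 86_400


def time_based_index(base, tenant_id, day):
    """Hypothetical name for a time-based index; Elasticsearch needs lowercase names."""
    return f"{base}-{tenant_id}-{day:%Y%m%d}".lower()


def size_based_index(base, tenant_id, sequence):
    """Hypothetical name for a size-based index that rolls to the next sequence number."""
    return f"{base}-{tenant_id}-{sequence:04d}".lower()


if __name__ == "__main__":
    # Illustrative inputs only: 3,000 tenants, 5 indexes each, 1M docs per tenant.
    print(total_shards(3_000, 5))
    print(max_tenants(5))
    print(round(updates_per_second(3_000, 1_000_000)))
    print(time_based_index("emailMessage", "tenant42", date(2019, 5, 27)))
    print(size_based_index("emailMessage", "tenant42", 1))
```

Note that time-based or size-based rollover multiplies the number of indexes per tenant, so the value passed as `indexes_per_tenant` should count every rolled index that is still retained, not just the logical index types.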

Hope that helps.

AWS
Rob_C
answered 5 years ago
