Best Practices for ElasticSearch Cluster Failovers


Hello All,

A customer is using AWS ElasticSearch to run the primary search function on their e-commerce website. Their queries often run for extended periods, which puts enough pressure on the ES instances that they crash and reboot.

This takes their website's functionality down until the AWS ElasticSearch service reboots the nodes. They are currently working on reducing the query times and have already been in contact with Premium support.

Ideally, I would like to suggest alternatives or failover solutions that they could implement until they are able to reduce the heavy query requests they receive. Could the Cross-Cluster functionality be used as a backup option? Or perhaps Route 53 health checks combined with another solution?

Either way, any feedback or input would be greatly appreciated!

AWS
Asked 4 years ago · 309 views
1 Answer
Accepted Answer

It sounds like the customer is already addressing the root cause of the problem (long queries), so I would suggest the following improvements/additions (if not already in place):

  1. Query caching. Put Redis on ElastiCache in front of Elasticsearch to cache query results. This can be as simple as base64-encoding the full JSON query object to use as the key, with the results as the value. Redis can expire cached objects as appropriate for the query's validity; even a TTL of only 30 seconds can help enormously on a high-traffic e-commerce site.
  2. Scale ES nodes vertically. ES loves memory and big queries love CPU. Not sure what their cluster looks like, but it sounds like fewer, larger nodes could help.
  3. Rather than cross-cluster search, I'd suggest a hot standby if they really can't solve the root problem (and caching doesn't help). Route 53 could be used to switch over to the hot standby. This is obviously an expensive option, though, and it should be unnecessary if they right-size their cluster and resolve the query size issues. It feels like they may also have sub-optimal index patterns, document formats, etc.
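
To make the caching suggestion in point 1 concrete, here is a minimal cache-aside sketch in Python. It is only an illustration under stated assumptions: `cache_key`, `cached_search`, `run_query`, the `esq:` key prefix, and `InMemoryCache` are hypothetical names, not part of the customer's setup. The `cache` argument is assumed to expose `get(key)` and `setex(key, ttl, value)`, which is how the redis-py client behaves; the in-memory stand-in exists only so the sketch runs without a Redis server.

```python
import base64
import json


def cache_key(query: dict) -> str:
    """Build a deterministic key by base64-encoding the canonical JSON query."""
    # sort_keys makes logically identical queries produce the same key.
    canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
    encoded = base64.urlsafe_b64encode(canonical.encode("utf-8")).decode("ascii")
    return "esq:" + encoded  # "esq:" is an arbitrary, hypothetical prefix


def cached_search(cache, run_query, query: dict, ttl_seconds: int = 30) -> dict:
    """Return a cached result if present; otherwise query and cache the result.

    cache     -- object with get(key) and setex(key, ttl, value)
    run_query -- hypothetical callable that sends the query to Elasticsearch
    """
    key = cache_key(query)
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)  # json.loads accepts both str and bytes
    result = run_query(query)
    cache.setex(key, ttl_seconds, json.dumps(result))
    return result


class InMemoryCache:
    """Test stand-in for a Redis client. It ignores the TTL entirely."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def setex(self, key, ttl_seconds, value):
        self._data[key] = value
```

With a real client, `cache` would be a Redis connection pointed at the ElastiCache endpoint and `run_query` would wrap the existing search call. Very large query bodies produce long keys, so swapping the base64 step for a hash digest of the canonical JSON is a reasonable variation.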
Moderator
Answered 4 years ago
