How do I use CloudWatch alarms to monitor my OpenSearch Service cluster?


I want to monitor my Amazon OpenSearch Service cluster for stability issues.

Resolution

Important: Different versions of Elasticsearch use different thread pools to process calls to the _index API.

  • Elasticsearch versions 1.5 and 2.3 use the index thread pool.
  • Elasticsearch versions 5.x, 6.0, and 6.2 use the bulk thread pool. Note that the OpenSearch Service console doesn't include a graph for the bulk thread pool.
  • Elasticsearch versions 6.3 and later use the write thread pool.
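The version rules above can be sketched as a small lookup. This is only an illustration: the function names are hypothetical, the version string is assumed to have the form "major.minor", and the metric naming pattern (Threadpool<Pool>Queue) is an assumption that you should check against the metrics that your domain actually publishes.

```python
def index_api_thread_pool(es_version: str) -> str:
    """Return the thread pool that handles _index API calls.

    Expects a version string such as "2.3", "5.6", or "7.10".
    """
    major, minor = (int(part) for part in es_version.split(".")[:2])
    if (major, minor) >= (6, 3):
        return "write"   # 6.3 and later
    if major >= 5:
        return "bulk"    # 5.x, 6.0, and 6.2
    return "index"       # 1.5 and 2.3


def queue_metric_name(es_version: str) -> str:
    # Assumed naming pattern: Threadpool<Pool>Queue
    pool = index_api_thread_pool(es_version)
    return f"Threadpool{pool.capitalize()}Queue"
```

For example, queue_metric_name("6.8") yields a write-pool metric name, which suggests alarming on the write queue rather than the index queue for that version.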

To monitor the health of your OpenSearch Service cluster, turn on the recommended Amazon CloudWatch alarms. Also, turn on the following OpenSearch Service cluster metric alarms:

  • MasterReachableFromNode
  • OpenSearchDashboardsHealthyNodes (for OpenSearch clusters only)
  • KibanaHealthyNodes (for Elasticsearch clusters only)
  • DiskQueueDepth
  • ThreadpoolIndexQueue
  • ThreadpoolSearchQueue

Example configuration for OpenSearch Service metric alarms:

MasterReachableFromNode:
Statistic = Maximum
Value = '=0'
Frequency = 1 period
Period = 1 minute
Issue: Leader node is down.

OpenSearchDashboardsHealthyNodes / KibanaHealthyNodes:
Statistic = Average
Value = '=0'
Frequency = 1 period
Period = 1 minute
Issue: Indicates that the .kibana index is unhealthy.

DiskQueueDepth:
Statistic = Average
Value = '>=100'
Frequency = 1 period
Period = 5 minutes
Issue: Disk Queue Depth is the number of I/O requests that are queued at a time against the storage. This could indicate a surge in requests or Amazon EBS throttling, resulting in increased latency.

ThreadpoolIndexQueue and ThreadpoolSearchQueue:
Statistic = Maximum
Value = '>=20'
Frequency = 1 period
Period = 1 minute
Issue: Indicates that requests are queuing up and might be rejected. To verify the request status, check the CPUUtilization metric and the rejected counts for the index or search thread pool.
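The example configuration above could also be kept as data and turned into the keyword arguments that the CloudWatch PutMetricAlarm API expects. The following is a minimal sketch, not an official template. It assumes the AWS/ES namespace with DomainName and ClientId dimensions, a hypothetical alarm naming scheme, and it writes '=0' as "less than or equal to 0" because static-threshold alarms have no equality operator and these metrics are not expected to go below zero.

```python
NAMESPACE = "AWS/ES"  # CloudWatch namespace for OpenSearch Service domain metrics

# metric name -> (statistic, comparison operator, threshold, period in seconds)
# Assumption: "=0" is expressed as "<= 0".
EXAMPLE_ALARMS = {
    "MasterReachableFromNode": ("Maximum", "LessThanOrEqualToThreshold", 0, 60),
    "OpenSearchDashboardsHealthyNodes": ("Average", "LessThanOrEqualToThreshold", 0, 60),
    "KibanaHealthyNodes": ("Average", "LessThanOrEqualToThreshold", 0, 60),
    "DiskQueueDepth": ("Average", "GreaterThanOrEqualToThreshold", 100, 300),
    "ThreadpoolIndexQueue": ("Maximum", "GreaterThanOrEqualToThreshold", 20, 60),
    "ThreadpoolSearchQueue": ("Maximum", "GreaterThanOrEqualToThreshold", 20, 60),
}


def alarm_kwargs(metric_name, domain_name, client_id, alarm_actions=()):
    """Build PutMetricAlarm keyword arguments for one example alarm."""
    statistic, operator, threshold, period = EXAMPLE_ALARMS[metric_name]
    return {
        "AlarmName": f"{domain_name}-{metric_name}",  # hypothetical naming scheme
        "Namespace": NAMESPACE,
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": client_id},  # AWS account ID
        ],
        "Statistic": statistic,
        "Period": period,
        "EvaluationPeriods": 1,   # Frequency = 1 period
        "DatapointsToAlarm": 1,
        "Threshold": threshold,
        "ComparisonOperator": operator,
        "AlarmActions": list(alarm_actions),
    }
```

With boto3 installed and credentials configured, the result could be passed along as boto3.client("cloudwatch").put_metric_alarm(**alarm_kwargs(...)). Include only one of the two HealthyNodes metrics, depending on whether the domain runs OpenSearch or Elasticsearch.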

To set up a CloudWatch alarm for your OpenSearch Service cluster, complete the following steps:

  1. Open the CloudWatch console.

  2. Choose Alarm.

  3. Choose Create Alarm.

  4. Choose Select Metric, and then select ES/OpenSearchService.

  5. Select Per-Domain and Per-Client Metrics.

  6. Select a metric that matches your use case, and then choose Next.

  7. Configure the following settings for your CloudWatch alarm:

    Statistic = Maximum
    Period = 1 minute
    Threshold type = Static
    Alarm condition = Greater than or equal to
    Threshold value = 1

  8. Choose Additional configuration.

  9. Update the following configuration settings:

    Datapoints to alarm = The Frequency value from the example configuration for your metric
    Missing data treatment = Treat missing data as ignore (maintain the alarm state)

  10. Choose Next.

  11. Select the action that you want your alarm to take, and then choose Next.

  12. Enter a name for your alarm, and then choose Next.

  13. Choose Create Alarm.
    Note: If CPUUtilization or JVMMemoryPressure activates the alarm, then check your CloudWatch metrics to see whether the spike coincides with incoming requests. In particular, monitor the following metrics: IndexingRate, SearchRate, and OpenSearchRequests.
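The console steps above correspond roughly to a single PutMetricAlarm call. The following sketch mirrors the settings from steps 7 and 9 (Maximum statistic, 1-minute period, static threshold of 1, missing data ignored). The function name and its parameters are hypothetical, and the cloudwatch argument is assumed to be a boto3 CloudWatch client, or any object with a compatible put_metric_alarm method.

```python
def create_alarm(cloudwatch, alarm_name, metric_name, domain_name, client_id,
                 action_arns=(), datapoints=1):
    """Create an alarm that mirrors the console settings in steps 7 and 9."""
    kwargs = {
        "AlarmName": alarm_name,
        "Namespace": "AWS/ES",              # assumed namespace for domain metrics
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": client_id},  # AWS account ID
        ],
        "Statistic": "Maximum",
        "Period": 60,                        # 1 minute, in seconds
        "EvaluationPeriods": datapoints,
        "DatapointsToAlarm": datapoints,     # the Frequency value
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "ignore",        # maintain the alarm state
        "AlarmActions": list(action_arns),   # for example, an Amazon SNS topic ARN
    }
    cloudwatch.put_metric_alarm(**kwargs)
    return kwargs


# Possible usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   create_alarm(boto3.client("cloudwatch"), "my-alarm",
#                "ThreadpoolSearchQueue", "my-domain", "123456789012")
```

Accepting the client as a parameter keeps the function easy to try out with a stand-in object before it touches a real account.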

Related information

ClusterBlockException

Using Amazon CloudWatch alarms

AWS OFFICIAL
Updated 2 months ago