
OpenSearch: Replica shard stuck in state `INITIALIZING` with unassigned.reason `ALLOCATION_FAILED`


A replica shard of an index has been stuck in state INITIALIZING with unassigned.reason: ALLOCATION_FAILED for a few days. I tried running

POST /_cluster/reroute?retry_failed=true

through Kibana and got

{"Message":"Your request: '/_cluster/reroute' is not allowed."}

After looking into it, I realised that AWS doesn't allow _cluster/reroute. How can I resolve this and get the shard allocated?

Asked a year ago · 811 views
1 Answer

To resolve a replica shard stuck in the INITIALIZING state with an ALLOCATION_FAILED reason in Amazon OpenSearch Service, you need a different approach, because the service does not allow direct use of the _cluster/reroute API. Here are some steps you can take:

  1. Check cluster health: First, make sure the cluster has sufficient resources. Low disk space or high CPU utilization can cause shard allocation failures. Monitor these with CloudWatch alarms on metrics such as FreeStorageSpace and CPUUtilization.

  2. Verify instance types: Make sure you're using the appropriate instance types for your workload. For dedicated master nodes, choose the right instance type based on the number of data nodes in your cluster.

  3. Review shard allocation: Each node should have fewer than 25 shards per GiB of Java heap memory. In OpenSearch Service, the heap is half of the instance memory, up to a maximum of 32 GiB. If a node is over that count, adjust your index settings or increase your instance sizes.

  4. Increase storage: Ensure there's sufficient storage across cluster nodes. Low disk space can cause OpenSearch Service to unassign shards and rebalance them, leading to allocation failures.

  5. Force merge: If the index with the stuck shard has many segments, try running a force merge operation to reduce the number of segments.

  6. Snapshot and restore: If the above steps don't resolve the issue, consider taking a snapshot of your index, deleting the problematic index, and then restoring it from the snapshot. This process can often resolve shard allocation issues.

  7. Contact AWS Support: If none of these steps resolve the issue, reach out to AWS Support for further assistance, as they may be able to perform operations that are not directly available to users.
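The sizing rule in step 3 comes down to simple arithmetic. The sketch below works it through in Python. The function names and the example node sizes are illustrative assumptions, not part of any AWS tooling, and the 25-shards-per-GiB figure is taken from the step above.

```python
def heap_gib(instance_memory_gib: float) -> float:
    """Java heap per the rule in step 3: half of instance memory, capped at 32 GiB."""
    return min(instance_memory_gib / 2, 32.0)


def shard_ceiling(instance_memory_gib: float, shards_per_heap_gib: int = 25) -> int:
    """Shard count a node should stay below (25 shards per GiB of heap)."""
    return int(heap_gib(instance_memory_gib) * shards_per_heap_gib)


# Hypothetical node sizes, chosen only to illustrate the arithmetic.
for memory in (16, 64, 128):
    print(
        f"{memory} GiB RAM -> {heap_gib(memory):g} GiB heap "
        f"-> stay below {shard_ceiling(memory)} shards"
    )
```

Because of the 32 GiB heap cap, the ceiling stops growing once an instance has 64 GiB of memory, so beyond that point adding nodes or reducing shard counts helps more than larger instances.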

Remember, while Amazon OpenSearch Service doesn't allow direct use of the _cluster/reroute API, it does automate many cluster operations. The service continuously monitors cluster health and attempts to resolve issues automatically, including shard allocation problems.

By following these steps, you should be able to address the stuck replica shard issue without needing direct access to the _cluster/reroute API.
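To confirm whether the replica has actually recovered after trying the steps, one option is to filter the output of `GET _cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason` for shards that are not STARTED. The sketch below assumes that response has already been fetched and parsed into a list of dictionaries. The helper name `stuck_shards` and the sample rows are made up for illustration.

```python
from typing import Dict, List, Optional

ShardRow = Dict[str, Optional[str]]


def stuck_shards(rows: List[ShardRow]) -> List[ShardRow]:
    """Return the rows for shards whose state is anything other than STARTED."""
    return [row for row in rows if row.get("state") != "STARTED"]


# Made-up sample in the shape of a parsed _cat/shards JSON response.
sample: List[ShardRow] = [
    {"index": "logs-example", "shard": "0", "prirep": "p",
     "state": "STARTED", "unassigned.reason": None},
    {"index": "logs-example", "shard": "0", "prirep": "r",
     "state": "INITIALIZING", "unassigned.reason": "ALLOCATION_FAILED"},
]

for row in stuck_shards(sample):
    print(row["index"], row["shard"], row["prirep"],
          row["state"], row["unassigned.reason"])
```

If a shard still shows up here, `GET _cluster/allocation/explain` is the standard OpenSearch API for asking why it cannot be allocated. Whether it is available on a given domain is an assumption to verify against the service's list of supported operations.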
Sources
Troubleshoot long-running or stuck snapshots in OpenSearch Service | AWS re:Post
Open-Source Search Engine - Amazon OpenSearch Service Managed Service - AWS

Answered a year ago
