How do I resolve the manual snapshot error in my Amazon OpenSearch Service cluster?


I want to restore a manual snapshot of my Amazon OpenSearch Service cluster. However, I receive an error when I try to register a repository or access a registered repository. Why is this happening and how do I resolve this?

Short description

To successfully migrate data from a manual snapshot in OpenSearch Service, perform the following steps:

1.    Choose an Amazon Simple Storage Service (Amazon S3) bucket where you want to store your snapshot.

2.    Register the Amazon S3 bucket with your OpenSearch Service source cluster.

3.    Take a snapshot of the OpenSearch Service source cluster, and then store it in your Amazon S3 bucket.

4.    Register your destination cluster with the same Amazon S3 bucket to make sure that you can view the manual snapshot.

5.    Restore the manual snapshot on the destination cluster in OpenSearch Service.

If you skip or misconfigure any of these steps, you might encounter one of the following issues:

  • 403 Unauthorized error
  • repository_missing_exception
  • concurrent_snapshot_execution_exception
  • snapshot_restore_exception
  • a_w_s_security_token_service_exception
  • "PARTIAL" snapshot status
  • Amazon S3 Glacier storage class issue

Resolution

403 Unauthorized error

If you activated fine-grained access control (FGAC) on your OpenSearch Service domain, you might receive the following error when you take a snapshot:

{
    "error": {
        "root_cause": [{
            "type": "security_exception",
            "reason": "no permissions for [cluster:admin/repository/put] and User [name=arn:aws:iam::012345678912:user/username, backend_roles=[], requestedTenant=null]"
        }],
        "type": "security_exception",
        "reason": "no permissions for [cluster:admin/repository/put] and User [name=arn:aws:iam::012345678912:user/username, backend_roles=[], requestedTenant=null]"
    },
    "status": 403
}

To resolve the 403 Unauthorized error, pass your credentials with the -u option (in username:password form) whenever you take a manual snapshot:

curl -XPUT -u 'username:password' 'opensearch-domain-endpoint/_snapshot/snapshot-repository-name/snapshot-name'
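The same request can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the function name build_snapshot_request is hypothetical, the endpoint and credentials are placeholders, and the sketch assumes that the domain accepts HTTP basic authentication (a superuser stored in the internal user database). The function only builds the request; it doesn't send it.

```python
import base64
import urllib.request


def build_snapshot_request(endpoint, repository, snapshot, username, password):
    """Build (but do not send) a PUT request that takes a manual snapshot.

    All arguments are placeholders supplied by the caller.
    """
    url = f"https://{endpoint}/_snapshot/{repository}/{snapshot}"
    # HTTP basic authentication is the base64 encoding of "username:password".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="PUT")
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    req = build_snapshot_request(
        "opensearch-domain-endpoint",
        "snapshot-repository-name",
        "snapshot-name",
        "username",
        "password",
    )
    print(req.get_method(), req.full_url)
    # To actually send the request: urllib.request.urlopen(req)
```

If the superuser is an IAM role instead of a username and password, basic authentication doesn't apply and the request must be signed with SigV4.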

Note: You must be a superuser to activate fine-grained access control for your OpenSearch Service domain. You can either use your superuser name and password or set an AWS Identity and Access Management (IAM) role as the superuser. When you access your cluster snapshot, specify your superuser credentials or IAM role. If you specify an IAM role, then the HTTP requests must be signed with AWS Signature Version 4 (SigV4) using that role's credentials. For more information about using fine-grained access control and IAM roles, see Creating and managing OpenSearch Service domains.

You must also register a snapshot repository before you take the snapshot, and map the manage_snapshots role to an IAM user or role. The IAM identity that you map must have the iam:PassRole permission so that it can pass the snapshot IAM role (TheSnapshotRole) to OpenSearch Service. For more information, see Manual snapshot prerequisites.

To map the manage_snapshots role to an IAM role, perform the following steps:

1.    Open the OpenSearch Dashboards console.

2.    Log in as a primary user.

3.    Choose Security.

4.    Choose Roles.

5.    Choose manage_snapshots as your role.

6.    Choose Mapped users.

7.    Choose Manage mapping.

8.    Under Users, add your user ARN (for example: "arn:aws:iam::012345678912:user/username").

9.    Register your manual snapshot repository.
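For the final step, registering the repository means sending a PUT request to opensearch-domain-endpoint/_snapshot/snapshot-repository-name with a JSON body that names the bucket and the snapshot role. The following Python sketch builds that body. The function name repository_registration_body and the sample bucket, Region, and role ARN are placeholders chosen for illustration, and the sketch doesn't send or sign the request.

```python
import json


def repository_registration_body(bucket, region, role_arn):
    """Return the JSON body used to register an S3 snapshot repository."""
    return {
        "type": "s3",
        "settings": {
            "bucket": bucket,      # S3 bucket that stores the snapshots
            "region": region,      # AWS Region of the bucket
            "role_arn": role_arn,  # snapshot role (TheSnapshotRole in this article)
        },
    }


if __name__ == "__main__":
    body = repository_registration_body(
        "s3-bucket-name",  # placeholder bucket name
        "us-west-2",       # placeholder Region
        "arn:aws:iam::012345678912:role/TheSnapshotRole",
    )
    print(json.dumps(body, indent=2))
```

Register the same bucket on both the source and destination clusters so that the destination cluster can see the manual snapshot.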

repository_missing_exception

Before you take a manual index snapshot, you must register a manual snapshot repository with OpenSearch Service. Your IAM role (TheSnapshotRole) must also be set up to work with Amazon S3.
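As a rough illustration of what "set up to work with Amazon S3" usually involves, the following Python sketch builds a permissions policy for the snapshot role: it allows listing the bucket and reading, writing, and deleting objects in it. The function name snapshot_role_policy and the bucket name are placeholders. Treat the statement layout as an assumption to compare against Manual snapshot prerequisites, not as a definitive policy.

```python
import json


def snapshot_role_policy(bucket):
    """Return a permissions policy document for the snapshot IAM role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permission: list the snapshot files.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Object-level permissions: read, write, and delete snapshot files.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(snapshot_role_policy("s3-bucket-name"), indent=2))
```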

If you didn't register your snapshot repository before taking a manual snapshot, or you use an incorrect repository name, you receive the following error:

{
    "error": {
        "root_cause": [{
            "type": "repository_missing_exception",
            "reason": "[snapshot-repository-name] missing"
        }],
        "type": "repository_missing_exception",
        "reason": "[snapshot-repository-name] missing"
    },
    "status": 404
}

To resolve this error, make sure that you meet the manual snapshot prerequisites. Also, make sure that you check for typos in the repository name.
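One way to catch a typo is to list the registered repositories (GET opensearch-domain-endpoint/_snapshot returns a JSON object keyed by repository name) and compare the name that you used against that list. The following Python sketch does the comparison on an already-parsed response. The function name check_repository and the sample repository names are illustrative assumptions.

```python
import difflib


def check_repository(name, registered):
    """Check a repository name against the parsed response of GET _snapshot.

    Returns (found, suggestions), where suggestions lists similarly
    spelled registered names when the exact name is missing.
    """
    if name in registered:
        return True, []
    suggestions = difflib.get_close_matches(name, list(registered), n=3, cutoff=0.6)
    return False, suggestions


if __name__ == "__main__":
    # Sample response for illustration only.
    registered = {
        "manual-snapshot-repo": {"type": "s3", "settings": {}},
        "cs-automated": {"type": "s3", "settings": {}},
    }
    print(check_repository("manual-snapshot-rep", registered))
```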

concurrent_snapshot_execution_exception

If a snapshot is currently in progress, you receive the following error when you try to take another snapshot:

{
    "error": {
        "root_cause": [{
            "type": "concurrent_snapshot_execution_exception",
            "reason": "[snapshot-repository-name:snapshot-name] a snapshot is already running"
        }],
        "type": "concurrent_snapshot_execution_exception",
        "reason": "[snapshot-repository-name:snapshot-name] a snapshot is already running"
    }
}

To check if there is another snapshot in progress, run the following command:

curl -XGET 'opensearch-domain-endpoint/_snapshot/_status'

If a snapshot is already in progress, wait for the current snapshot to complete. Or, if you suspect that your snapshot is stuck, check your history of hourly snapshots. For more information, see Why can't I delete an index or upgrade my OpenSearch Service cluster?
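The _snapshot/_status response contains a "snapshots" array that lists the snapshots that are currently running, and the array is empty when nothing is in progress. The following Python sketch summarizes a parsed response. The function name running_snapshots and the sample data are illustrative assumptions.

```python
def running_snapshots(status_response):
    """Summarize the parsed response of GET _snapshot/_status.

    Returns a list of (repository, snapshot, state) tuples. An empty
    list means that no snapshot is currently in progress.
    """
    return [
        (entry.get("repository"), entry.get("snapshot"), entry.get("state"))
        for entry in status_response.get("snapshots", [])
    ]


if __name__ == "__main__":
    # Sample response for illustration only.
    sample = {
        "snapshots": [
            {
                "snapshot": "snapshot-name",
                "repository": "snapshot-repository-name",
                "state": "STARTED",
            }
        ]
    }
    in_progress = running_snapshots(sample)
    if in_progress:
        print("Wait for these snapshots to finish:", in_progress)
    else:
        print("No snapshot is running.")
```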

snapshot_restore_exception

If you try to migrate data from an on-premises cluster to an OpenSearch Service domain, you might encounter the following exception:

{
    "error": {
        "root_cause": [{
            "type": "snapshot_restore_exception",
            "reason": "[manual-snapshot-repo:my-manual-snapshot1/HPOcIJryTj6a6GJvyP79bw] the snapshot was created with Elasticsearch version [6.8.0] which is higher than the version of this node [6.7.0]"
        }],
        "type": "snapshot_restore_exception",
        "reason": "[manual-snapshot-repo:my-manual-snapshot1/HPOcIJryTj6a6GJvyP79bw] the snapshot was created with Elasticsearch version [6.8.0] which is higher than the version of this node [6.7.0]"
    },
    "status": 500
}

This error occurs when the snapshot was created on a later version of Elasticsearch than the version that the destination OpenSearch Service domain runs. In the preceding example, the snapshot was created on version 6.8.0, but the destination node runs version 6.7.0. To resolve this error, consider upgrading the destination domain to a version that's the same as or later than the version that created the snapshot. Or, you can use the remote reindex API to migrate your indices.
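To confirm which side is behind, you can pull both version numbers out of the error's "reason" string. The following Python sketch assumes that the message has the same wording as the example above. The function name parse_version_mismatch is hypothetical.

```python
import re

# Matches the wording of the snapshot_restore_exception reason shown above.
_VERSION_PATTERN = re.compile(
    r"created with Elasticsearch version \[([\d.]+)\] "
    r"which is higher than the version of this node \[([\d.]+)\]"
)


def _to_tuple(version):
    """Convert '6.8.0' to (6, 8, 0) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def parse_version_mismatch(reason):
    """Return (snapshot_version, node_version) as tuples, or None if no match."""
    match = _VERSION_PATTERN.search(reason)
    if match is None:
        return None
    return _to_tuple(match.group(1)), _to_tuple(match.group(2))


if __name__ == "__main__":
    reason = (
        "[manual-snapshot-repo:my-manual-snapshot1/HPOcIJryTj6a6GJvyP79bw] "
        "the snapshot was created with Elasticsearch version [6.8.0] "
        "which is higher than the version of this node [6.7.0]"
    )
    snapshot_version, node_version = parse_version_mismatch(reason)
    print("snapshot:", snapshot_version, "destination node:", node_version)
```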

a_w_s_security_token_service_exception

If the IAM role associated with your manual snapshot doesn't have a trust relationship established for "es.amazonaws.com", you receive the following exception:

{
    "error": {
        "root_cause": [{
            "type": "repository_exception",
            "reason": "[es_01082021_repo] Could not determine repository generation from root blobs"
        }],
        "type": "repository_exception",
        "reason": "[es_01082021_repo] Could not determine repository generation from root blobs",
        "caused_by": {
            "type": "i_o_exception",
            "reason": "Exception when listing blobs by prefix [index-]",
            "caused_by": {
                "type": "a_w_s_security_token_service_exception",
                "reason": "a_w_s_security_token_service_exception: User: arn:aws:sts::332315457451:assumed-role/cp-sts-grant-role/swift-us-west-2-prod-679203657591 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::679203657591:role/ES_Backup_Role (Service: AWSSecurityTokenService; Status Code: 403; Error Code: AccessDenied; Request ID: 36d09b93-d94f-457e-8fa5-b0a50ba436c3)"
            }
        }
    },
    "status": 500
}

With OpenSearch Service snapshots, an internal role is created (such as arn:aws:sts::332315457451:assumed-role/cp-sts-grant-role/swift-us-west-2-prod-679203657591). This internal role assumes the IAM role associated with the manual snapshot, and then performs any required operations.

To resolve the security token exception, make sure to specify the IAM role associated with the manual snapshot. If you don't have an IAM role associated with the manual snapshot, then you must create one. For more information, see Manual snapshot prerequisites.

Also, check the trust relationship for the IAM role associated with the manual snapshot. The trust relationship for the role must specify OpenSearch Service in the Principal statement, like this:

{
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "",
        "Effect": "Allow",
        "Principal": {
            "Service": "es.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
    }]
}
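To check a role's trust policy against this requirement, you can inspect the policy document programmatically. The following Python sketch looks for an Allow statement that names es.amazonaws.com as a service principal for sts:AssumeRole. The function name trusts_opensearch_service is hypothetical, and the check is deliberately simple: it ignores Condition blocks and wildcard actions.

```python
def _as_list(value):
    """IAM policy fields can be a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def trusts_opensearch_service(trust_policy):
    """Return True if the trust policy lets es.amazonaws.com assume the role."""
    for statement in _as_list(trust_policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # for example, a bare "*" principal
        services = _as_list(principal.get("Service"))
        actions = _as_list(statement.get("Action"))
        if "es.amazonaws.com" in services and "sts:AssumeRole" in actions:
            return True
    return False


if __name__ == "__main__":
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "",
            "Effect": "Allow",
            "Principal": {"Service": "es.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }
    print(trusts_opensearch_service(policy))
```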

PARTIAL snapshot status

A snapshot with a "PARTIAL" status indicates that data from at least one shard couldn't be stored. You can still restore data from a partial snapshot, but you must use earlier snapshots to restore any missing indices. To check whether a snapshot has a "PARTIAL" status, check your snapshot history. For more information, see Restoring snapshots.
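The snapshot history is available from GET opensearch-domain-endpoint/_snapshot/snapshot-repository-name/_all, where each entry carries a "state" field and a "failures" list. The following Python sketch picks out the partial snapshots and the indices that had failed shards. The function name partial_snapshots and the sample data are illustrative assumptions.

```python
def partial_snapshots(snapshots_response):
    """Find PARTIAL snapshots in the parsed response of GET _snapshot/<repo>/_all.

    Returns a list of (snapshot_name, indices_with_failed_shards) tuples.
    """
    report = []
    for snap in snapshots_response.get("snapshots", []):
        if snap.get("state") != "PARTIAL":
            continue
        # Collect each affected index once, in a stable order.
        failed_indices = sorted(
            {f["index"] for f in snap.get("failures", []) if f.get("index")}
        )
        report.append((snap.get("snapshot"), failed_indices))
    return report


if __name__ == "__main__":
    # Sample response for illustration only.
    sample = {
        "snapshots": [
            {"snapshot": "snapshot-1", "state": "SUCCESS", "failures": []},
            {
                "snapshot": "snapshot-2",
                "state": "PARTIAL",
                "failures": [{"index": "logs-2021", "shard_id": 0}],
            },
        ]
    }
    for name, indices in partial_snapshots(sample):
        print(name, "is missing data for:", indices)
```

Indices listed for a partial snapshot are the ones to restore from an earlier snapshot.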

Amazon S3 Glacier storage class issue

Manual snapshots don't support the Amazon S3 Glacier storage classes. Avoid applying an Amazon S3 Lifecycle rule that transitions objects in your snapshot bucket to one of the Amazon S3 Glacier storage classes. If you already applied such a rule, then you must move back any objects that transitioned.

After you move the objects back to a standard Amazon S3 storage class, you can restore from those snapshots. For more information, see Manual snapshot prerequisites.
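To find objects that already transitioned, you can inspect the "StorageClass" field in an S3 object listing. The following Python sketch filters a parsed ListObjectsV2 response (for example, the JSON output of aws s3api list-objects-v2). The function name glacier_objects, the sample keys, and the set of class names treated as Glacier are assumptions made for illustration.

```python
# Storage class names treated as S3 Glacier classes in this sketch (assumption).
GLACIER_CLASSES = {"GLACIER", "DEEP_ARCHIVE", "GLACIER_IR"}


def glacier_objects(listing):
    """Return the keys of objects stored in an S3 Glacier storage class.

    listing is the parsed response of one ListObjectsV2 call; this sketch
    doesn't handle pagination.
    """
    return [
        obj["Key"]
        for obj in listing.get("Contents", [])
        if obj.get("StorageClass") in GLACIER_CLASSES
    ]


if __name__ == "__main__":
    # Sample listing for illustration only.
    sample = {
        "Contents": [
            {"Key": "index-0", "StorageClass": "STANDARD"},
            {"Key": "indices/abc/0/__data", "StorageClass": "GLACIER"},
        ]
    }
    print("Objects to move back:", glacier_objects(sample))
```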


Related information

How do I resolve the "cannot restore index [.kibana] because it's open" error in Amazon OpenSearch Service?

Taking manual snapshots

AWS OFFICIAL
Updated a year ago