How do I resolve the manual snapshot error in my OpenSearch Service cluster?

I tried to restore a manual snapshot for my Amazon OpenSearch Service cluster, but I received an error when trying to register or access a repository.

Short description

To migrate data from a manual snapshot in OpenSearch Service, complete the following steps:

1.    Choose an Amazon Simple Storage Service (Amazon S3) bucket where you want to store your snapshot.

2.    Register the Amazon S3 bucket with your OpenSearch Service source cluster.

3.    Take a snapshot of the OpenSearch Service source cluster, and then store it in your Amazon S3 bucket.

4.    Register the same Amazon S3 bucket with your destination cluster so that the destination cluster can view the manual snapshot.

5.    Restore the manual snapshot on the destination cluster in OpenSearch Service.
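
As a rough sketch of steps 2 through 5, the following functions use the standard _snapshot API paths. The endpoint, repository name, snapshot name, Region, and index name are placeholders and not values from this article. The sketch also assumes curl 7.75 or later (for the --aws-sigv4 option) and IAM credentials in the usual AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. If your domain uses basic authentication instead, replace the signing options with -u 'username:password'.

```shell
# Placeholders (assumptions for illustration); replace with your own values.
ENDPOINT="https://opensearch-domain-endpoint"
REPO="snapshot-repository-name"
SNAPSHOT="snapshot-name"
REGION="us-west-2"   # Region of the domain, used for request signing

# Build the JSON body that registers an S3 bucket as a snapshot repository.
# Arguments: bucket name, bucket Region, ARN of the snapshot IAM role.
repo_body() {
  printf '{"type":"s3","settings":{"bucket":"%s","region":"%s","role_arn":"%s"}}' \
    "$1" "$2" "$3"
}

# curl wrapper that signs requests with SigV4 (requires curl 7.75 or later).
# With temporary credentials, also send the x-amz-security-token header.
signed_curl() {
  curl -sS --aws-sigv4 "aws:amz:$REGION:es" \
    --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" "$@"
}

# Steps 2 and 4: register the bucket, once on the source and once on the destination.
register_repo() {
  signed_curl -XPUT "$ENDPOINT/_snapshot/$REPO" \
    -H 'Content-Type: application/json' -d "$(repo_body "$@")"
}

# Step 3: take the snapshot on the source domain.
take_snapshot() {
  signed_curl -XPUT "$ENDPOINT/_snapshot/$REPO/$SNAPSHOT"
}

# Step 5: restore one index (hypothetical name) on the destination domain.
restore_snapshot() {
  signed_curl -XPOST "$ENDPOINT/_snapshot/$REPO/$SNAPSHOT/_restore" \
    -H 'Content-Type: application/json' -d '{"indices":"my-index"}'
}
```

Point ENDPOINT at the source domain for register_repo and take_snapshot, and then at the destination domain for register_repo and restore_snapshot.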

If you skip one of these steps or configure it incorrectly, then you might encounter one of the following errors:

  • 403 Unauthorized error
  • illegal_state_exception
  • repository_missing_exception
  • concurrent_snapshot_execution_exception
  • snapshot_restore_exception
  • a_w_s_security_token_service_exception
  • "PARTIAL" snapshot status
  • Amazon Simple Storage Service Glacier (Amazon S3 Glacier) storage class issue

Resolution

403 Unauthorized error

If you activated fine-grained access control (FGAC) on your OpenSearch Service domain, then you might receive the following error when you register a snapshot repository or take a snapshot:

{
    "error": {
        "root_cause": [{
            "type": "security_exception",
            "reason": "no permissions for [cluster:admin/repository/put] and User [name=arn:aws:iam::012345678912:user/username, backend_roles=[], requestedTenant=null]"
        }],
        "type": "security_exception",
        "reason": "no permissions for [cluster:admin/repository/put] and User [name=arn:aws:iam::012345678912:user/username, backend_roles=[], requestedTenant=null]"
    },
    "status": 403
}

To resolve the 403 unauthorized error, make sure to pass your credentials with the -u 'username:password' option when you take a manual snapshot:

curl -XPUT -u 'username:password' 'https://opensearch-domain-endpoint/_snapshot/snapshot-repository-name/snapshot-name'

Note: You must be a superuser to activate fine-grained access control for your OpenSearch Service domain. You can either use your superuser name and password, or set an AWS Identity and Access Management (IAM) role as the superuser. When you access your cluster snapshot, specify your superuser credentials or IAM role. If you specify an IAM role, then the HTTP requests must be signed with Signature Version 4 (SigV4) using that role's credentials. For more information about using fine-grained access control and IAM roles, see Creating and managing OpenSearch Service domains.

You must also register a snapshot repository with your domain, and map the manage_snapshots role to the IAM user or role that signs your requests. That IAM user or role must have the iam:PassRole permission so that it can pass the snapshot IAM role (TheSnapshotRole) to OpenSearch Service. For more information, see Prerequisites.

To map the manage_snapshots role to an IAM role, complete the following steps:

1.    Open OpenSearch Dashboards for your OpenSearch Service domain, and sign in as a primary user.

2.    Choose Security.

3.    Choose Roles.

4.    Choose manage_snapshots as your role.

5.    Choose Mapped users.

6.    Choose Manage mapping.

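7.    Under Users, add your IAM user ARN (for example: "arn:aws:iam::012345678912:user/username"). If you sign requests with an IAM role instead, then add the role ARN under Backend roles. Then, choose Map.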

8.    Register your manual snapshot repository.
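
The same mapping can also be sketched with the security plugin's REST API instead of the Dashboards steps above. This example assumes an OpenSearch domain (the path prefix is _plugins/_security; Elasticsearch-based domains use _opendistro/_security) and primary user credentials passed as 'username:password'. The endpoint and function names are placeholders for illustration.

```shell
# Build a role-mapping body that maps one IAM user ARN to a security role.
# To map an IAM role ARN instead, the body would use "backend_roles" rather than "users".
mapping_body() {
  printf '{"users":["%s"]}' "$1"
}

# Map the given IAM user ARN to the manage_snapshots role.
# Note: PUT replaces any existing mapping for the role, so the body must
# include every user that you want to keep mapped.
map_manage_snapshots() {
  curl -sS -XPUT -u 'username:password' \
    "https://opensearch-domain-endpoint/_plugins/_security/api/rolesmapping/manage_snapshots" \
    -H 'Content-Type: application/json' \
    -d "$(mapping_body "$1")"
}
```

For example, map_manage_snapshots "arn:aws:iam::012345678912:user/username" sends the mapping for the user ARN from step 7.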

illegal_state_exception

The following error can occur when multiple domains use the same Amazon S3 bucket to take manual snapshots:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_state_exception",
        "reason": "Can't get text on a FIELD_NAME at 1:1838"
      }
    ],
    "type": "illegal_state_exception",
    "reason": "Can't get text on a FIELD_NAME at 1:1838"
  },
  "status": 500
}

To resolve this issue, create a new Amazon S3 bucket and take the manual snapshot again, or clear all of the data from the existing bucket.

repository_missing_exception

Before you take a manual index snapshot, you must register a manual snapshot repository with OpenSearch Service. You must also set up your IAM role (TheSnapshotRole) to work with Amazon S3.

If you didn't register your snapshot repository before taking a manual snapshot, or you use an incorrect repository name, then you receive the following error:

{
    "error": {
        "root_cause": [{
            "type": "repository_missing_exception",
            "reason": "[snapshot-repository-name] missing"
        }],
        "type": "repository_missing_exception",
        "reason": "[snapshot-repository-name] missing"
    },
    "status": 404
}

To resolve this error, make sure that you meet the manual snapshot prerequisites. Also, make sure that there aren't any typos in the repository name.
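
One way to confirm that a repository name is registered is to request it directly and look at the HTTP status code. The sketch below assumes basic authentication and a placeholder endpoint. The function names are hypothetical, and the mapping of status codes to messages is only a convenience for reading the result.

```shell
# Translate the HTTP status of GET _snapshot/<repository> into a short diagnosis.
diagnose_repo_status() {
  case "$1" in
    200) echo "registered" ;;
    404) echo "missing" ;;       # repository_missing_exception
    403) echo "forbidden" ;;     # credentials lack permission
    *)   echo "unexpected:$1" ;;
  esac
}

# Request the repository by name and print the diagnosis for its status code.
check_repo() {
  code=$(curl -sS -o /dev/null -w '%{http_code}' -u 'username:password' \
    "https://opensearch-domain-endpoint/_snapshot/$1")
  diagnose_repo_status "$code"
}
```

To see every registered repository name at once, a plain GET request to the _snapshot path (for example, with ?pretty appended) returns all repositories for the domain.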

concurrent_snapshot_execution_exception

If a snapshot is in progress, then you receive the following error when you try to take another snapshot:

{
    "error": {
        "root_cause": [{
            "type": "concurrent_snapshot_execution_exception",
            "reason": "[snapshot-repository-name:snapshot-name] a snapshot is already running"
        }],
        "type": "concurrent_snapshot_execution_exception",
        "reason": "[snapshot-repository-name:snapshot-name] a snapshot is already running"
    }
}

To check if there's another snapshot in progress, run the following command:

curl -XGET 'opensearch-domain-endpoint/_snapshot/_status'

If a snapshot is already in progress, then wait for the current snapshot to complete. Or, if you think that your snapshot is stuck, then check your history of hourly snapshots. For more information, see Why can't I delete an index or upgrade my OpenSearch Service cluster?
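
To wait for a running snapshot instead of checking by hand, the status request can be polled. This is a minimal sketch that assumes the response lists running snapshots in a top-level "snapshots" array, that basic authentication is in use, and that a 30-second interval is acceptable. The function names and the endpoint are placeholders.

```shell
# Read a _snapshot/_status response on stdin.
# Returns 0 if at least one snapshot is listed, 1 if the list is empty,
# and 2 if the response doesn't contain a "snapshots" array at all.
snapshot_running() {
  compact=$(tr -d ' \n\t\r')
  case "$compact" in
    *'"snapshots":[]'*) return 1 ;;
    *'"snapshots":['*)  return 0 ;;
    *) echo "unexpected response" >&2; return 2 ;;
  esac
}

# Poll until no snapshot is in progress (or until an unexpected response arrives).
wait_for_snapshot() {
  while curl -sS -u 'username:password' \
      "https://opensearch-domain-endpoint/_snapshot/_status" | snapshot_running; do
    sleep 30
  done
}
```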

snapshot_restore_exception

If you try to migrate data from an on-premises cluster to an OpenSearch Service domain, then you might encounter the following exception error:

{
    "error": {
        "root_cause": [{
            "type": "snapshot_restore_exception",
            "reason": "[manual-snapshot-repo:my-manual-snapshot1/HPOcIJryTj6a6GJvyP79bw] the snapshot was created with Elasticsearch version [6.8.0] which is higher than the version of this node [6.7.0]"
        }],
        "type": "snapshot_restore_exception",
        "reason": "[manual-snapshot-repo:my-manual-snapshot1/HPOcIJryTj6a6GJvyP79bw] the snapshot was created with Elasticsearch version [6.8.0] which is higher than the version of this node [6.7.0]"
    },
    "status": 500
}

This error occurs when the snapshot was created on a cluster that runs a later version of Elasticsearch than the destination OpenSearch Service domain. In the preceding example, the snapshot was created with version 6.8.0, but the destination node runs version 6.7.0. To resolve the error, upgrade the destination domain to a version that's the same as or later than the version that created the snapshot. Or, you can use the remote reindex API to migrate your indices.
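
The version check behind this error can be reproduced before you attempt a restore. The helper below is an illustrative sketch, not part of any OpenSearch tooling. It assumes three-part numeric versions (such as 6.8.0) from the same product line, and the function name is hypothetical.

```shell
# Succeeds if version $1 is strictly higher than version $2.
# Both arguments are expected in major.minor.patch form.
version_gt() {
  [ "$1" != "$2" ] &&
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | tail -n 1)" = "$1" ]
}

# Example with the versions from the error message:
# snapshot version 6.8.0, destination node version 6.7.0.
if version_gt "6.8.0" "6.7.0"; then
  echo "snapshot is newer than the destination; upgrade the destination or use remote reindex"
fi
```

The snapshot's version appears in the response of a GET request for the snapshot in its repository, and the node version appears in the response of a GET request to the domain's root path.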

If the domain's FGAC is activated and you try to restore all the indices from the snapshot, then you might receive a 403 error similar to the following one:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "security_exception",
        "reason" : "no permissions for [] and User [name=RahulSarkar, backend_roles=[], requestedTenant=]"
      }
    ],
    "type" : "security_exception",
    "reason" : "no permissions for [] and User [name=RahulSarkar, backend_roles=[], requestedTenant=]"
  },
  "status" : 403
}

To resolve this error, exclude the security and system indices from the restore, similar to the following request:

curl -XPOST -u 'username:password' "https://opensearch-domain-endpoint/_snapshot/snapshot-repository/snapshot-id/_restore" -H 'Content-Type: application/json' -d'
{
  "indices": "-.opensearch*,-.opendistro*,-.kibana*"
}'

a_w_s_security_token_service_exception

If the IAM role that's associated with your manual snapshot doesn't have a trust relationship established for "es.amazonaws.com", then you receive the following exception error:

{
    "error": {
        "root_cause": [{
            "type": "repository_exception",
            "reason": "[es_01082021_repo] Could not determine repository generation from root blobs"
        }],
        "type": "repository_exception",
        "reason": "[es_01082021_repo] Could not determine repository generation from root blobs",
        "caused_by": {
            "type": "i_o_exception",
            "reason": "Exception when listing blobs by prefix [index-]",
            "caused_by": {
                "type": "a_w_s_security_token_service_exception",
                "reason": "a_w_s_security_token_service_exception: User: arn:aws:sts::332315457451:assumed-role/cp-sts-grant-role/swift-us-west-2-prod-679203657591 is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::679203657591:role/ES_Backup_Role (Service: AWSSecurityTokenService; Status Code: 403; Error Code: AccessDenied; Request ID: 36d09b93-d94f-457e-8fa5-b0a50ba436c3)"
            }
        }
    },
    "status": 500
}

With OpenSearch Service snapshots, an internal role is created (such as arn:aws:sts::332315457451:assumed-role/cp-sts-grant-role/swift-us-west-2-prod-679203657591). This internal role assumes the IAM role that's associated with the manual snapshot, and then performs any required operations.

To resolve the security token exception error, make sure to specify the IAM role that's associated with the manual snapshot. If you don't have an IAM role that's associated with the manual snapshot, then create one. For more information, see Prerequisites.

Also, check the trust relationship for the IAM role that's associated with the manual snapshot. The trust relationship for the role must specify OpenSearch Service in the Principal statement, similar to this one:

{
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "",
        "Effect": "Allow",
        "Principal": {
            "Service": "es.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
    }]
}
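
To inspect the trust relationship from the command line, the AWS CLI can print the role's trust policy. The sketch below assumes the AWS CLI is installed and configured. The function names are hypothetical, and the check is deliberately crude: it looks only for the "es.amazonaws.com" string and doesn't validate the Effect or Action of the statement.

```shell
# Read a trust policy document on stdin.
# Succeeds if it mentions the OpenSearch Service principal.
trusts_opensearch() {
  grep -q '"es\.amazonaws\.com"'
}

# Fetch the trust policy of the given IAM role name and run the check on it.
check_role_trust() {
  aws iam get-role --role-name "$1" \
    --query 'Role.AssumeRolePolicyDocument' --output json | trusts_opensearch
}
```

For example, check_role_trust ES_Backup_Role returns a nonzero exit status if the role from the error message doesn't list the service principal.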

PARTIAL snapshot status

A "PARTIAL" snapshot status indicates that data from at least one shard couldn't be stored. You can still restore data from a partial snapshot, but you must use earlier snapshots to restore any missing indices. To check whether a snapshot is in a "PARTIAL" state, check your snapshot history. For more information, see Restoring snapshots.
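
One way to scan the snapshot history for this status is the _cat/snapshots API, limited to the id and status columns. The sketch below assumes basic authentication and a placeholder endpoint. The function names are hypothetical.

```shell
# Read "id status" lines on stdin and print the IDs of PARTIAL snapshots.
partial_snapshots() {
  awk '$2 == "PARTIAL" { print $1 }'
}

# List the PARTIAL snapshots in the given repository.
list_partial() {
  curl -sS -u 'username:password' \
    "https://opensearch-domain-endpoint/_cat/snapshots/$1?h=id,status" | partial_snapshots
}
```

For example, list_partial snapshot-repository-name prints one snapshot ID per line, or nothing if no snapshot in that repository is partial.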

Amazon S3 Glacier storage classes issue

Don't apply an Amazon S3 Lifecycle rule that transitions objects to one of the S3 Glacier storage classes to the bucket that stores your manual snapshots. Manual snapshots don't support the S3 Glacier storage classes. Therefore, if a Lifecycle policy already transitioned snapshot objects to S3 Glacier, then you must move those objects back.

After you move the objects back to a standard Amazon S3 storage class, you can restore from those snapshots. For more information, see Prerequisites.
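
To find out whether any snapshot objects were transitioned, the bucket can be listed with each object's storage class. The sketch below assumes the AWS CLI is installed and configured. The function names are hypothetical, and treating GLACIER, DEEP_ARCHIVE, and GLACIER_IR as the classes to flag is an assumption rather than something this article states.

```shell
# Read tab-separated "key<TAB>storage-class" lines on stdin and
# print the keys that are in an S3 Glacier storage class.
glacier_keys() {
  awk -F '\t' '$2 == "GLACIER" || $2 == "DEEP_ARCHIVE" || $2 == "GLACIER_IR" { print $1 }'
}

# List the keys in the given bucket that are stored in an S3 Glacier class.
find_glacier_objects() {
  aws s3api list-objects-v2 --bucket "$1" \
    --query 'Contents[].[Key,StorageClass]' --output text | glacier_keys
}
```

If find_glacier_objects prints any keys, those objects must be moved back to a standard storage class before a restore from the affected snapshots can succeed.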

Related information

How do I resolve the "cannot restore index [.kibana]" error in Amazon OpenSearch Service?

Taking manual snapshots

AWS OFFICIAL, updated a year ago