How can I resolve the AWS STS AssumeRoleWithWebIdentity API call error "InvalidIdentityToken"?


The AWS Security Token Service (AWS STS) API call AssumeRoleWithWebIdentity failed with an "InvalidIdentityToken" error.

Short description

If your AssumeRoleWithWebIdentity API call fails, then you might receive an error that's similar to the following message:

"An error occurred (InvalidIdentityToken) when calling the AssumeRoleWithWebIdentity operation. Couldn't retrieve verification key from your identity provider."

This error might occur in the following scenarios:

  • The .well-known URL and jwks_uri of the identity provider (IdP) are inaccessible from the public internet.
  • A custom firewall is blocking the requests.
  • There's latency of more than 5 seconds in API requests from the IdP to reach the AWS STS endpoint.
  • STS is making too many requests to your .well-known URL or the jwks_uri of the IdP.

Note: Because this issue fails on the client side, AWS CloudTrail event history doesn't log this error.

Resolution

Verify public access for .well-known and jwks_uri

Verify that the .well-known URL and jwks_uri of the IdP are publicly accessible. You can check this in your browser or from the command line. Complete one of the following actions:

To check access, navigate to the following links in your browser:

  • https://BASE_SERVER_URL/.well-known/openid-configuration
  • https://BASE_SERVER_URL/.well-known/jwks.json

-or-

Run the following commands:

Windows:

wget https://BASE_SERVER_URL/.well-known/openid-configuration
wget https://BASE_SERVER_URL/.well-known/jwks.json

Linux:

curl https://BASE_SERVER_URL/.well-known/openid-configuration
curl https://BASE_SERVER_URL/.well-known/jwks.json

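Note: To confirm that you can access the links, check for a 200 status code in the response. The /.well-known/jwks.json path is a common location but not a required one. If your IdP publishes its keys at a different path, then use the jwks_uri value from the openid-configuration document instead.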
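The same check can be scripted. The following is a minimal Python sketch that uses only the standard library and assumes that the IdP publishes a standard OpenID Connect discovery document. It reads the jwks_uri value from that document rather than assuming a fixed path. The function names (discovery_url, extract_jwks_uri, fetch, check_idp) are illustrative and aren't part of any AWS tooling.

```python
import json
import urllib.request

DISCOVERY_PATH = "/.well-known/openid-configuration"


def discovery_url(base_url: str) -> str:
    """Build the OIDC discovery URL for an issuer base URL."""
    return base_url.rstrip("/") + DISCOVERY_PATH


def extract_jwks_uri(discovery_doc: dict) -> str:
    """Return the jwks_uri advertised in a parsed discovery document."""
    jwks_uri = discovery_doc.get("jwks_uri")
    if not jwks_uri:
        raise ValueError("discovery document has no jwks_uri")
    return jwks_uri


def fetch(url: str, timeout: float = 5.0) -> tuple[int, bytes]:
    """GET a URL and return (HTTP status, body).

    urllib raises an exception on network failures and on 4xx/5xx responses.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status, response.read()


def check_idp(base_url: str) -> dict:
    """Fetch the discovery document, then the JWKS that it points to."""
    status, body = fetch(discovery_url(base_url))
    jwks_uri = extract_jwks_uri(json.loads(body))
    jwks_status, _ = fetch(jwks_uri)
    return {
        "discovery_status": status,
        "jwks_uri": jwks_uri,
        "jwks_status": jwks_status,
    }
```

When you call check_idp("https://BASE_SERVER_URL") from a machine outside your own network, both status values should be 200. An exception points to a connectivity problem or an HTTP error.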

Check firewall settings

If the .well-known URL and jwks_uri of the IdP aren't accessible, then check the firewall settings. Make sure that the domains aren't on a deny list.

Depending on the current configuration of the firewall, the domains might need to be added to an allow list.

If the firewall settings aren't accessible, then use the browser with a device from a different network, such as a phone. To check access from the browser, use the instructions in the Verify public access for .well-known and jwks_uri section. If the web request succeeds from the other network, then the firewall is likely blocking the request.

If the server that's making the AssumeRoleWithWebIdentity API call is an Amazon Elastic Compute Cloud (Amazon EC2) instance, then check the configuration settings. For instructions, see Why can't I connect to a website that is hosted on my EC2 instance?

Check operation latency

Check the latency for the total operation. This includes the following attributes:

  • Request/Response time from STS
  • Request/Response time from IdP
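To put numbers on both parts, you can time each call separately. The following is a minimal Python sketch. The helper names are hypothetical, and the 5-second limit is the figure that this article cites for requests that might time out.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# 5-second limit taken from this article; treat it as a guide, not a guarantee.
TIMEOUT_SECONDS = 5.0


def timed_call(operation: Callable[[], T]) -> Tuple[float, T]:
    """Run operation() and return (elapsed seconds, result)."""
    start = time.perf_counter()
    result = operation()
    return time.perf_counter() - start, result


def exceeds_timeout(elapsed: float, limit: float = TIMEOUT_SECONDS) -> bool:
    """Return True when a measured duration is over the limit."""
    return elapsed > limit


# Example (performs a network request, so it's shown as a comment):
#
#   import urllib.request
#   elapsed, _ = timed_call(lambda: urllib.request.urlopen(jwks_uri).read())
#   print(f"IdP JWKS fetch took {elapsed:.2f}s, too slow: {exceeds_timeout(elapsed)}")
```

Wrap the IdP request and the AssumeRoleWithWebIdentity call in separate timed_call invocations to see which side contributes most of the delay.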

Minimize STS latency

Use AWS Regional endpoints instead of the global endpoint for STS. Regional endpoints help route requests to the geographically closest server, which minimizes latency. For more information, see Writing code to use AWS STS Regions.

Note: For AWS SDKs, the Region parameter and the sts_regional_endpoints configuration setting together determine which STS endpoint receives the request.
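As an illustration, the following Python sketch builds a Regional STS endpoint URL. It assumes the standard AWS partition, where Regional STS endpoints follow the sts.REGION.amazonaws.com pattern. Other partitions, such as the China Regions, use a different domain suffix. The helper name regional_sts_endpoint is hypothetical.

```python
def regional_sts_endpoint(region: str) -> str:
    """Return the Regional STS endpoint URL for a standard-partition Region."""
    if not region:
        raise ValueError("region must be a non-empty string")
    return f"https://sts.{region}.amazonaws.com"


# Example with boto3 (requires the boto3 package; shown as a comment so that
# this sketch has no third-party dependency):
#
#   import boto3
#   region = "eu-west-1"
#   sts = boto3.client(
#       "sts",
#       region_name=region,
#       endpoint_url=regional_sts_endpoint(region),
#   )
```

Instead of passing an explicit endpoint URL, you can set the AWS_STS_REGIONAL_ENDPOINTS environment variable to regional so that SDKs that support this setting resolve the Regional endpoint from the configured Region.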

Evaluate IdP latency

The IdP makes requests to the STS endpoint. To check if the request to the STS endpoint takes too long, analyze the IdP's outgoing packets within the IdP logs.

Note: If the request from the IdP to the STS endpoint takes more than 5 seconds, then the request might time out and fail. To reduce latency for this API call, you can ask your identity provider whether they can increase their geographical availability.

(Optional) Use exponential backoff and increase retries

The AssumeRoleWithWebIdentity API depends on retrieving information from the identity provider (IdP). Most IdPs enforce API limits, so requests might be throttled and API calls might not get the required keys back from the IdP. To help successfully assume a role when the API has intermittent issues reaching your IdP, use exponential backoff and increase the number of retries in your client.
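The following is a minimal Python sketch of exponential backoff with jitter and a capped number of retries. It isn't an AWS-provided implementation. The function names, default delays, and the is_retryable callback are assumptions chosen for illustration. Because the same InvalidIdentityToken error code can cover both transient and permanent failures, the caller decides what counts as retryable.

```python
import random
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(
    max_retries: int, base_delay: float = 0.5, max_delay: float = 8.0
) -> Iterator[float]:
    """Yield capped exponential delays: base, 2*base, 4*base, ... up to max_delay."""
    for attempt in range(max_retries):
        yield min(max_delay, base_delay * (2 ** attempt))


def call_with_backoff(
    operation: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying retryable failures with jittered backoff."""
    for delay in backoff_delays(max_retries):
        try:
            return operation()
        except Exception as error:
            if not is_retryable(error):
                raise
            # Full jitter: sleep a random time between 0 and the current cap.
            sleep(random.uniform(0, delay))
    # Final attempt after the last delay; any exception propagates to the caller.
    return operation()
```

To use the sketch, pass a zero-argument callable that wraps your AssumeRoleWithWebIdentity call as operation. With boto3, is_retryable can inspect error.response["Error"]["Code"] on a botocore ClientError. Many AWS SDKs also have built-in retry settings that you can use instead of a hand-written loop.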

Reduce STS requests to .well-known and jwks_uri

If your JSON Web Key Set (JWKS) sets either Pragma: no-cache or Cache-Control: no-cache response headers, then STS doesn't cache your JWKS. For keys that are referenced in an ID_TOKEN but aren't in the cache, STS performs a callback. In this case, STS might make too many requests to your .well-known URL and jwks_uri.

Therefore, to reduce callbacks from STS, verify that your JWKS doesn't set either of these response headers. This allows STS to cache your JWKS.
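To check your own JWKS endpoint for these headers, you can inspect the response headers that it returns. The following Python sketch flags the two headers that this article names. The function name blocks_jwks_caching is hypothetical, and the check covers only the plain no-cache directive.

```python
from typing import Mapping


def blocks_jwks_caching(headers: Mapping[str, str]) -> bool:
    """Return True if the headers include Pragma: no-cache or Cache-Control: no-cache."""
    # Header names are case-insensitive, so normalize before looking them up.
    normalized = {name.lower(): value.lower() for name, value in headers.items()}
    pragma = [d.strip() for d in normalized.get("pragma", "").split(",")]
    cache_control = [d.strip() for d in normalized.get("cache-control", "").split(",")]
    return "no-cache" in pragma or "no-cache" in cache_control


# Example (performs a network request, so it's shown as a comment):
#
#   import urllib.request
#   with urllib.request.urlopen(jwks_uri) as response:
#       print(blocks_jwks_caching(dict(response.headers.items())))
```

A result of True suggests that the IdP's response headers prevent caching of the JWKS.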

Related information

Welcome to the AWS Security Token Service API Reference

How can I resolve API throttling or "Rate exceeded" errors for IAM and AWS STS?

AWS OFFICIAL
Updated a year ago
4 Comments

The documentation above makes it sound like /.well-known/jwks.json is a standard location, but it really should specify the "jwks_uri" value from .well-known/openid-configuration since the OIDC metadata is free to specify another endpoint, such as the oauth/discovery/keys path GitLab uses.

acdha
replied 3 months ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 3 months ago

Another cause here is not matching the expected audience in an identity provider policy when attempting to use a role with a trust relationship with that identity provider. Steps to reproduce:

  1. Create a role under IAM > Roles with a trust relationship with an external identity provider, for example:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "accounts.google.com"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "accounts.google.com:sub": "<sub>"
                }
            }
        }
    ]
}
  2. At this point, the role should work with AssumeRoleWithWebIdentity if you provide a token with the expected sub.
  3. Add the same principal under IAM > Identity providers, for example:
Provider type: OpenID Connect
Provider URL: https://accounts.google.com
Audience: empty
  4. At this point, calls to AssumeRoleWithWebIdentity fail with InvalidIdentityToken and an error message complaining about an invalid audience.

The fix is to add the audience you're using to the identity provider. This was pretty surprising to us and took a while to figure out since the trust policy hadn't changed. Also, writing code to handle these two cases differently is difficult because the same error is returned for both retriable and permanent errors.

replied 7 days ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied 7 days ago