boto3 occasionally raises CredentialRetrievalError when boto3.client() is called from a Greengrass component using TokenExchangeService

Hello! I'm having some trouble with a Greengrass component that occasionally crashes shortly after startup. The component connects to S3 using boto3, using the device's credentials - the appropriate S3 permissions are in GreengrassV2TokenExchangeRole. In the component's recipe, a dependency on TokenExchangeService is specified:

  aws.greengrass.TokenExchangeService:
    VersionRequirement: '^2.0.0'
    DependencyType: HARD

The relevant part of the code looks like this:

import boto3
import botocore.config

# Retry settings for requests made through the S3 client
s3_connection_config = botocore.config.Config(
    retries={
        'max_attempts': 10,
        'mode': 'standard',
    }
)
print("Creating boto3 S3 client...")
s3_client = boto3.client('s3', config=s3_connection_config)

The retry configuration in s3_connection_config has no effect on the error raised by boto3.client(). The script crashes instead, and Greengrass restarts the component until it has crashed often enough to be considered "broken". We could build our own retry mechanism with try-except, but is this the right way to go? Is this a bug in TokenExchangeService?

The logs follow below:

2023-04-12T08:52:06.864Z [INFO] (Copier) FirmwareCourier: stdout. Creating boto3 S3 client.... {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.668Z [WARN] (Copier) FirmwareCourier: stderr. Traceback (most recent call last):. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.669Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/botocore/credentials.py", line 1985, in fetch_creds. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.670Z [WARN] (Copier) FirmwareCourier: stderr. response = self._fetcher.retrieve_full_uri(. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.670Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/botocore/utils.py", line 2861, in retrieve_full_uri. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.671Z [WARN] (Copier) FirmwareCourier: stderr. return self._retrieve_credentials(full_url, headers). {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.671Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/botocore/utils.py", line 2897, in _retrieve_credentials. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.672Z [WARN] (Copier) FirmwareCourier: stderr. return self._get_response(. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.672Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/botocore/utils.py", line 2919, in _get_response. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.673Z [WARN] (Copier) FirmwareCourier: stderr. raise MetadataRetrievalError(. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.674Z [WARN] (Copier) FirmwareCourier: stderr. botocore.exceptions.MetadataRetrievalError: Error retrieving metadata: Received non 200 response (500) from ECS metadata: Failed to get connection. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.674Z [WARN] (Copier) FirmwareCourier: stderr. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.675Z [WARN] (Copier) FirmwareCourier: stderr. During handling of the above exception, another exception occurred:. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.675Z [WARN] (Copier) FirmwareCourier: stderr. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.675Z [WARN] (Copier) FirmwareCourier: stderr. Traceback (most recent call last):. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.676Z [WARN] (Copier) FirmwareCourier: stderr. File "/greengrass/v2/packages/artifacts-unarchived/FirmwareCourier/1.0.138/FirmwareCourier/main.py", line 281, in <module>. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.676Z [WARN] (Copier) FirmwareCourier: stderr. main(). {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.676Z [WARN] (Copier) FirmwareCourier: stderr. File "/greengrass/v2/packages/artifacts-unarchived/FirmwareCourier/1.0.138/FirmwareCourier/main.py", line 253, in main. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.676Z [WARN] (Copier) FirmwareCourier: stderr. s3_client = boto3.client('s3', config=s3_connection_config). {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.677Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/boto3/__init__.py", line 92, in client. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.677Z [WARN] (Copier) FirmwareCourier: stderr. return _get_default_session().client(*args, **kwargs). {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.677Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/boto3/session.py", line 299, in client. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.678Z [WARN] (Copier) FirmwareCourier: stderr. return self._session.create_client(. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.678Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/botocore/session.py", line 951, in create_client. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.678Z [WARN] (Copier) FirmwareCourier: stderr. credentials = self.get_credentials(). {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.679Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/botocore/session.py", line 507, in get_credentials. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.679Z [WARN] (Copier) FirmwareCourier: stderr. self._credentials = self._components.get_component(. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.679Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/botocore/credentials.py", line 2095, in load_credentials. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.680Z [WARN] (Copier) FirmwareCourier: stderr. creds = provider.load(). {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.680Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/botocore/credentials.py", line 1958, in load. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.680Z [WARN] (Copier) FirmwareCourier: stderr. return self._retrieve_or_fail(). {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.681Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/botocore/credentials.py", line 1967, in _retrieve_or_fail. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.681Z [WARN] (Copier) FirmwareCourier: stderr. creds = fetcher(). {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.682Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/botocore/credentials.py", line 1992, in fetch_creds. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.689Z [WARN] (Copier) FirmwareCourier: stderr. raise CredentialRetrievalError(. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.691Z [WARN] (Copier) FirmwareCourier: stderr. botocore.exceptions.CredentialRetrievalError: Error when retrieving credentials from container-role: Error retrieving metadata: Received non 200 response (500) from ECS metadata: Failed to get connection. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.947Z [INFO] (Copier) FirmwareCourier: Run script exited. {exitCode=1, serviceName=FirmwareCourier, currentState=RUNNING}
asked a year ago · 253 views
2 Answers
Accepted Answer

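The reported error is an HTTP 500 response from the local credentials endpoint that boto3 queries (the "container-role" provider in the traceback). To see why the request failed, open the Greengrass log file (/greengrass/v2/logs/greengrass.log).

If the failure only happens occasionally, then yes, you should retry using try-except.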
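
As a rough illustration of the try-except approach, here is a minimal sketch of a retry loop with exponential backoff around client creation. The helper names (call_with_retry, create_s3_client) and the attempt and delay values are assumptions made for this example, not part of boto3 or Greengrass. Only botocore.exceptions.CredentialRetrievalError, the exception shown in the traceback, is treated as retryable.

```python
import time


def call_with_retry(factory, retryable, attempts=5, base_delay=2.0,
                    max_delay=60.0, sleep=time.sleep):
    """Call factory() until it succeeds or the attempts are used up.

    Hypothetical helper: retries only on the exception types in `retryable`
    and doubles the delay after each failure, capped at max_delay.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return factory()
        except retryable as exc:
            if attempt == attempts:
                raise  # out of attempts, let Greengrass see the failure
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay}s")
            sleep(delay)
            delay = min(delay * 2, max_delay)


def create_s3_client(config=None):
    """Create an S3 client, retrying while credentials cannot be fetched."""
    # Imported here so the helper above can be used without boto3 installed.
    import boto3
    from botocore.exceptions import CredentialRetrievalError

    return call_with_retry(
        lambda: boto3.client("s3", config=config),
        (CredentialRetrievalError,),
    )
```

In the component, the direct boto3.client() call would then become s3_client = create_s3_client(s3_connection_config). Re-raising after the last attempt keeps a persistent failure visible to Greengrass instead of hiding it.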

AWS EXPERT
answered a year ago

Thanks for the guidance! I found the following in GreengrassSystemComponent/<region>/System in CloudWatch, which I believe is equivalent to greengrass.log.

2023-04-12T08:52:02.965Z [WARN] (pool-2-thread-28) com.aws.greengrass.tes.CredentialRequestHandler: Encountered error while fetching credentials. {iotCredentialsPath=/role-aliases/GreengrassV2TokenExchangeRoleAlias/credentials}
2023-04-12T08:52:02.972Z [ERROR] (pool-2-thread-28) com.aws.greengrass.tes.CredentialRequestHandler: Error in retrieving AwsCredentials from TES. {iotCredentialsPath=/role-aliases/GreengrassV2TokenExchangeRoleAlias/credentials, credentialData=Failed to get connection}

Is this expected behaviour if the device is unable to communicate with AWS services, e.g. because an internet connection has not yet been established?

answered a year ago
  • Yes, that certainly makes sense. The credentials come from AWS, so the device needs a connection to AWS in order to fetch them and pass them on to your component.
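
    One way to avoid crashing while the device is still offline is to check whether the Token Exchange Service can currently hand out credentials before creating any clients. The sketch below assumes the component receives the AWS_CONTAINER_CREDENTIALS_FULL_URI and AWS_CONTAINER_AUTHORIZATION_TOKEN environment variables that the token exchange service sets for dependent components. The function name tes_credentials_available and its parameters are hypothetical, chosen for this example.

```python
import os
import urllib.error
import urllib.request


def tes_credentials_available(timeout=5.0, urlopen=urllib.request.urlopen,
                              environ=os.environ):
    """Return True if the local credentials endpoint answers with HTTP 200.

    Assumption: the endpoint URI and token are provided through the two
    environment variables read below. The response body is not inspected.
    """
    uri = environ.get("AWS_CONTAINER_CREDENTIALS_FULL_URI")
    token = environ.get("AWS_CONTAINER_AUTHORIZATION_TOKEN")
    if not uri or not token:
        return False  # not running with the token exchange variables set

    request = urllib.request.Request(uri, headers={"Authorization": token})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers non-200 responses (HTTPError) and connection problems.
        return False
```

    A component could poll this with a sleep between attempts and only call boto3.client() once it returns True. This is a complement to retrying, not a replacement, because the connection can still drop between the check and the client call.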

  • Great, thank you!
