boto3 occasionally raises CredentialRetrievalError when boto3.client() is called from a Greengrass component using TokenExchangeService

Hello! I'm having some trouble with a Greengrass component that occasionally crashes shortly after startup. The component connects to S3 using boto3, using the device's credentials - the appropriate S3 permissions are in GreengrassV2TokenExchangeRole. In the component's recipe, a dependency on TokenExchangeService is specified:

  aws.greengrass.TokenExchangeService:
    VersionRequirement: '^2.0.0'
    DependencyType: HARD

The relevant part of the code looks like this:

import boto3
import botocore.config

s3_connection_config = botocore.config.Config(
    retries={
        'max_attempts': 10,
        'mode': 'standard'
    }
)
print("Creating boto3 S3 client...")
s3_client = boto3.client('s3', config=s3_connection_config)

The retry configuration applied with s3_connection_config has no effect on the error raised by boto3.client(). Instead, the script crashes, and Greengrass restarts the component until it has crashed enough times to be considered "broken". We could write our own retry mechanism with try-except, but is that the right way to go? Is this a bug in TokenExchangeService?

The logs follow below:

2023-04-12T08:52:06.864Z [INFO] (Copier) FirmwareCourier: stdout. Creating boto3 S3 client.... {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.668Z [WARN] (Copier) FirmwareCourier: stderr. Traceback (most recent call last):. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.669Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/botocore/credentials.py", line 1985, in fetch_creds. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.670Z [WARN] (Copier) FirmwareCourier: stderr. response = self._fetcher.retrieve_full_uri(. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.670Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/botocore/utils.py", line 2861, in retrieve_full_uri. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.671Z [WARN] (Copier) FirmwareCourier: stderr. return self._retrieve_credentials(full_url, headers). {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.671Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/botocore/utils.py", line 2897, in _retrieve_credentials. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.672Z [WARN] (Copier) FirmwareCourier: stderr. return self._get_response(. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.672Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/botocore/utils.py", line 2919, in _get_response. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.673Z [WARN] (Copier) FirmwareCourier: stderr. raise MetadataRetrievalError(. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.674Z [WARN] (Copier) FirmwareCourier: stderr. botocore.exceptions.MetadataRetrievalError: Error retrieving metadata: Received non 200 response (500) from ECS metadata: Failed to get connection. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.674Z [WARN] (Copier) FirmwareCourier: stderr. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.675Z [WARN] (Copier) FirmwareCourier: stderr. During handling of the above exception, another exception occurred:. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.675Z [WARN] (Copier) FirmwareCourier: stderr. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.675Z [WARN] (Copier) FirmwareCourier: stderr. Traceback (most recent call last):. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.676Z [WARN] (Copier) FirmwareCourier: stderr. File "/greengrass/v2/packages/artifacts-unarchived/FirmwareCourier/1.0.138/FirmwareCourier/main.py", line 281, in <module>. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.676Z [WARN] (Copier) FirmwareCourier: stderr. main(). {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.676Z [WARN] (Copier) FirmwareCourier: stderr. File "/greengrass/v2/packages/artifacts-unarchived/FirmwareCourier/1.0.138/FirmwareCourier/main.py", line 253, in main. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.676Z [WARN] (Copier) FirmwareCourier: stderr. s3_client = boto3.client('s3', config=s3_connection_config). {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.677Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/boto3/__init__.py", line 92, in client. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.677Z [WARN] (Copier) FirmwareCourier: stderr. return _get_default_session().client(*args, **kwargs). {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.677Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/boto3/session.py", line 299, in client. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.678Z [WARN] (Copier) FirmwareCourier: stderr. return self._session.create_client(. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.678Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/botocore/session.py", line 951, in create_client. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.678Z [WARN] (Copier) FirmwareCourier: stderr. credentials = self.get_credentials(). {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.679Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/botocore/session.py", line 507, in get_credentials. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.679Z [WARN] (Copier) FirmwareCourier: stderr. self._credentials = self._components.get_component(. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.679Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/botocore/credentials.py", line 2095, in load_credentials. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.680Z [WARN] (Copier) FirmwareCourier: stderr. creds = provider.load(). {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.680Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/botocore/credentials.py", line 1958, in load. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.680Z [WARN] (Copier) FirmwareCourier: stderr. return self._retrieve_or_fail(). {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.681Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/botocore/credentials.py", line 1967, in _retrieve_or_fail. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.681Z [WARN] (Copier) FirmwareCourier: stderr. creds = fetcher(). {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.682Z [WARN] (Copier) FirmwareCourier: stderr. File "/home/ggc_user/.local/lib/python3.9/site-packages/botocore/credentials.py", line 1992, in fetch_creds. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.689Z [WARN] (Copier) FirmwareCourier: stderr. raise CredentialRetrievalError(. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.691Z [WARN] (Copier) FirmwareCourier: stderr. botocore.exceptions.CredentialRetrievalError: Error when retrieving credentials from container-role: Error retrieving metadata: Received non 200 response (500) from ECS metadata: Failed to get connection. {scriptName=services.FirmwareCourier.lifecycle.Run, serviceName=FirmwareCourier, currentState=RUNNING}
2023-04-12T08:52:11.947Z [INFO] (Copier) FirmwareCourier: Run script exited. {exitCode=1, serviceName=FirmwareCourier, currentState=RUNNING}
asked a year ago, 270 views
2 Answers
Accepted Answer

The reported error is a 500 error. Open the greengrass log file to see the problem (/greengrass/v2/logs/greengrass.log).

If you see occasional issues, then yes, you should retry using try-except.
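
A minimal sketch of such a retry, assuming the failure is transient (for example, the device has no connection yet when the component starts). The helper names call_with_retry and make_s3_client, the attempt count, and the delays are illustrative choices, not part of boto3 or Greengrass. The only boto3/botocore names used are boto3.client, botocore.config.Config, and botocore.exceptions.CredentialRetrievalError, the exception shown in the traceback above.

```python
import time


def call_with_retry(factory, retryable, attempts=8, base_delay=1.0,
                    max_delay=30.0, sleep=time.sleep):
    """Call factory() until it succeeds or the attempts run out.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between tries,
    capped at max_delay. All parameter values are illustrative defaults.
    """
    for attempt in range(1, attempts + 1):
        try:
            return factory()
        except retryable as exc:
            if attempt == attempts:
                raise  # out of attempts: let the error reach Greengrass
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)


def make_s3_client():
    """Create the S3 client, retrying only on credential retrieval errors."""
    # Imported here so call_with_retry stays usable without boto3 installed.
    import boto3
    import botocore.config
    from botocore.exceptions import CredentialRetrievalError

    config = botocore.config.Config(
        retries={'max_attempts': 10, 'mode': 'standard'}
    )
    return call_with_retry(
        lambda: boto3.client('s3', config=config),
        retryable=(CredentialRetrievalError,),
    )
```

In the component, s3_client = make_s3_client() would replace the direct boto3.client() call. Only CredentialRetrievalError is retried, so other failures (such as a wrong service name) still surface immediately, and the last failure is re-raised rather than swallowed, so a device that never gets credentials is still reported as failing.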

AWS
EXPERT
answered a year ago

Thanks for the guidance! I found the following in GreengrassSystemComponent/<region>/System in CloudWatch, which I believe is equivalent to greengrass.log.

2023-04-12T08:52:02.965Z [WARN] (pool-2-thread-28) com.aws.greengrass.tes.CredentialRequestHandler: Encountered error while fetching credentials. {iotCredentialsPath=/role-aliases/GreengrassV2TokenExchangeRoleAlias/credentials}
2023-04-12T08:52:02.972Z [ERROR] (pool-2-thread-28) com.aws.greengrass.tes.CredentialRequestHandler: Error in retrieving AwsCredentials from TES. {iotCredentialsPath=/role-aliases/GreengrassV2TokenExchangeRoleAlias/credentials, credentialData=Failed to get connection}

Is this expected behaviour if the device is unable to communicate with AWS services, e.g. because an internet connection has not yet been established?

answered a year ago
  • Yes, that certainly makes sense. Credentials come from AWS, so we need a connection to AWS in order to get them and then give them to your component.

  • Great, thank you!
