What is the best practice for API call retries when using the Token Exchange Service before IAM credentials are cached?

We have quite a few custom components in Greengrass, and they all depend on the Token Exchange Service (TES) component to interact with other AWS services. I am using the AWS SDK for JavaScript v3 to call those services. When a component first starts, the API call always fails with one of these errors:

CredentialsProviderError: Could not load credentials from any providers

ProviderError: TimeoutError from instance metadata service

I have been running some tests to try to understand the behaviour.

This is my recipe:

RecipeFormatVersion: "2020-01-25"
ComponentName: "{COMPONENT_NAME}"
ComponentVersion: "{COMPONENT_VERSION}"
ComponentDescription: "Test"
ComponentPublisher: "{COMPONENT_AUTHOR}"
ComponentDependencies:
  aws.greengrass.TokenExchangeService:
    VersionRequirement: ">=2.0.0 <2.1.0"
    DependencyType: "HARD"
Manifests:
  - Platform:
      os: all
    Artifacts:
      - URI: "s3://BUCKET_NAME/COMPONENT_NAME/COMPONENT_VERSION/component.zip"
        Unarchive: ZIP
    Lifecycle:
      Install: 
        Script: cd {artifacts:decompressedPath}/component && npm install
        RequiresPrivilege: true
      Run: 
        Script: "node {artifacts:decompressedPath}/component/index.js"
        RequiresPrivilege: true

This is my index.js:

const { fromContainerMetadata } = require("@aws-sdk/credential-providers");
const { PublishCommand, IoTDataPlaneClient } = require("@aws-sdk/client-iot-data-plane");

// Credentials are requested from TES through the container metadata provider.
const credentials = fromContainerMetadata({});

const iotClient = new IoTDataPlaneClient({
    region: "ap-southeast-2",
    credentials: credentials,
    maxRetries: 5,
});

const input = {
    topic: "test",
    payload: JSON.stringify({ test: "test" }),
};

// Publish once; on any failure, log the error and immediately try again.
const start = async () => {
    try {
        const command = new PublishCommand(input);
        await iotClient.send(command);
        console.log("Success");
    } catch (err) {
        console.log(`Failed: ${err.name} ${err.message}`);
        start();
    }
};
start();

I deploy the component and restart the Greengrass service so that TES is restarted as well.

My component log shows this:

2023-08-09T04:33:59.650Z [INFO] (pool-2-thread-13) test: shell-runner-start. {scriptName=services.test.lifecycle.Run.Script, serviceName=test, currentState=STARTING, command=["node /greengrass/v2/packages/artifacts-unarchived/test/1.0.17/component/index...."]}
2023-08-09T04:34:01.754Z [INFO] (Copier) test: stdout. Failed: ProviderError TimeoutError from instance metadata service. {scriptName=services.test.lifecycle.Run.Script, serviceName=test, currentState=RUNNING}
2023-08-09T04:34:02.784Z [INFO] (Copier) test: stdout. Failed: ProviderError TimeoutError from instance metadata service. {scriptName=services.test.lifecycle.Run.Script, serviceName=test, currentState=RUNNING}
2023-08-09T04:34:03.795Z [INFO] (Copier) test: stdout. Failed: ProviderError TimeoutError from instance metadata service. {scriptName=services.test.lifecycle.Run.Script, serviceName=test, currentState=RUNNING}
2023-08-09T04:34:04.805Z [INFO] (Copier) test: stdout. Failed: ProviderError TimeoutError from instance metadata service. {scriptName=services.test.lifecycle.Run.Script, serviceName=test, currentState=RUNNING}
2023-08-09T04:34:05.821Z [INFO] (Copier) test: stdout. Failed: ProviderError TimeoutError from instance metadata service. {scriptName=services.test.lifecycle.Run.Script, serviceName=test, currentState=RUNNING}
2023-08-09T04:34:06.835Z [INFO] (Copier) test: stdout. Failed: ProviderError TimeoutError from instance metadata service. {scriptName=services.test.lifecycle.Run.Script, serviceName=test, currentState=RUNNING}
2023-08-09T04:34:07.848Z [INFO] (Copier) test: stdout. Failed: ProviderError TimeoutError from instance metadata service. {scriptName=services.test.lifecycle.Run.Script, serviceName=test, currentState=RUNNING}
2023-08-09T04:34:13.226Z [INFO] (Copier) test: stdout. Success. {scriptName=services.test.lifecycle.Run.Script, serviceName=test, currentState=RUNNING}
2023-08-09T04:34:13.258Z [INFO] (Copier) test: Run script exited. {exitCode=0, serviceName=test, currentState=RUNNING}

greengrass.log shows this:

2023-08-09T04:33:51.531Z [INFO] (pool-2-thread-17) com.aws.greengrass.tes.TokenExchangeService: Started server at port 32901. {serviceName=aws.greengrass.TokenExchangeService, currentState=STARTING}
2023-08-09T04:33:51.532Z [INFO] (aws.greengrass.TokenExchangeService-lifecycle) com.aws.greengrass.tes.TokenExchangeService: service-set-state. {serviceName=aws.greengrass.TokenExchangeService, currentState=STARTING, newState=RUNNING}
2023-08-09T04:33:51.534Z [INFO] (onwatch.mqttbroker-plugin-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=onwatch.mqttbroker-plugin, currentState=INSTALLED, newState=STARTING}
2023-08-09T04:33:51.583Z [INFO] (onwatch.mqttbroker-plugin-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=onwatch.mqttbroker-plugin, currentState=STARTING, newState=RUNNING}
2023-08-09T04:33:55.988Z [INFO] (AwsEventLoop 4) com.aws.greengrass.mqttclient.AwsIotMqtt5Client: Successfully connected to AWS IoT Core. {clientId=Dev, sessionPresent=false}
2023-08-09T04:33:56.301Z [INFO] (AwsEventLoop 4) com.aws.greengrass.deployment.IotJobsHelper: No deployment job found. {ThingName=Dev}
2023-08-09T04:33:56.472Z [INFO] (AwsEventLoop 4) com.aws.greengrass.deployment.ShadowDeploymentListener: Deployment result already reported. Ignoring shadow update at startup. {CONFIGURATION_ARN=arn:aws:greengrass:ap-southeast-2:949179323480:configuration:thing/Dev:34}
2023-08-09T04:33:59.642Z [INFO] (test-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=test, currentState=NEW, newState=INSTALLED}
2023-08-09T04:33:59.645Z [INFO] (test-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=test, currentState=INSTALLED, newState=STARTING}
2023-08-09T04:33:59.655Z [INFO] (test-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=test, currentState=STARTING, newState=RUNNING}
2023-08-09T04:33:59.659Z [INFO] (main-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=main, currentState=INSTALLED, newState=STARTING}
2023-08-09T04:33:59.662Z [INFO] (pool-2-thread-16) com.aws.greengrass.lifecyclemanager.GenericExternalService: generic-service-finished. Nothing done. {serviceName=main, currentState=STARTING}
2023-08-09T04:33:59.664Z [INFO] (main-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=main, currentState=STARTING, newState=FINISHED}
2023-08-09T04:33:59.666Z [INFO] (main-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=main, currentState=FINISHED, newState=STOPPING}
2023-08-09T04:33:59.668Z [INFO] (pool-2-thread-16) com.aws.greengrass.lifecyclemanager.GenericExternalService: Shutdown initiated. {serviceName=main, currentState=STOPPING}
2023-08-09T04:33:59.669Z [INFO] (pool-2-thread-16) com.aws.greengrass.lifecyclemanager.GenericExternalService: generic-service-shutdown. {serviceName=main, currentState=STOPPING}
2023-08-09T04:33:59.671Z [INFO] (main-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=main, currentState=STOPPING, newState=FINISHED}
2023-08-09T04:34:07.889Z [INFO] (pool-2-thread-16) com.aws.greengrass.tes.CredentialRequestHandler: Received IAM credentials that will be cached until 2023-08-09T05:29:07Z. {iotCredentialsPath=/role-aliases/OnWatch-Ump-TokenExchangeRoleAlias/credentials}
2023-08-09T04:34:13.257Z [INFO] (Copier) com.aws.greengrass.lifecyclemanager.GenericExternalService: Run script exited. {exitCode=0, serviceName=test, currentState=RUNNING}
2023-08-09T04:34:13.259Z [INFO] (Copier) com.aws.greengrass.lifecyclemanager.GenericExternalService: generic-service-stopping. Service finished running. {serviceName=test, currentState=RUNNING}
2023-08-09T04:34:13.261Z [INFO] (test-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=test, currentState=RUNNING, newState=STOPPING}
2023-08-09T04:34:13.263Z [INFO] (pool-2-thread-14) com.aws.greengrass.lifecyclemanager.GenericExternalService: Shutdown initiated. {serviceName=test, currentState=STOPPING}
2023-08-09T04:34:13.265Z [INFO] (pool-2-thread-14) com.aws.greengrass.lifecyclemanager.GenericExternalService: generic-service-shutdown. {serviceName=test, currentState=STOPPING}
2023-08-09T04:34:13.266Z [INFO] (test-lifecycle) com.aws.greengrass.lifecyclemanager.GenericExternalService: service-set-state. {serviceName=test, currentState=STOPPING, newState=FINISHED}

I have noticed that I can increase the timeout and maxRetries:

credentials = fromContainerMetadata({
    timeout: 200,
    maxRetries: 2
})

If I make these high enough, the call succeeds. The issue is that if the device is not connected to the internet at the time, the same error is thrown indefinitely. So I am trying to find a way to implement a custom retry mechanism so that, if the call fails, the program can continue and the API call can be made at a later time.

What is the best practice for a retry mechanism with this SDK? I haven't been able to find any clear documentation on this. Currently I just catch the particular error, wait for a timeout, and then retry. Is there a preferred way, or something built into the SDK?

asked 9 months ago · 256 views
2 Answers
To publish messages to AWS IoT Core, we recommend using the Greengrass SDK (the Greengrass IPC client in the AWS IoT Device SDK, linked below) rather than the AWS SDK: it lets you leverage the Greengrass Nucleus spooler, which can cache messages while the device is disconnected and send them as soon as connectivity is re-established.

https://aws.github.io/aws-iot-device-sdk-js-v2/node/modules/greengrasscoreipc.html#Client

https://aws.github.io/aws-iot-device-sdk-js-v2/node/classes/greengrasscoreipc.Client-1.html#publishToIoTCore
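As a rough sketch of what publishing through the IPC client linked above could look like (not a definitive implementation): the helper names `buildPublishRequest`, `publishViaIpc` and `main` are made up for illustration, and the topic and payload are the ones from your index.js. It assumes the `aws-iot-device-sdk-v2` package is installed and that the component recipe authorizes the component to call `aws.greengrass#PublishToIoTCore` on the `aws.greengrass.ipc.mqttproxy` service.

```javascript
// Build the PublishToIoTCore request. Kept separate so it can be checked off-device.
function buildPublishRequest(topicName, message, qos) {
    return { topicName, qos, payload: JSON.stringify(message) };
}

// Publish through any object exposing publishToIoTCore (the Greengrass IPC client on a device).
async function publishViaIpc(client, topicName, message, qos) {
    await client.publishToIoTCore(buildPublishRequest(topicName, message, qos));
}

// Hypothetical entry point: only works inside a running Greengrass component.
async function main() {
    // Loaded lazily so the helpers above can be used without the package installed.
    const { greengrasscoreipc } = require("aws-iot-device-sdk-v2");
    const client = greengrasscoreipc.createClient();
    await client.connect();
    await publishViaIpc(client, "test", { test: "test" }, greengrasscoreipc.model.QOS.AT_LEAST_ONCE);
    await client.close();
}

// In the component's index.js you would then call:
// main().catch((err) => console.log(`Failed: ${err.name} ${err.message}`));
```

With this approach the component does not need TES credentials just to publish; the Nucleus handles the connection to AWS IoT Core.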

As for your question about retries, what you are doing is correct (see also this answer), but a better implementation would use different retry policies for different errors. Moreover, it should implement an exponential backoff strategy. You can use a library such as https://www.npmjs.com/package/exponential-backoff
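To make the two points about retries more concrete, here is a minimal hand-rolled sketch (it does not use the exponential-backoff package). The error names are the ones from your logs; the policy numbers and the helper names `policyFor`, `backoffDelayMs` and `retryWithBackoff` are assumptions chosen for illustration, so tune them to your devices.

```javascript
// Per-error retry policies. All numbers are illustrative assumptions.
const POLICIES = {
    // TES has no cached credentials yet: keep trying, but back off up to 30 s.
    credentials: { maxAttempts: 8, baseDelayMs: 500, maxDelayMs: 30000 },
    // Any other error: a few quick attempts, then give up.
    other: { maxAttempts: 3, baseDelayMs: 200, maxDelayMs: 2000 },
};

// Error names reported in the question while credentials are unavailable.
const CREDENTIAL_ERROR_NAMES = new Set(["CredentialsProviderError", "ProviderError"]);

function policyFor(err) {
    const name = err ? err.name : undefined;
    return CREDENTIAL_ERROR_NAMES.has(name) ? POLICIES.credentials : POLICIES.other;
}

// Exponential backoff with full jitter: a random delay below min(maxDelay, base * 2^attempt).
function backoffDelayMs(attempt, policy, random = Math.random) {
    const ceiling = Math.min(policy.maxDelayMs, policy.baseDelayMs * 2 ** attempt);
    return Math.floor(random() * ceiling);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run operation() until it succeeds or the policy for the latest error is exhausted.
// The attempt counter is shared across error types to keep the sketch short.
async function retryWithBackoff(operation, { sleepFn = sleep, random = Math.random } = {}) {
    for (let attempt = 0; ; attempt++) {
        try {
            return await operation();
        } catch (err) {
            const policy = policyFor(err);
            if (attempt + 1 >= policy.maxAttempts) throw err;
            await sleepFn(backoffDelayMs(attempt, policy, random));
        }
    }
}
```

In your index.js this would wrap the call as `await retryWithBackoff(() => iotClient.send(new PublishCommand(input)))`. When the final attempt fails the error is rethrown, so the caller can decide to queue the message and try again later instead of looping forever.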

A few additional comments on the recipe you are using:

Lifecycle:
      Install: 
        Script: cd {artifacts:decompressedPath}/component && npm install
        RequiresPrivilege: true
      Run: 
        Script: "node {artifacts:decompressedPath}/component/index.js"
        RequiresPrivilege: true

Do not use RequiresPrivilege, since it means the script runs with root permissions, with all the related security implications. The reason you need RequiresPrivilege is that you are trying to modify the package folders. These folders are owned and managed by Greengrass, and your components should not modify them. Instead, just copy package.json to the work folder of the component and run npm install from there. The recipe should look like this:

Lifecycle:
      Install: 
        Script: cp {artifacts:decompressedPath}/component/package.json . && npm init -y && npm install
      Run: 
        Script: "node {artifacts:decompressedPath}/component/index.js"
AWS
EXPERT
answered 9 months ago
Hi Massimiliano,

Thanks for the information, I will look at the Greengrass SDK. Do I need to add any component dependencies, such as the disk spooler, when using Greengrass IPC? Are there any examples of implementing this?

With the recipe example you provided, I am still getting a permission error. I also changed your recipe slightly to copy the entire package to the work directory and then run npm install, as it needs index.js in the same directory:

---
RecipeFormatVersion: "2020-01-25"
ComponentName: "{COMPONENT_NAME}"
ComponentVersion: "{COMPONENT_VERSION}"
ComponentDescription: "Test"
ComponentPublisher: "{COMPONENT_AUTHOR}"
ComponentDependencies:
  aws.greengrass.TokenExchangeService:
    VersionRequirement: ">=2.0.0 <2.1.0"
    DependencyType: "HARD"
Manifests:
  - Platform:
      os: all
    Artifacts:
      - URI: "s3://BUCKET_NAME/COMPONENT_NAME/COMPONENT_VERSION/component.zip"
        Unarchive: ZIP
    Lifecycle:
      Install: 
        Script: whoami && echo ${PWD} && cp {artifacts:decompressedPath}/component/* . && npm init -y && npm install
      Run: 
        Script: "node index.js"

I also added whoami and printed the working directory to understand which user is running the command and in which folder:

whoami - devuser

${PWD} - /greengrass/v2/work/test

devuser is the default user I have created to run the components.

Permissions on the work folder are:

drwxr-xr-x 17 root root 4096 Aug 10 00:15 work

Permissions on the 'test' folder (the component folder) in the work directory are:

drwx------ 2 devuser devuser 4096 Aug 10 00:28 test

Permissions on the Node package files in the test folder are:

-r--r----- 1 devuser devuser   882 Aug 10 00:28 index.js
-r--r----- 1 devuser devuser   406 Aug 10 00:28 package.json
-r--r----- 1 devuser devuser 88267 Aug 10 00:28 package-lock.json

I then still get the following errors, even though the permissions look suitable to me:

2023-08-09T23:28:16.077Z [INFO] (Copier) test: stdout. devuser. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:16.078Z [INFO] (Copier) test: stdout. /greengrass/v2/work/test. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.453Z [WARN] (Copier) test: stderr. npm ERR! code EACCES. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.454Z [WARN] (Copier) test: stderr. npm ERR! syscall open. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.455Z [WARN] (Copier) test: stderr. npm ERR! path /greengrass/v2/work/test/package.json. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.457Z [WARN] (Copier) test: stderr. npm ERR! errno -13. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.463Z [WARN] (Copier) test: stderr. npm ERR! Error: EACCES: permission denied, open '/greengrass/v2/work/test/package.json'. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.464Z [WARN] (Copier) test: stderr. npm ERR!  [Error: EACCES: permission denied, open '/greengrass/v2/work/test/package.json'] {. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.464Z [WARN] (Copier) test: stderr. npm ERR!   errno: -13,. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.464Z [WARN] (Copier) test: stderr. npm ERR!   code: 'EACCES',. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.465Z [WARN] (Copier) test: stderr. npm ERR!   syscall: 'open',. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.465Z [WARN] (Copier) test: stderr. npm ERR!   path: '/greengrass/v2/work/test/package.json'. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.466Z [WARN] (Copier) test: stderr. npm ERR! }. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.466Z [WARN] (Copier) test: stderr. npm ERR!. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.466Z [WARN] (Copier) test: stderr. npm ERR! The operation was rejected by your operating system.. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.467Z [WARN] (Copier) test: stderr. npm ERR! It is likely you do not have the permissions to access this file as the current user. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.467Z [WARN] (Copier) test: stderr. npm ERR!. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.467Z [WARN] (Copier) test: stderr. npm ERR! If you believe this might be a permissions issue, please double-check the. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.468Z [WARN] (Copier) test: stderr. npm ERR! permissions of the file and its containing directories, or try running. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.468Z [WARN] (Copier) test: stderr. npm ERR! the command again as root/Administrator.. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.471Z [WARN] (Copier) test: stderr. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.472Z [WARN] (Copier) test: stderr. npm ERR! A complete log of this run can be found in:. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.472Z [WARN] (Copier) test: stderr. npm ERR!     /home/devuser/.npm/_logs/2023-08-09T23_28_17_019Z-debug-0.log. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW}
2023-08-09T23:28:17.488Z [WARN] (pool-2-thread-159) test: shell-runner-error. {scriptName=services.test.lifecycle.Install.Script, serviceName=test, currentState=NEW, command=["whoami && echo ${PWD} && cp /greengrass/v2/packages/artifacts-unarchived/test/..."]}

If I manually add write permission to the Node-related files in work/test/ and then redeploy the component, it works successfully.

-rw-r-----  1 devuser devuser   882 Aug 10 00:37 index.js
-rw-r-----  1 devuser devuser   424 Aug 10 00:37 package.json
-rw-r-----  1 devuser devuser 88267 Aug 10 00:37 package-lock.json

What steps am I missing to do this correctly?

answered 9 months ago
