Greengrass component dependencies on active network


With Greengrass 2.10.3 I regularly find that some components end up in a BROKEN state after startup; one in particular is SystemsManagerAgent. This is a mobile device, and the network is sometimes not available immediately when the system boots, so I suspect the component fails because the network is not yet up when it is initialised. In other cases, a component's install step may run apt-get or pip install, which also needs the network. Is there any way to make a component depend on the network being available, or a better way to deal with this?

ManagerAgent, currentState=STARTING}
2023-07-12T23:03:56.076Z [INFO] (Copier) aws.greengrass.SystemsManagerAgent: stdout. Applying config override from /etc/amazon/ssm/amazon-ssm-agent.json.. {scriptName=services.aws.greengrass.SystemsManagerAgent.lifecycle.Startup.Script, serviceName=aws.greengrass.SystemsManagerAgent, currentState=STARTING}
2023-07-12T23:03:56.076Z [INFO] (Copier) aws.greengrass.SystemsManagerAgent: stdout. 2023-07-12 23:03:56 ERROR Registration failed due to error registering the instance with AWS SSM. CredentialsEndpointError: failed to load credentials. {scriptName=services.aws.greengrass.SystemsManagerAgent.lifecycle.Startup.Script, serviceName=aws.greengrass.SystemsManagerAgent, currentState=STARTING}
2023-07-12T23:03:56.076Z [INFO] (Copier) aws.greengrass.SystemsManagerAgent: stdout. caused by: SerializationError: failed to decode error message. {scriptName=services.aws.greengrass.SystemsManagerAgent.lifecycle.Startup.Script, serviceName=aws.greengrass.SystemsManagerAgent, currentState=STARTING}
2023-07-12T23:03:56.076Z [INFO] (Copier) aws.greengrass.SystemsManagerAgent: stdout. status code: 500, request id:. {scriptName=services.aws.greengrass.SystemsManagerAgent.lifecycle.Startup.Script, serviceName=aws.greengrass.SystemsManagerAgent, currentState=STARTING}
2023-07-12T23:03:56.076Z [INFO] (Copier) aws.greengrass.SystemsManagerAgent: stdout. caused by: UnmarshalError: failed decoding error message. {scriptName=services.aws.greengrass.SystemsManagerAgent.lifecycle.Startup.Script, serviceName=aws.greengrass.SystemsManagerAgent, currentState=STARTING}
2023-07-12T23:03:56.076Z [INFO] (Copier) aws.greengrass.SystemsManagerAgent: stdout. 00000000  46 61 69 6c 65 64 20 74  6f 20 67 65 74 20 63 6f  |Failed to get co|. {scriptName=services.aws.greengrass.SystemsManagerAgent.lifecycle.Startup.Script, serviceName=aws.greengrass.SystemsManagerAgent, currentState=STARTING}
2023-07-12T23:03:56.076Z [INFO] (Copier) aws.greengrass.SystemsManagerAgent: stdout. 00000010  6e 6e 65 63 74 69 6f 6e                           |nnection|. {scriptName=services.aws.greengrass.SystemsManagerAgent.lifecycle.Startup.Script, serviceName=aws.greengrass.SystemsManagerAgent, currentState=STARTING}
2023-07-12T23:03:56.076Z [INFO] (Copier) aws.greengrass.SystemsManagerAgent: stdout. {scriptName=services.aws.greengrass.SystemsManagerAgent.lifecycle.Startup.Script, serviceName=aws.greengrass.SystemsManagerAgent, currentState=STARTING}
2023-07-12T23:03:56.076Z [INFO] (Copier) aws.greengrass.SystemsManagerAgent: stdout. caused by: invalid character 'F' looking for beginning of value. {scriptName=services.aws.greengrass.SystemsManagerAgent.lifecycle.Startup.Script, serviceName=aws.greengrass.SystemsManagerAgent, currentState=STARTING}
2023-07-12T23:03:56.081Z [INFO] (Copier) aws.greengrass.SystemsManagerAgent: Startup script exited. {exitCode=1, serviceName=aws.greengrass.SystemsManagerAgent, currentState=STARTING}
2023-07-12T23:03:56.101Z [INFO] (pool-2-thread-33) aws.greengrass.SystemsManagerAgent: shell-runner-start. {scriptName=services.aws.greengrass.SystemsManagerAgent.lifecycle.Shutdown.Script, serviceName=aws.greengrass.
majh
asked 10 months ago, 226 views
1 Answer

Hi,

Thanks for reaching out. Yes, a lack of internet connectivity can cause components that depend on it, such as SystemsManagerAgent, to become BROKEN after repeatedly failing to start. As of today, we do not have a component that detects internet availability and manages other components accordingly.

Given that these issues occur frequently after startup, one possible workaround is an intermediate shell script or process that controls Greengrass startup. The process waits for a fixed delay (the average time it takes for the internet to become available, plus a buffer) and then runs a system command to start the Greengrass process itself. This approach assumes that all components depend on the internet to operate.
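As a rough illustration of this workaround, here is a minimal Python sketch of such a wrapper. Note that it swaps the fixed delay for polling with an upper time limit, which is a variation on the suggestion rather than what was described. The probe host and port, the timeouts, and the `greengrass.service` unit name (the usual name when Greengrass is installed as a systemd service) are all assumptions to adjust for your device.

```python
import socket
import subprocess
import time
from typing import Callable

# Assumptions for illustration only; none of these values come from the post.
PROBE_HOST = "1.1.1.1"                  # hypothetical reachable public host
PROBE_PORT = 443
GREENGRASS_UNIT = "greengrass.service"  # assumed systemd unit name


def tcp_probe(host: str = PROBE_HOST, port: int = PROBE_PORT,
              timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, timeouts, and unreachable networks.
        return False


def wait_for_network(probe: Callable[[], bool] = tcp_probe,
                     max_wait: float = 300.0,
                     interval: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll probe() until it succeeds or max_wait seconds have elapsed."""
    deadline = clock() + max_wait
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def main() -> int:
    """Wait for connectivity, then start Greengrass (needs root privileges)."""
    if not wait_for_network():
        print("Network still unavailable; starting Greengrass anyway.")
    return subprocess.run(["systemctl", "start", GREENGRASS_UNIT]).returncode
```

To use it, call `main()` from a boot-time script and disable automatic start of the Greengrass service so that the wrapper is the only thing that starts it. Starting Greengrass even after the time limit expires is a design choice here, so that components which work offline are not blocked indefinitely.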

If some components on the core device are critical, can operate without the internet, and need to be running as soon as possible, you can build your own custom component using one of the supported SDKs [1]. The component monitors network availability and, once it detects that the internet is available, calls the RestartComponent IPC operation to restart the affected component. You can read more about this at [2]. Please take note of the requirements for using this operation.
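A minimal sketch of such a monitoring component in Python might look like the following. It assumes the AWS IoT Device SDK v2 for Python (`awsiotsdk`) is installed and that its `GreengrassCoreIPCClientV2.restart_component` call behaves as I understand it; please verify this against the SDK documentation. The probe host, the polling interval, and the list of target components are hypothetical choices. The component's recipe must also authorize the operation, as described in [2].

```python
import socket
import time
from typing import Callable, Iterable, Iterator, Sequence

# Hypothetical list of components to restart once connectivity returns.
TARGET_COMPONENTS = ["aws.greengrass.SystemsManagerAgent"]


def is_online(host: str = "1.1.1.1", port: int = 443,
              timeout: float = 3.0) -> bool:
    """Probe connectivity with a TCP connect (host and port are assumptions)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def poll(probe: Callable[[], bool], interval: float,
         sleep: Callable[[float], None] = time.sleep) -> Iterator[bool]:
    """Yield the result of probe() forever, pausing interval seconds between calls."""
    while True:
        yield probe()
        sleep(interval)


def restart_on_reconnect(states: Iterable[bool],
                         restart: Callable[[str], None],
                         components: Sequence[str]) -> None:
    """Call restart(name) for each component on every offline-to-online change."""
    was_online = None  # unknown until the first observation
    for online in states:
        if online and was_online is False:
            for name in components:
                restart(name)
        was_online = online


def make_ipc_restarter() -> Callable[[str], None]:
    """Build a restart function backed by the Greengrass IPC client."""
    # Imported lazily so the logic above can be exercised without the SDK.
    from awsiot.greengrasscoreipc.clientv2 import GreengrassCoreIPCClientV2
    client = GreengrassCoreIPCClientV2()

    def restart(component_name: str) -> None:
        client.restart_component(component_name=component_name)

    return restart


def main() -> None:
    """Entry point for the component's run lifecycle step."""
    restart_on_reconnect(poll(is_online, 10.0), make_ipc_restarter(),
                         TARGET_COMPONENTS)
```

Call `main()` from the script that the component's lifecycle runs. This sketch only reacts to transitions it observes itself: if the network is already up the first time it checks, it restarts nothing, so a component that broke before the monitor started would need separate handling.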

   

References

[1] Use the AWS IoT Device SDK to communicate with the Greengrass nucleus, other components, and AWS IoT Core - Supported SDKs for interprocess communication - https://docs.aws.amazon.com/greengrass/v2/developerguide/interprocess-communication.html#ipc-requirements

[2] Manage local deployments and components - https://docs.aws.amazon.com/greengrass/v2/developerguide/ipc-local-deployments-components.html

   

Honorable Mentions

PauseComponent, ResumeComponent - Interact with component lifecycle - https://docs.aws.amazon.com/greengrass/v2/developerguide/ipc-component-lifecycle.html

AWS
SUPPORT ENGINEER
answered 9 months ago
