IDT fails waiting for GG deployment but only for ML tests


I am trying to run the latest IDT on an Onyx ACCEL-JS500 (Jetson AGX Xavier, aarch64). Everything runs fine except the ML tests, which wait 5 minutes for Greengrass deployments and then time out. The other test cases seem to have 2-minute deployment timeouts and succeed with no problem.

I realized that the box was already running GG as a Core device, but that didn't seem to interfere with the other tests. Nevertheless, I ran systemctl stop greengrass and re-ran just the ML tests, which failed exactly as before.

From what I can interpret in the logs, the dependent component DLR did not start.

Is this component known to work on aarch64?

asked 2 years ago, 241 views
3 Answers
Accepted Answer

I was able to re-run the tests with a timeout scale factor, and found the culprit:

2022-03-11T17:15:28.176Z [INFO] (Copier) variant.DLR: stdout. Building wheels for collected packages: awscrt. {scriptName=services.variant.DLR.lifecycle.install.script, serviceName=variant.DLR, currentState=NEW}
2022-03-11T17:15:28.177Z [INFO] (Copier) variant.DLR: stdout. Running setup.py bdist_wheel for awscrt: started. {scriptName=services.variant.DLR.lifecycle.install.script, serviceName=variant.DLR, currentState=NEW}
2022-03-11T17:16:29.613Z [INFO] (Copier) variant.DLR: stdout. Running setup.py bdist_wheel for awscrt: still running.... {scriptName=services.variant.DLR.lifecycle.install.script, serviceName=variant.DLR, currentState=NEW}
2022-03-11T17:17:29.768Z [INFO] (Copier) variant.DLR: stdout. Running setup.py bdist_wheel for awscrt: still running.... {scriptName=services.variant.DLR.lifecycle.install.script, serviceName=variant.DLR, currentState=NEW}
2022-03-11T17:18:29.993Z [INFO] (Copier) variant.DLR: stdout. Running setup.py bdist_wheel for awscrt: still running.... {scriptName=services.variant.DLR.lifecycle.install.script, serviceName=variant.DLR, currentState=NEW}
2022-03-11T17:19:29.996Z [INFO] (Copier) variant.DLR: stdout. Running setup.py bdist_wheel for awscrt: still running.... {scriptName=services.variant.DLR.lifecycle.install.script, serviceName=variant.DLR, currentState=NEW}
2022-03-11T17:20:29.996Z [INFO] (Copier) variant.DLR: stdout. Running setup.py bdist_wheel for awscrt: still running.... {scriptName=services.variant.DLR.lifecycle.install.script, serviceName=variant.DLR, currentState=NEW}
2022-03-11T17:20:49.478Z [INFO] (Copier) variant.DLR: stdout. Running setup.py bdist_wheel for awscrt: finished with status 'done'. {scriptName=services.variant.DLR.lifecycle.install.script, serviceName=variant.DLR, currentState=NEW}
2022-03-11T17:20:49.480Z [INFO] (Copier) variant.DLR: stdout. Stored in directory: /root/.cache/pip/wheels/ff/eb/60/564d1fad91e76c11a69261314886f932435e01836237b6d97d. {scriptName=services.variant.DLR.lifecycle.install.script, serviceName=variant.DLR, currentState=NEW}

It took just over 5 minutes (the default timeout) merely to "build wheels" (I thought we weren't supposed to reinvent the wheel :-).

So the answer is: add --timeout-multiplier 2 (or whatever value you need) if you find a test times out on your platform.
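To pick a value rather than guess, the wheel-build duration can be read straight off the log timestamps. The following is only a rough sketch: the two sample lines are trimmed copies of the log entries above, the 300-second default is the 5-minute timeout mentioned in the question, and the helper names (step_duration, suggest_multiplier) are made up for illustration. Since the wheel build is only one part of the deployment, treat the result as a lower bound.

```python
import math
import re
from datetime import datetime

# Trimmed copies of the "started" and "finished" log lines quoted above.
LOG_LINES = [
    "2022-03-11T17:15:28.177Z [INFO] (Copier) variant.DLR: stdout. "
    "Running setup.py bdist_wheel for awscrt: started.",
    "2022-03-11T17:20:49.478Z [INFO] (Copier) variant.DLR: stdout. "
    "Running setup.py bdist_wheel for awscrt: finished with status 'done'.",
]

# Matches the leading timestamp, e.g. 2022-03-11T17:15:28.177Z
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z")


def parse_timestamp(line):
    """Return the timestamp at the start of a log line."""
    match = TIMESTAMP.match(line)
    if match is None:
        raise ValueError("no timestamp in line: " + line)
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")


def step_duration(lines, start_marker, end_marker):
    """Seconds between the first line containing start_marker and the
    next line containing end_marker."""
    start = None
    for line in lines:
        if start is None and start_marker in line:
            start = parse_timestamp(line)
        elif start is not None and end_marker in line:
            return (parse_timestamp(line) - start).total_seconds()
    raise ValueError("start or end marker not found")


def suggest_multiplier(duration_s, default_timeout_s=300.0):
    """Smallest whole multiplier whose scaled timeout covers duration_s.
    default_timeout_s is an assumption taken from the 5-minute timeout
    described in the question."""
    return max(1, math.ceil(duration_s / default_timeout_s))


if __name__ == "__main__":
    seconds = step_duration(
        LOG_LINES, "bdist_wheel for awscrt: started", "finished with status"
    )
    print("wheel build took", seconds, "seconds")
    print("suggested --timeout-multiplier:", suggest_multiplier(seconds))
```

For the log above, the build alone takes about 321 seconds, so a multiplier of 2 is the smallest whole value that covers it; on a slower device or network it may need to be higher.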

answered 2 years ago

The ML tests bring in Python dependencies not required by the other test groups, which adds to their length. A longer run time is not unexpected, but I'd like to confirm. What version of IDT are you using and are you able to send your IDT config files? (config.json, device.json, userdata.json)

Thank you,

Matthew (AWS)

answered 2 years ago
  • Thanks for the response. I just re-ran with a timeout scale factor of 10, and found that the key step (wheel building) took just over 5 minutes.

  • Matthew, I see some disconcerting things in the log file aws.greengrass.DLRImageClassification.log (added to the gist linked in the question).

    After apparent success publishing the classifications, the script is terminated by SIGTERM (exitCode=143), started again and SIGTERM'd again 129ms later.

    Is this expected?

  • To answer your questions: IDT 4.5.1; config files added to the gist linked in the original question.


This is expected. IDT reaches its timeout and terminates the DLR component, which the Greengrass runtime catches and restarts (now STOPPING). The log entry produced 129ms after the initial SIGTERM is a second level of logging that refers to the same event.
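As a side note on the exitCode=143 mentioned in the comments: by the usual shell convention, a process killed by a signal reports 128 plus the signal number, and SIGTERM is signal 15. A small sketch of that decoding, assuming the exit code in the Greengrass log follows this convention (the function name is hypothetical):

```python
import signal


def signal_from_exit_code(code):
    """Return the signal name encoded in a shell-style exit code
    (128 + signal number), or None if the code does not encode one."""
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # Not a signal number known on this platform.
        return None


if __name__ == "__main__":
    print(signal_from_exit_code(143))
```

An exit code of 0 or any other value at or below 128 is treated here as an ordinary exit status rather than a signal.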

answered 2 years ago
