IDT fails waiting for GG deployment but only for ML tests


I am trying to run the latest IDT on an Onyx ACCEL-JS500 (Jetson AGX Xavier, aarch64). Everything runs fine except for the ML tests, which wait 5 minutes for Greengrass deployments and then time out. The other test cases seem to have 2-minute deployment timeouts and succeed with no problem.

I realized that the box was already running GG as a Core device, but that didn't seem to interfere with the other tests. Nevertheless, I ran systemctl stop greengrass and re-ran just the ML tests, which failed exactly as before.

From what I can interpret in the logs, the dependent component DLR did not start.

Is this component known to work on aarch64?

asked 2 years ago · 248 views
3 Answers
Accepted Answer

I was able to re-run the tests with a timeout scale factor, and found the culprit:

2022-03-11T17:15:28.176Z [INFO] (Copier) variant.DLR: stdout. Building wheels for collected packages: awscrt. {scriptName=services.variant.DLR.lifecycle.install.script, serviceName=variant.DLR, currentState=NEW}
2022-03-11T17:15:28.177Z [INFO] (Copier) variant.DLR: stdout. Running setup.py bdist_wheel for awscrt: started. {scriptName=services.variant.DLR.lifecycle.install.script, serviceName=variant.DLR, currentState=NEW}
2022-03-11T17:16:29.613Z [INFO] (Copier) variant.DLR: stdout. Running setup.py bdist_wheel for awscrt: still running.... {scriptName=services.variant.DLR.lifecycle.install.script, serviceName=variant.DLR, currentState=NEW}
2022-03-11T17:17:29.768Z [INFO] (Copier) variant.DLR: stdout. Running setup.py bdist_wheel for awscrt: still running.... {scriptName=services.variant.DLR.lifecycle.install.script, serviceName=variant.DLR, currentState=NEW}
2022-03-11T17:18:29.993Z [INFO] (Copier) variant.DLR: stdout. Running setup.py bdist_wheel for awscrt: still running.... {scriptName=services.variant.DLR.lifecycle.install.script, serviceName=variant.DLR, currentState=NEW}
2022-03-11T17:19:29.996Z [INFO] (Copier) variant.DLR: stdout. Running setup.py bdist_wheel for awscrt: still running.... {scriptName=services.variant.DLR.lifecycle.install.script, serviceName=variant.DLR, currentState=NEW}
2022-03-11T17:20:29.996Z [INFO] (Copier) variant.DLR: stdout. Running setup.py bdist_wheel for awscrt: still running.... {scriptName=services.variant.DLR.lifecycle.install.script, serviceName=variant.DLR, currentState=NEW}
2022-03-11T17:20:49.478Z [INFO] (Copier) variant.DLR: stdout. Running setup.py bdist_wheel for awscrt: finished with status 'done'. {scriptName=services.variant.DLR.lifecycle.install.script, serviceName=variant.DLR, currentState=NEW}
2022-03-11T17:20:49.480Z [INFO] (Copier) variant.DLR: stdout. Stored in directory: /root/.cache/pip/wheels/ff/eb/60/564d1fad91e76c11a69261314886f932435e01836237b6d97d. {scriptName=services.variant.DLR.lifecycle.install.script, serviceName=variant.DLR, currentState=NEW}

It took just over 5 minutes (the default timeout) just to "build wheels" (I thought we weren't supposed to reinvent the wheel :-).
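To check the "just over 5 minutes" figure, the elapsed time can be read off the log timestamps. The following is only a minimal sketch: the two log lines are copied from the excerpt above with their trailing `{scriptName=...}` context cut off, and the helper names `parse_timestamp` and `elapsed_seconds` are made up for illustration.

```python
from datetime import datetime

# Start and end lines of the awscrt wheel build, copied from the log excerpt
# above (trailing {scriptName=...} context omitted for brevity).
LOG_LINES = [
    "2022-03-11T17:15:28.177Z [INFO] (Copier) variant.DLR: stdout. "
    "Running setup.py bdist_wheel for awscrt: started.",
    "2022-03-11T17:20:49.478Z [INFO] (Copier) variant.DLR: stdout. "
    "Running setup.py bdist_wheel for awscrt: finished with status 'done'.",
]


def parse_timestamp(line: str) -> datetime:
    """Parse the leading UTC timestamp of a log line (hypothetical helper)."""
    stamp = line.split(" ", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")


def elapsed_seconds(start_line: str, end_line: str) -> float:
    """Seconds between the timestamps of two log lines."""
    delta = parse_timestamp(end_line) - parse_timestamp(start_line)
    return delta.total_seconds()


build_seconds = elapsed_seconds(LOG_LINES[0], LOG_LINES[1])
print(f"awscrt wheel build took {build_seconds:.1f} s")
```

For these two lines the difference is 5 minutes 21.3 seconds, which by itself already exceeds a 5-minute deployment timeout.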

So the answer is: add --timeout-multiplier 2 (or whatever value you need) if you find a test times out on your platform.
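As a rough way to pick a value, here is a small sketch that estimates a multiplier from an observed duration. It assumes, without confirmation from the IDT documentation, that the multiplier scales the default timeout linearly. The 300-second default is the one reported in this thread, the 321.3-second input is the wheel build time from the log above, and the function name, the 25% headroom, and rounding up to a whole number are illustrative choices only.

```python
import math

# Default deployment timeout for the ML tests as reported in this thread.
DEFAULT_TIMEOUT_S = 5 * 60


def min_multiplier(observed_s: float,
                   default_s: float = DEFAULT_TIMEOUT_S,
                   headroom: float = 1.25) -> int:
    """Smallest whole multiplier whose scaled timeout covers observed_s.

    Assumes (unverified) that the effective timeout is default_s * multiplier.
    headroom pads the observed time to allow for run-to-run variation.
    """
    if observed_s <= 0 or default_s <= 0:
        raise ValueError("durations must be positive")
    return max(1, math.ceil(observed_s * headroom / default_s))


# The awscrt wheel build alone took about 321.3 s in the log above.
print(min_multiplier(321.3))
```

Because the deployment includes more steps than the wheel build, the result is best treated as a lower bound; rerun with a larger value if the test still times out.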

answered 2 years ago

The ML tests bring in Python dependencies not required by the other test groups, which adds to their length. A longer run time is not unexpected, but I'd like to confirm. What version of IDT are you using and are you able to send your IDT config files? (config.json, device.json, userdata.json)

Thank you,

Matthew (AWS)

answered 2 years ago
  • Thanks for the response. I just re-ran with a timeout scale factor of 10, and found that the key step (wheel building) took just over 5 minutes.

  • Matthew, I see some disconcerting things in the log file aws.greengrass.DLRImageClassification.log (added to the gist linked in the question).

    After apparent success publishing the classifications, the script is terminated by SIGTERM (exitCode=143), started again and SIGTERM'd again 129ms later.

    Is this expected?

  • To answer your questions: IDT 4.5.1; config files added to the gist linked in the original question.


This is expected. IDT reaches its timeout and terminates the DLR component; the Greengrass runtime catches this and restarts the component (now in the STOPPING state). The log entry produced 129 ms after the initial SIGTERM is a second level of logging that refers to the same event.
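For readers wondering how exitCode=143 maps to SIGTERM: by common POSIX shell convention, a process killed by a signal is reported with exit status 128 plus the signal number, and SIGTERM is signal 15. A minimal sketch of that decoding follows; the function name `describe_exit_code` is made up for illustration.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret an exit status using the 128 + signal-number convention."""
    if code > 128:
        try:
            # signal.Signals maps a signal number to its symbolic name.
            return f"terminated by {signal.Signals(code - 128).name}"
        except ValueError:
            return f"exit code {code} (no matching signal)"
    return f"exited with status {code}"


print(describe_exit_code(143))
```

For 143 this gives SIGTERM (143 - 128 = 15), consistent with the component being stopped from outside rather than crashing on its own.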

answered 2 years ago
