IoT Core Jobs `maximumPerMinute` rate limit: how does this work?


I use AWS IoT Jobs to deploy firmware updates to my remote devices. To avoid overloading other web services, the update jobs are meant to be spaced out using the `maximumPerMinute` rate limit. However, the limit doesn't appear to be applied when I add devices to the group attached to the job.

Scenario

  • Create a new static thing group, Version_xxx (a.k.a. the job group). Things are moved to this group when I'm ready for them to receive the update job
  • Create a new continuous job to deploy the update (i.e. the job document contains the download locations, etc.). The only target is the job group
  • Add a set of things to the job group using the AWS CLI (a script does this, so it could be many devices per second)
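For reference, the job in the steps above might be created with a request like the following. This is a minimal Python sketch that only builds the keyword arguments for boto3's `iot.create_job` call, so it runs without AWS credentials. The job ID, group ARN, and document URL are hypothetical placeholders, and the 1 to 1000 bound on `maximumPerMinute` is my understanding of the API limit rather than something stated in this thread.

```python
def build_create_job_request(job_id: str, group_arn: str, document_source: str,
                             max_per_minute: int) -> dict:
    """Keyword arguments for boto3's iot.create_job (sketch, not sent anywhere)."""
    if not 1 <= max_per_minute <= 1000:  # assumed API bound
        raise ValueError("maximumPerMinute assumed to be between 1 and 1000")
    return {
        "jobId": job_id,
        "targets": [group_arn],              # the static job group is the only target
        "documentSource": document_source,   # job document with the download locations
        "targetSelection": "CONTINUOUS",     # keep running as things join the group
        "jobExecutionsRolloutConfig": {"maximumPerMinute": max_per_minute},
    }


if __name__ == "__main__":
    request = build_create_job_request(
        job_id="fw-update-version-xxx",  # hypothetical
        group_arn="arn:aws:iot:us-east-1:123456789012:thinggroup/Version_xxx",
        document_source="https://example-bucket.s3.amazonaws.com/job-document.json",
        max_per_minute=10,
    )
    print(request)
    # With credentials configured: boto3.client("iot").create_job(**request)
```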

Because of the rate limit, I would expect only a limited number of job executions to enter the "In progress" state, but every single execution goes in progress immediately. What am I missing here?

Edit:

  • For clarity, devices can't be added directly to the group because I have found that doesn't allow re-using the same job at a later date (e.g. retry after update failure, version 1 -> 2 -> 1, ...)
  • Adding things to a group and adding that group to the job does exhibit rate-limiting behaviour (things are queued at the desired rate); however, this is a PITA to manage (the group can't be deleted until all executions are completed, a job has a limit on the number of groups that can be added as targets, ...), so it's less than ideal
JC
asked 14 days ago, 84 views
1 Answer

Hi. Please refer to: https://docs.aws.amazon.com/iot/latest/developerguide/jobs-configurations-details.html#job-rollout-abort-scheduling

The rollout configuration can control the rollout rates only for devices that are added to the group until job creation. After a job has been created, for any new devices, the job executions are created in near real time as soon as the devices join the target group.

Although this statement appears in the context of continuous jobs that target dynamic groups, it's also true of continuous jobs that target static groups.
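To make that behaviour concrete, here is a deliberately simplified Python model of the rule quoted above: things already in the group when the job is created are released at `maximumPerMinute`, while things that join afterwards get an execution the moment they join. It's a toy for reasoning about timing, not how the Jobs service is actually implemented, and the whole-minute batching is an assumption.

```python
from __future__ import annotations


def execution_start_minutes(joined_at: list[float], job_created_at: float,
                            max_per_minute: int) -> list[float]:
    """Toy model: the minute at which each thing's job execution is created.

    Things that joined at or before job creation form a backlog released in
    whole-minute batches of max_per_minute. Things that join later get an
    execution as soon as they join, with no throttling.
    """
    if max_per_minute < 1:
        raise ValueError("max_per_minute must be at least 1")
    starts = []
    backlog = 0  # pre-existing members scheduled so far
    for t in joined_at:
        if t <= job_created_at:
            starts.append(job_created_at + backlog // max_per_minute)
            backlog += 1
        else:
            starts.append(t)  # late joiner: not subject to the rollout rate
    return starts


if __name__ == "__main__":
    print(execution_start_minutes([0, 0, 0, 0, 0, 5, 5, 5],
                                  job_created_at=1, max_per_minute=2))
```

With five things present at job creation, a limit of 2 per minute, and three things added at minute 5, the model gives `[1, 1, 2, 2, 3, 5, 5, 5]`: the pre-existing members are spread over three minutes, while the three late joiners all start at once, which matches what the question describes.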

For clarity, devices can't be added directly to the group because I have found that doesn't allow re-using the same job at a later date (e.g. retry after update failure, version 1 -> 2 -> 1, ...)

I don't quite understand what you mean by "re-use" here in the case of a continuous job. Are you using the job execution retry feature?
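If retries are what "re-use" means here, they can be configured on the job itself rather than by moving things between groups. Below is a sketch of the `jobExecutionsRetryConfig` structure that `create_job` accepts, built as a plain dict. The 0 to 10 range for `numberOfRetries` is my assumption of the limit, so check the API reference before relying on it.

```python
from __future__ import annotations

# failureType values accepted by jobExecutionsRetryConfig
VALID_FAILURE_TYPES = {"FAILED", "TIMED_OUT", "ALL"}


def build_retry_config(retries: dict[str, int]) -> dict:
    """Build jobExecutionsRetryConfig from {failureType: numberOfRetries}."""
    criteria = []
    for failure_type, count in retries.items():
        if failure_type not in VALID_FAILURE_TYPES:
            raise ValueError(f"unknown failureType: {failure_type}")
        if not 0 <= count <= 10:  # assumed limit; check the API reference
            raise ValueError("numberOfRetries assumed to be between 0 and 10")
        criteria.append({"failureType": failure_type, "numberOfRetries": count})
    return {"criteriaList": criteria}


if __name__ == "__main__":
    # Pass as jobExecutionsRetryConfig=... alongside the other create_job arguments.
    print(build_retry_config({"FAILED": 3, "TIMED_OUT": 1}))
```

Note that retries cover a failed or timed-out execution of the same job; they don't cover deliberately rolling a device back to an earlier version.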

Adding things to a group and adding that group to the job does exhibit rate limiting behaviour (things are queued at a desired rate)

This is the typical workflow. Typically the job would target a group that already includes the bulk of the devices, with a trickle of devices added to the group subsequently. For example, create a job to deploy version 1.1 to the existing fleet of 1.0 devices, with a trickle of 1.0 devices being added as they're unboxed after the job was created.
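If a large batch does have to go into the group after the job already exists, the service won't pace it, so the pacing has to happen on the client side. Here is a minimal sketch of a paced version of the "add things to the group" script. The `add_to_group` callable is injected so the logic can be tested without AWS; with boto3 it could wrap `iot.add_thing_to_thing_group(thingGroupName=..., thingName=...)`. Spacing calls evenly at 60 / `max_per_minute` seconds is one simple design choice (a token bucket would also work), and it only limits how fast executions are created, not how many run concurrently.

```python
import time
from typing import Callable, Iterable


def add_things_paced(thing_names: Iterable[str],
                     add_to_group: Callable[[str], None],
                     max_per_minute: int,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> int:
    """Call add_to_group for each thing, evenly spaced at max_per_minute."""
    if max_per_minute < 1:
        raise ValueError("max_per_minute must be at least 1")
    interval = 60.0 / max_per_minute
    next_allowed = clock()
    count = 0
    for name in thing_names:
        now = clock()
        if now < next_allowed:
            sleep(next_allowed - now)  # wait for this thing's slot
        add_to_group(name)
        count += 1
        next_allowed = max(next_allowed, now) + interval
    return count


if __name__ == "__main__":
    added = add_things_paced(["thing-a", "thing-b"], print, max_per_minute=60)
    print(f"added {added} things")
```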

however this is a PITA to manage (can't delete the group until all executions are completed, job has a limit to the number of groups that can be added as targets, ...) so is less than ideal

This is one way fleet indexing and dynamic groups can help.

This relatively old blog post covers how to manage continuous jobs with static thing groups; it was written before dynamic groups existed. However, I recommend dynamic groups as an easier approach.

https://aws.amazon.com/blogs/iot/using-continuous-jobs-with-aws-iot-device-management/
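With fleet indexing enabled, a dynamic group selects devices by a query instead of being maintained by hand. Below is a sketch of the request for boto3's `create_dynamic_thing_group`. The `firmwareVersion` and `rolloutWave` thing attributes are hypothetical names for this example, values are restricted to simple characters so the query needs no escaping, and as quoted earlier the rollout limit still only governs devices that match before the job is created.

```python
import re

# Restrict values to characters that need no escaping in the query (sketch choice).
_SAFE_VALUE = re.compile(r"^[A-Za-z0-9_-]+$")


def build_dynamic_group_request(group_name: str, current_version: str,
                                wave: str) -> dict:
    """Keyword arguments for boto3's iot.create_dynamic_thing_group (sketch)."""
    for value in (current_version, wave):
        if not _SAFE_VALUE.match(value):
            raise ValueError(f"unsupported characters in {value!r}")
    return {
        "thingGroupName": group_name,
        # Hypothetical attributes: devices on the old version in a given wave.
        "queryString": (f"attributes.firmwareVersion:{current_version} "
                        f"AND attributes.rolloutWave:{wave}"),
    }


if __name__ == "__main__":
    print(build_dynamic_group_request("Version_xxx_canary", "v1_0", "canary"))
```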

Greg_B (AWS Expert), answered 13 days ago
  • "Typically the job would target a group that already includes the bulk of the devices...": in my case this is the exact opposite of typical. Deployments for me are a carefully staged process. Update jobs are first deployed to a small in-house verification group with simulated product, then to a pre-selected set of actual product that is reasonably accessible. Verification on product takes at least 3 days to exercise the majority of core features, which are difficult to simulate accurately (hence the pause). Only after this is it desirable to add large quantities of devices

  • It's not wrong to do that, but you will certainly face this issue. One alternative is to have separate thing groups, and separate jobs, for each of those deployment waves. This is also a common approach: a QA bench test group and job, a beta/canary group and job, and so forth.
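The one-group-and-one-job-per-wave approach could be planned along these lines. Each wave's static group is assumed to be fully populated before its job is created, so `targetSelection` is `SNAPSHOT` and the rollout limit covers every device in the wave. The naming scheme, wave names, and rates are hypothetical, and versions use underscores because job IDs don't allow dots.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Wave:
    name: str            # e.g. "qa-bench", "canary", "fleet" (hypothetical)
    max_per_minute: int


def plan_waves(version: str, group_arn_prefix: str, document_source: str,
               waves: list[Wave]) -> list[dict]:
    """One create_job request per wave, each targeting its own static group."""
    requests = []
    for wave in waves:
        group_name = f"{version}_{wave.name}"
        requests.append({
            "jobId": f"fw-{version}-{wave.name}",
            "targets": [group_arn_prefix + group_name],
            "documentSource": document_source,
            # Group is populated first, so the rollout rate applies to all of it.
            "targetSelection": "SNAPSHOT",
            "jobExecutionsRolloutConfig": {"maximumPerMinute": wave.max_per_minute},
        })
    return requests


if __name__ == "__main__":
    plan = plan_waves("v1_1", "arn:aws:iot:us-east-1:123456789012:thinggroup/",
                      "https://example-bucket.s3.amazonaws.com/job-document.json",
                      [Wave("qa-bench", 5), Wave("canary", 20), Wave("fleet", 100)])
    for request in plan:
        print(request["jobId"], request["targets"][0])
```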
