Combining AWS IoT Core with Step Functions to optimize IoT data ingestion and consumption

9 minute read
Content level: Intermediate

Given its great scalability, many IoT companies choose AWS IoT for their telemetry use cases. But as they grow, they often worry about their initially simple data ingestion flow turning over time into a complex set of inefficient or highly intricate rules and destination services. The biggest threat is the IoT infrastructure becoming either too expensive or too difficult to operate.

In this post, we show how combining AWS IoT with Step Functions can help in that regard.

In their IoT journey, companies usually go through similar milestones:

  1. simply growing their existing IoT fleet,
  2. onboarding new device types, or the same types with higher capabilities,
  3. adapting to changing end-user requirements,
  4. implementing custom features for very specific internal or external needs (like a custom analytics pipeline when running a pilot with a specific group of customers).

At any of these stages, as time goes by and requirements keep stacking up, a very simple and well-architected initial service flow (expressed as AWS IoT Core ingestion → basic SQL statements / functions in Rule Engine → destination service) can turn into a complex set of either inefficient or highly intricate rules and destinations. The whole IoT infrastructure thus becomes either too expensive or too difficult to operate without risk to the production environment.

In this post, we present a possible approach that can help you mitigate the risks mentioned above. It mainly consists of decoupling data ingestion and filtering (relying mainly on AWS IoT Core and Rule Engine) from downstream service orchestration (leveraging AWS Step Functions).

For the sake of clarity, we’ll focus on a single use case throughout.

1. Use Case Description and Initial Scope

1.1. Description

AwesomeIOT LLC is a Smart Home IoT Solutions provider, with the following portfolio:

  • Smart Locks
  • Smart Thermostats
  • Smart Lockers for deliveries

As each of these device types is supported by a separate business unit internally, AwesomeIOT LLC intends to have a distinct real-time monitoring dashboard and alert system for the technical support team in each business unit.

1.2. Initial Scope Solution

One direct way to meet these initial business requirements would be to implement AWS IoT Core ingestion with two sets of rules:

  • One set that collects all data and sends it to a time-series database (Amazon Timestream), with Managed Service for Grafana used to create a custom dashboard that queries the Timestream database and displays data as needed.
  • One set that filters incoming data according to threshold values or alarm state, then sends it to an SNS topic for processing and notification of the relevant support team (a minimal sketch of both rules follows this list).
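Here is a minimal sketch of what these two rule sets could look like when created with boto3. The topic structure (devices/<device_id>/telemetry), the alarm field, and all database, table, topic, and role names are illustrative assumptions, not the exact resources from the diagrams:

```python
# Minimal sketch of the two initial rule sets, created with boto3.
# All resource names and the topic structure are illustrative assumptions.
import boto3

iot = boto3.client("iot")

# Rule set 1: collect all telemetry and send it to Amazon Timestream,
# which the Grafana dashboard then queries.
iot.create_topic_rule(
    ruleName="all_telemetry_to_timestream",
    topicRulePayload={
        "sql": "SELECT * FROM 'devices/+/telemetry'",
        "awsIotSqlVersion": "2016-03-23",
        "actions": [{
            "timestream": {
                "roleArn": "arn:aws:iam::123456789012:role/iot-timestream-role",
                "databaseName": "awesomeiot",
                "tableName": "telemetry",
                # topic(2) extracts the device id from the MQTT topic
                "dimensions": [{"name": "device_id", "value": "${topic(2)}"}],
            }
        }],
    },
)

# Rule set 2: filter alarmed payloads and notify the support team via SNS.
iot.create_topic_rule(
    ruleName="alarms_to_support_sns",
    topicRulePayload={
        "sql": "SELECT * FROM 'devices/+/telemetry' WHERE alarm = true",
        "awsIotSqlVersion": "2016-03-23",
        "actions": [{
            "sns": {
                "targetArn": "arn:aws:sns:us-east-1:123456789012:support-alerts",
                "roleArn": "arn:aws:iam::123456789012:role/iot-sns-role",
                "messageFormat": "JSON",
            }
        }],
    },
)
```

Each rule assumes an IAM role that allows the corresponding downstream action (writing to Timestream or publishing to SNS).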

[Architecture diagram: AWS IoT Core rules routing all telemetry to Timestream/Grafana and alarm data to SNS]

2. Implementing Customer Service Levels

Now, let’s imagine AwesomeIOT LLC becomes very successful, with an exponential increase in its customer base. In order to serve that large number of customers efficiently, the company decides to implement customer service levels (Standard and VIP). VIP customers will now need to be excluded from the main customer support channels, as they will have a separate executive support and account management team to handle their issues in a timely manner.

2.1. Approach 1 - Stacking up rules on Rule Engine

Here are the immediate changes we can anticipate on the initial architecture to meet the new requirements:

  • The previous set of IoT rules for support notification must be changed (to exclude VIP users)
  • A new set of rules must be created to select only the VIP users from incoming traffic for a custom notification flow (through new SNS topics)
  • A customer profile data store (like Amazon DynamoDB) containing customer profiles mapped to edge device IDs must now be consulted as well when defining/updating rules (see the sketch after this list).
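To make the repeated-lookup concern concrete (see the performance note below), here is a hypothetical sketch of the SQL statements two such rules could use. Each rule embeds its own get_dynamodb() call, so the customer profile table is queried once per matching rule; the table, key, attribute, and role names are assumptions:

```python
# Hypothetical Approach 1 rule SQL: each rule performs its own DynamoDB
# lookup via get_dynamodb(), duplicating the profile query across rules.
# Table, key, attribute, and role names are illustrative assumptions.
standard_support_sql = (
    "SELECT * FROM 'devices/+/telemetry' "
    "WHERE alarm = true "
    "AND get_dynamodb('CustomerProfiles', 'device_id', topic(2), "
    "'arn:aws:iam::123456789012:role/iot-dynamodb-role').tier = 'STANDARD'"
)

vip_support_sql = (
    "SELECT * FROM 'devices/+/telemetry' "
    "WHERE alarm = true "
    "AND get_dynamodb('CustomerProfiles', 'device_id', topic(2), "
    "'arn:aws:iam::123456789012:role/iot-dynamodb-role').tier = 'VIP'"
)
```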

[Architecture diagram: additional rules stacked in Rule Engine, each querying the DynamoDB customer profile store and routing to Standard or VIP SNS topics]

While this immediate solution might look simple and powerful (“just adding some new rules in Rule Engine and updating a few existing ones” on the original architecture), there might be several “hidden” drawbacks as well:

  • Potential damage to operational excellence:
    • High risk to the production environment, as multiple changes must be performed on live rules
    • Limited ability to test the new intended business logic before implementation
    • Cost of troubleshooting problems with the new workflow after the desired updates are implemented
    • Increasing difficulty in keeping good visibility of the entire business logic, as successive independent rule changes/additions may be implemented by different IT operations resources
    • Potential maintenance headaches in the long run, given the inevitable turnover in IT operations teams
  • Potential performance inefficiencies, as new rules are introduced (and executed) independently from one another. For example, implementing the rules independently here implies repetitive calls to DynamoDB to fetch the customer profile in each rule.
  • Unnecessary cost increases, as inefficient use of rules may lead to much higher costs than the business requirements normally command.

2.2. Approach 2 - Rule Engine + Step Functions

Instead of managing both incoming data filtering and business logic (external service orchestration) with rules as above, let’s consider using AWS IoT Rule Engine mainly as a preprocessing layer (data filtering, decoding, etc.) and AWS Step Functions for business logic execution.

In this approach, Rule Engine ensures that any incoming data from the edge is converted to JSON format (so that Step Functions can consume it directly) and preprocessed (if needed) by a Rule Engine function before being passed to the orchestration layer downstream. All the orchestration needed afterwards between the customer data store (DynamoDB), the notification service (SNS), and time-series storage and real-time streaming (Amazon Timestream + Managed Service for Grafana) is delegated to Step Functions.

In our specific example, we can define a Step Function (workflow) for each of the device types; this choice is convenient because we assume all devices of the same type have similar payloads. So, we’ll have (with a minimal sketch of the routing rules after this list):

  • One Step Function for Smart Locks
  • One Step Function for Thermostats
  • One Step Function for Smart Lockers
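Here is a minimal sketch of how Rule Engine could hand traffic for each device type to its dedicated Step Function using the stepFunctions rule action. The topic structure (devices/<device_id>/<device_type>), state machine names, and role ARN are illustrative assumptions:

```python
# Minimal sketch: one rule per device type, each starting an execution of a
# dedicated Step Functions state machine with the (JSON) payload as input.
# Topic structure, state machine names, and role ARN are assumptions.
import boto3

iot = boto3.client("iot")

for device_type in ("smart_lock", "thermostat", "smart_locker"):
    iot.create_topic_rule(
        ruleName=f"{device_type}_to_step_functions",
        topicRulePayload={
            # Payloads are assumed to already be JSON; any decoding or other
            # preprocessing would also happen at this stage.
            "sql": f"SELECT *, topic(2) AS device_id FROM 'devices/+/{device_type}'",
            "awsIotSqlVersion": "2016-03-23",
            "actions": [{
                "stepFunctions": {
                    "stateMachineName": f"{device_type}-workflow",
                    "executionNamePrefix": device_type,
                    "roleArn": "arn:aws:iam::123456789012:role/iot-sfn-role",
                }
            }],
        },
    )
```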

[Architecture diagram: Rule Engine routing each device type’s traffic to a dedicated Step Functions workflow that orchestrates DynamoDB, SNS, and Timestream/Grafana]

In this architecture, each Step Function will first execute all the steps common to both Standard and VIP customers, then query the customer database once to determine the device owner’s service level, and finally proceed with only the steps specific to that customer profile.

Smart Lock Step Function

[Flowchart: Smart Lock Step Function — common steps, alarm check, customer profile lookup, then Standard or VIP notification branch]

Using Workflow Studio for AWS Step Functions and leveraging Step Functions’ direct integration with more than 200 other services (including DynamoDB, Timestream, and SNS), you can implement this flowchart in a drag-and-drop fashion directly in the AWS Console, with very little extra coding required.
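If you prefer defining the workflow as code rather than in Workflow Studio, here is a minimal Amazon States Language (ASL) sketch of the Smart Lock workflow. The state names, payload fields (alarm, tier), table and topic names, and role ARN are illustrative assumptions, and the Timestream write step common to both tiers is omitted for brevity:

```python
# Minimal ASL sketch of the Smart Lock workflow, deployed as an Express
# state machine. Field, table, topic, and role names are assumptions; the
# Timestream write step common to both tiers is omitted for brevity.
import json
import boto3

smart_lock_definition = {
    "StartAt": "IsDeviceAlarmed",
    "States": {
        "IsDeviceAlarmed": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.alarm", "BooleanEquals": True,
                 "Next": "GetCustomerProfile"}
            ],
            "Default": "Done",
        },
        # Query the customer profile table once to determine the service level.
        "GetCustomerProfile": {
            "Type": "Task",
            "Resource": "arn:aws:states:::dynamodb:getItem",
            "Parameters": {
                "TableName": "CustomerProfiles",
                "Key": {"device_id": {"S.$": "$.device_id"}},
            },
            "ResultPath": "$.profile",
            "Next": "IsVipCustomer",
        },
        "IsVipCustomer": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.profile.Item.tier.S", "StringEquals": "VIP",
                 "Next": "NotifyVipTeam"}
            ],
            "Default": "NotifyStandardSupport",
        },
        "NotifyVipTeam": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {
                "TopicArn": "arn:aws:sns:us-east-1:123456789012:smart-lock-vip-alerts",
                "Message.$": "$",
            },
            "End": True,
        },
        "NotifyStandardSupport": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {
                "TopicArn": "arn:aws:sns:us-east-1:123456789012:smart-lock-support-alerts",
                "Message.$": "$",
            },
            "End": True,
        },
        "Done": {"Type": "Succeed"},
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="smart_lock-workflow",
    definition=json.dumps(smart_lock_definition),
    roleArn="arn:aws:iam::123456789012:role/sfn-smart-lock-role",
    type="EXPRESS",  # Express Workflows suit high-volume telemetry
)
```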

Smart Thermostat & Smart Locker Step Functions

The other two Step Functions in our case will look very similar, with two nuances:

  • The ‘Is device alarmed’ check will be expressed differently for each device type (the Smart Locker may have an alarm flag within the payload data, while the Smart Thermostat may require comparing the reported temperature to predefined threshold values; see the sketch after this list).
  • The SNS topics used will match the device type considered.
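For illustration, here is how the alarm-check Choice state could differ between the two workflows; the payload field names and the temperature threshold are assumptions:

```python
# Hypothetical alarm checks: the Smart Locker inspects an alarm flag in the
# payload, while the Smart Thermostat compares the reported temperature to a
# predefined threshold. Field names and the threshold value are assumptions.
locker_alarm_check = {
    "Type": "Choice",
    "Choices": [
        {"Variable": "$.alarm", "BooleanEquals": True,
         "Next": "GetCustomerProfile"}
    ],
    "Default": "Done",
}

thermostat_alarm_check = {
    "Type": "Choice",
    "Choices": [
        {"Variable": "$.temperature", "NumericGreaterThan": 30,
         "Next": "GetCustomerProfile"}
    ],
    "Default": "Done",
}
```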

2.3. Approach 1 (Rule Engine) vs Approach 2 (Rule Engine + Step Functions)

Comparing the two approaches across the Well-Architected pillars:

Security

  • Rule Engine only: IoT infrastructure maintenance and business logic maintenance access are tightly coupled, as business logic is defined mainly as a combination of rules in Rule Engine. Access to external services is granted to the AWS IoT service.
  • Rule Engine + Step Functions: Access to IT infrastructure (mainly AWS IoT) can be separated from access to business logic (mainly AWS Step Functions). Access to external services is granted mainly to Step Functions, making it possible to align access permissions with the various workflows created.

Reliability

  • Rule Engine only: Dealing with errors relies mainly on the ‘Error Action’ in Rule Engine, which typically catches error data and defers further handling to an external service for analysis.
  • Rule Engine + Step Functions: Error handling in Step Functions natively integrates powerful features such as ‘Retry’ with exponential back-off and a maximum number of retries, catching different types of errors, and defining a custom workflow based on the returned error.

Performance Efficiency

  • Rule Engine only: Measuring application performance relies either on several aggregated CloudWatch metrics or on custom-built analytics to get a more holistic picture.
  • Rule Engine + Step Functions: With service orchestration centralized in Step Functions, you can immediately leverage the integrated service insights to monitor the whole business logic and identify eventual bottlenecks for future optimizations.

Cost Optimization

  • Rule Engine only: Stacking up rules independently in Rule Engine might drive costs very high, especially because the risk of inefficiencies keeps growing.
  • Rule Engine + Step Functions: While this option might be more expensive for low incoming data volumes, organizing the business logic as a flowchart makes potential inefficiencies easier to pinpoint and ensures optimal use of cloud services (thus generating significant cost savings in the long run or at higher volumes).

Operational Excellence

  • Rule Engine only: Individual changes are usually easy to implement (adding/modifying independent rules), but overall business logic maintenance only becomes more challenging; normal business operations can be compromised by a single wrong manipulation by IT Operations (deleting the wrong rule, keeping an old one when it is no longer needed, etc.).
  • Rule Engine + Step Functions: Changes to business logic can be implemented by creating an updated Step Function that represents the new business logic and a new rule that routes edge traffic to it (while the old ones are simply deactivated). This facilitates a CI/CD approach to implementing business logic changes.

3. Conclusion

AWS Step Functions is widely used in compute scenarios to orchestrate serverless applications. The introduction of Express Workflows expanded Step Functions to high-velocity use cases like IoT telemetry data ingestion and consumption. While the Rule Engine feature of AWS IoT Core has a proven track record in terms of scalability, its combination with Step Functions can greatly enhance the operation of IoT applications by decoupling data ingestion from data consumption and allowing better visibility into the data consumption logic.
