Lambda vs Kinesis for uploading AWS IoT data to a database.

We have some IoT devices using AWS IoT Core that send data every 15 min to a lambda that deserializes and uploads to a Timestream database. I recently came across Kinesis Data Streams and it sounds like it has the capability to do what I'm using the lambdas for.

  1. Is Kinesis capable of doing the same job of deserializing and uploading to a database? If so, how would this differ from using the lambda? We don't need real time access to the data. It's accessed usually on a daily or weekly basis.
  2. I've noticed the lambda can only upload 100 records at a time due to the boto API restriction. Each IoT box generates records at 10Hz batched in 15min increments. Each lambda execution takes about 12 seconds which seems quite long. Would Kinesis be more efficient?
  3. Would any of this change if the database was changed to RDS instead of Timestream?
2 Answers

Kinesis Data Streams can't write data into Timestream on its own. It is a streaming service for ingesting large amounts of data, which is then read by consumers that you usually have to develop yourself. Data can be retained in a data stream for up to a year. You could traverse the stream and run operations on it, but I do not think this is the right approach.

There is one consumer that we provide, called Data Firehose (maybe this is what you were referring to), which saves streaming data to different targets, such as S3, Redshift, and many more. Neither Timestream nor RDS is a supported target.

Depending on the type of queries you need to perform on the data, maybe you should send it to S3, and then use Glue/Lambda to prepare the data and Athena to query it. Otherwise, you should probably continue using Lambda to ingest the data into the DB.

Answered 11 days ago by AWS (EXPERT); reviewed 11 days ago by an EXPERT.

  1. While Kinesis Data Streams is capable of ingesting and storing data from IoT devices, it can't by itself deserialize and upload data to a database like Timestream. Kinesis is primarily a data streaming service that can handle large volumes of data in real time. To achieve the same functionality as your current Lambda setup, you would still need to use a Lambda function or another compute service to process the data from Kinesis and upload it to Timestream.

The main difference in using Kinesis would be in how the data is ingested and processed. Instead of IoT devices directly triggering Lambda functions, they would send data to Kinesis streams. You would then set up a Lambda function as a consumer of the Kinesis stream to process and upload the data to Timestream. This approach can provide better scalability and throughput, especially for high-volume data scenarios.
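A minimal sketch of such a consumer, assuming each Kinesis record carries one JSON-encoded reading (the payload format and the field names used in the example are assumptions, not something stated in the question). Lambda delivers Kinesis record data base64-encoded under `Records[].kinesis.data`, so the handler has to decode before deserializing:

```python
import base64
import json


def decode_kinesis_event(event):
    """Return the deserialized payloads from a Lambda Kinesis event."""
    readings = []
    for record in event.get("Records", []):
        # Lambda passes the Kinesis record data as a base64 string.
        raw = base64.b64decode(record["kinesis"]["data"])
        # Assumption: the device payload is UTF-8 encoded JSON.
        readings.append(json.loads(raw))
    return readings


def handler(event, context):
    readings = decode_kinesis_event(event)
    # Writing `readings` to the database would go here.
    return {"decoded": len(readings)}
```

If the devices send a binary format instead of JSON, only the `json.loads` line would change; the base64 step stays the same.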

  2. Kinesis Data Streams could potentially be more efficient in handling your data volume. Kinesis can batch records and allow for parallel processing, which could improve overall throughput. With Kinesis, you can configure your Lambda function to process multiple records in a single invocation, potentially reducing the number of Lambda executions and associated costs.

To address the limit of 100 records per request on the Timestream WriteRecords API (the limit comes from the service rather than from boto itself), you could implement a batching mechanism within your Lambda function that processes the Kinesis records in chunks of 100 or fewer. This would allow you to handle larger batches of data from Kinesis while still respecting the Timestream API limitations.
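The chunking could look roughly like the sketch below. It assumes `client` behaves like `boto3.client("timestream-write")` and that the records are already in the shape `write_records` expects; the helper names `chunked` and `write_in_batches` are made up for illustration.

```python
MAX_RECORDS_PER_WRITE = 100  # Timestream WriteRecords limit per request


def chunked(items, size=MAX_RECORDS_PER_WRITE):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def write_in_batches(client, database, table, records):
    """Send `records` to Timestream in requests of at most 100 records.

    `client` is expected to behave like boto3.client("timestream-write").
    Returns the number of write_records calls made.
    """
    calls = 0
    for batch in chunked(records):
        client.write_records(DatabaseName=database, TableName=table, Records=batch)
        calls += 1
    return calls
```

For scale: at 10 Hz over 15 minutes a box produces 10 × 900 = 9,000 samples, so if each sample is one record this loop makes 90 sequential requests per upload. That may account for a good part of the 12 seconds mentioned in the question; sending batches concurrently, or reducing the record count with `CommonAttributes` and multi-measure records, are options worth testing.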

Additionally, Kinesis offers enhanced fan-out, which gives each registered consumer its own dedicated read throughput per shard and lower read latency. This mainly helps when several consumers read the same stream; with a single Lambda consumer the gain is likely to be small.

  3. If you were to switch from Timestream to RDS, the fundamental architecture wouldn't necessarily change, but there might be some differences in implementation:
  • Data Model: RDS is a relational database, so you'd need to design an appropriate schema for your time-series data, which might be more complex than with Timestream.

  • Write Performance: Timestream is optimized for time-series data and can handle high write throughput. RDS might require more careful consideration of write performance, especially for high-frequency data.

  • Querying: Both RDS and Timestream are queried with SQL. Standard SQL on RDS might be more familiar, but it is potentially less optimized for time-series data than Timestream's query engine with its built-in time-series functions.

  • Scalability: Timestream automatically scales to accommodate your data volume, while with RDS you'd need to manage scaling more actively.

  • Cost: The cost structure would be different, and you'd need to evaluate based on your specific usage patterns.

Regardless of whether you use Timestream or RDS, the basic flow of data from IoT devices through Kinesis (if you choose to use it) and then processed by Lambda for database insertion would remain similar. The main differences would be in how you structure your data and optimize your queries for the specific database system.
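To make the data-model point concrete, here is a small sketch of a relational layout for time-series readings with a batched insert. It uses Python's built-in `sqlite3` purely as a local stand-in for an RDS engine, and the `readings` table and its columns are hypothetical, chosen only for illustration.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical narrow table: one row per device, measure, and timestamp.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS readings (
            device_id TEXT NOT NULL,
            ts_ms     INTEGER NOT NULL,  -- epoch milliseconds
            measure   TEXT NOT NULL,
            value     REAL NOT NULL,
            PRIMARY KEY (device_id, measure, ts_ms)
        )
        """
    )


def insert_readings(conn, rows):
    """Insert many (device_id, ts_ms, measure, value) rows in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO readings (device_id, ts_ms, measure, value) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
```

On RDS you would use the driver for your engine instead (for example a PostgreSQL or MySQL driver), where the placeholder syntax and the best bulk-insert method may differ. Unlike Timestream's 100-record requests, a single transaction here can carry a whole 15-minute batch.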
Sources
See GetRecords metrics in Kinesis enhanced fan-out mode | AWS re:Post
How Lambda processes records from Amazon Kinesis Data Streams - AWS Lambda
Process Amazon Kinesis Data Streams records with Lambda - AWS Lambda

Answered 11 days ago; reviewed 11 days ago by an EXPERT.
