Seeking advice on improving performance with large IoT data on AWS

I would appreciate some advice on a performance issue we are experiencing with our IoT data. Here's a brief overview of our situation:

Our IoT devices send data every 3 seconds, which is processed by a server and saved to a database. The front-end application retrieves the data by calling the backend. We are running into performance problems because of the volume of data already in the database and the rate at which new data arrives, and we want to avoid continually increasing server and database specs. Here's some more information about our current structure and the potential solutions we have considered:

Our Structure:

  • Data sent by devices has about 60 values (mix of numbers and strings)
  • IoT Core forwards device data via a rule to an Elastic Beanstalk environment (PHP Laravel), which processes the data and saves it to an RDS database (Postgres)
  • Data is saved in one table with added indexes for improved performance
  • Users access data through an Amplify application (Angular) that calls the backend, which retrieves data from the database

TimescaleDB: We have benchmarked TimescaleDB, but it does not remove the need to continually increase server and database specs to handle the volume of data.

Potential Solutions:

  • Storing data in small files (CSV or JSON) on Amazon S3: the server receives the data and, depending on the date, saves it to a specific file every hour. The backend generates temporary links for users to access the data. This takes load off the server and database.

  • Like the previous solution, but instead of the server we use a Lambda function to handle the data from the IoT devices and save it directly to S3, which reduces the load on the server (see the sketch after this list).

  • Same as the previous solution, but using Kinesis instead of Lambda to handle the IoT data, allowing us to retain the data for a short period for quick access before saving it to S3. This would also open the door to analytics on the data in the future.
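To make the S3 idea concrete, here is a minimal sketch of the Lambda variant (solution 2). It assumes the IoT Core rule forwards the raw JSON payload to the function; the bucket name, key layout, and payload fields are placeholders, not our actual setup.

```python
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "iot-telemetry-example"  # hypothetical bucket name


def handler(event, context):
    """Invoked by an IoT Core rule; writes one small object per message
    under an hourly prefix so each hour's data stays grouped together."""
    now = datetime.now(timezone.utc)
    device_id = event.get("device_id", "unknown")  # assumed payload field
    key = (f"telemetry/{now:%Y/%m/%d/%H}/"
           f"{device_id}-{int(now.timestamp() * 1000)}.json")
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(event).encode("utf-8"))
    return {"written": key}


def presigned_link(key, expires=3600):
    """Temporary download link the backend can hand to the front end."""
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=expires,
    )
```

Writing one object per message keeps the Lambda stateless (S3 objects cannot be appended to); if we truly need one file per hour, the Kinesis variant (solution 3) with Data Firehose buffering would produce that for us.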

We are looking for advice on these potential solutions, as well as any other suggestions or feedback that you may have. We want to improve performance without increasing costs, so any input you can offer would be greatly appreciated.

Thank you in advance for your help!

Best regards.

3 Answers

Not an expert on data, but a potential solution might be to send data from IoT Core to Kinesis Data Firehose to S3 (using a columnar format such as Parquet) and then use Athena to query the data from S3 when you need it.

Depending on your queries, you may also want to have some ETL process that runs periodically and aggregates the raw data so that the queries are much faster.
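To illustrate (not the answerer's code), here is a hedged sketch of running such a query from Python with boto3; the Glue database, table, columns, and output location are all assumptions.

```python
import time

import boto3

athena = boto3.client("athena")

# Hypothetical Glue table over the Parquet files Firehose wrote to S3;
# the hourly aggregation mirrors the suggested ETL/rollup idea.
QUERY = """
SELECT device_id,
       date_trunc('hour', event_time) AS hour,
       avg(temperature) AS avg_temp
FROM telemetry
WHERE event_time >= current_timestamp - interval '1' day
GROUP BY 1, 2
"""


def run_query():
    qid = athena.start_query_execution(
        QueryString=QUERY,
        QueryExecutionContext={"Database": "iot_db"},
        ResultConfiguration={"OutputLocation": "s3://iot-telemetry-example/athena-results/"},
    )["QueryExecutionId"]
    while True:  # poll until Athena finishes; fine for occasional queries
        state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state, qid
        time.sleep(1)
```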

AWS
EXPERT
Uri
answered a year ago
  • Thank you for your answer; this gives me another possibility to explore, and I will definitely look into it.

I recommend you have a look at this blog post about IoT data ingestion architectural patterns in AWS.

You should consider using Amazon Timestream (see pattern number 4 in the blog). You can push data from IoT Core directly to Timestream for a highly scalable IoT ingestion pipeline.

For more information, see the Timestream integration with IoT Core documentation guide and the Ingesting Data into Amazon Timestream with AWS IoT Core video.
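For illustration only, a minimal boto3 sketch of writing one device reading into Timestream; the database, table, and measure names are made up, and in the pattern from the blog the IoT Core rule's built-in Timestream action would do this write for you.

```python
import time

import boto3

tsw = boto3.client("timestream-write")


def write_reading(device_id, temperature):
    """Store one reading; Timestream manages the time-series storage tiers."""
    tsw.write_records(
        DatabaseName="iot_db",   # assumed database name
        TableName="telemetry",   # assumed table name
        Records=[{
            "Dimensions": [{"Name": "device_id", "Value": device_id}],
            "MeasureName": "temperature",
            "MeasureValue": str(temperature),
            "MeasureValueType": "DOUBLE",
            "Time": str(int(time.time() * 1000)),
            "TimeUnit": "MILLISECONDS",
        }],
    )
```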

AWS
answered a year ago
  • Thank you for your advice. Timestream looks like a good solution, but won't it drive costs up? Our goal is a solution that reduces costs.

I've reviewed your current structure and potential solutions, and I have a few recommendations.

First, I think you should consider using a different database for your IoT data. Postgres is a great general-purpose database, but it's not the best choice for storing large amounts of time-series data. A better option would be a database specifically designed for time-series IoT workloads, such as InfluxDB or Amazon Timestream.

Second, you could use a different data storage solution. Storing your data in small files on S3 is a good option, but it's not the only one; you could also use a database like DynamoDB or Redshift (a DynamoDB sketch follows at the end of this answer).

Finally, you could change the way you process your data. Instead of a dedicated server, you could use a Lambda function or Kinesis. This would reduce the load on your server and improve performance.
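Regarding the DynamoDB option above: the usual time-series layout is a device-id partition key with a timestamp sort key. A minimal sketch, assuming a hypothetical table and attribute names:

```python
import time
from decimal import Decimal  # DynamoDB does not accept Python floats

import boto3

table = boto3.resource("dynamodb").Table("DeviceTelemetry")  # hypothetical table


def save_reading(device_id, temperature):
    table.put_item(Item={
        "device_id": device_id,          # partition key
        "ts": int(time.time() * 1000),   # sort key: epoch milliseconds
        "temperature": Decimal(str(temperature)),
    })
```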

answered a year ago
  • Thank you for your answer. For storing data, I'm thinking about files in S3 to keep costs down, and Kinesis or Lambda to handle that data sounds like a good solution.
