[S3] Kinesis, File Gateway, or direct S3 writing?


Hi,

I have a customer who wants to write a solar power generator's sensor data to S3. The data stream occurs mostly during the daytime, with almost no data at night. It will likely be about 1 MB/second during the day, and may grow to 5 MB/second or more depending on how many solar panels are deployed in the generator area.

There may be network outages from time to time, since solar power generators are usually placed in mountainous areas.

They want to save the sensor data to S3 since it is all read-only data. They will also use SageMaker for a complex machine learning process. The ML process, combined with weather information, will eventually predict how much power will be generated in the next month after a power generation commitment is made.

There is no control data going back to the edge side, so I filtered IoT Core out of the data ingestion consideration. There was a similar previous project in Korea that used IoT Core, but it had trouble streaming data to the cloud and found that Kinesis was the better approach. However, at a later stage, when there will be control data going back to the edge side, Greengrass or IoT Core will be considered for the non-stream data.

The customer and I would like to know which of the following (or some new method) would be the best approach.

  • Writing directly to S3 using the CLI (or another method), which the customer considers worthwhile since they believe writing to S3 directly is free. I have never seen a project or architecture diagram that writes to S3 directly, so I told the customer this was unlikely to be a good fit, but they asked why and I do not know the answer at this moment.
  • Writing to S3 using a Kinesis Data Stream and scaling the shards down at night. Currently this is my best bet, but I would like to hear your opinion.
  • Using AWS File Gateway to write to S3. I think this is not worthwhile, though, since the local gateway does not need to access the cached files; it is just a one-way path from the sensors to S3.
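For reference, "directly writing to S3" would typically look like the sketch below: batch readings locally, then upload each batch under a time-partitioned key so downstream tools (SageMaker, Athena-style queries) can prune by date. The bucket name and key layout here are assumptions, and the actual upload call (boto3) is shown only as a comment.

```python
import json
from datetime import datetime, timezone

def object_key(sensor_id: str, ts: datetime) -> str:
    """Build a time-partitioned S3 key (this layout is an assumption, not a standard)."""
    return f"raw/{ts:%Y/%m/%d}/{sensor_id}-{ts:%H%M%S}.json"

def upload_batch(sensor_id: str, readings: list) -> str:
    """Serialize a batch of readings; return the key it would be stored under."""
    key = object_key(sensor_id, datetime.now(timezone.utc))
    body = json.dumps(readings).encode()
    # Actual upload (requires boto3 and IAM credentials on the device):
    # import boto3
    # boto3.client("s3").put_object(Bucket="solar-sensor-data", Key=key, Body=body)
    return key
```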

Could you please share your opinion? Thank you!

1 Answer
Accepted Answer

I would consider IoT Analytics - https://aws.amazon.com/iot-analytics/ - you pay only for what you send; it provides automatic retention management for both raw and transformed data, can use service-managed buckets as well as a customer-managed bucket, and provides transformation pipelines to filter or enrich the data.

S3 writes are not free. In fact, S3 can be more expensive than other methods for small files, since you pay for each PUT request. If the files have already been assembled client side (e.g. compressed, or formatted in queryable formats such as Parquet), writing to S3 could be a good choice.
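To illustrate the PUT-cost point: at one object per second you make about 86,400 PUT requests per day, while batching into 5-minute objects cuts that to 288. A rough sketch of the arithmetic (the $0.005 per 1,000 PUTs figure is the S3 Standard list price at the time of writing and may change):

```python
def daily_put_cost(seconds_per_object: int,
                   put_price_per_1000: float = 0.005,
                   active_seconds_per_day: int = 86_400) -> tuple:
    """Return (PUT requests per day, request cost in USD) for a given batch interval."""
    puts = active_seconds_per_day // seconds_per_object
    return puts, puts / 1000 * put_price_per_1000

per_second = daily_put_cost(1)    # one object per second
per_5_min = daily_put_cost(300)   # one object every 5 minutes
```

With these assumed prices, per-second objects cost roughly $0.43/day in requests alone versus well under a cent for 5-minute batches, before even counting the per-object overhead on reads.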

For authenticating to S3 you can use pre-signed URLs or IAM credentials. For IAM/STS tokens I would suggest using the AWS IoT Credentials Provider - https://docs.aws.amazon.com/iot/latest/developerguide/authorizing-direct-aws.html - to exchange the device certificate for temporary credentials.
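As a sketch of that flow: the device makes a mutual-TLS HTTPS GET to its account's credentials endpoint, passing the thing name in a header, and receives temporary STS credentials in the JSON response. The endpoint prefix, role alias, and thing name below are placeholders; the URL and header shape follow the linked documentation, and the actual mTLS request is shown only as a comment.

```python
def credentials_url(endpoint: str, role_alias: str) -> str:
    """Build the AWS IoT Credentials Provider URL for a role alias."""
    return f"https://{endpoint}/role-aliases/{role_alias}/credentials"

url = credentials_url(
    "c2abcdefghijkl.credentials.iot.ap-northeast-2.amazonaws.com",  # placeholder endpoint
    "solar-s3-writer-role-alias",                                   # placeholder role alias
)
headers = {"x-amzn-iot-thingname": "solar-panel-007"}  # placeholder thing name

# The actual call authenticates with the device's X.509 certificate, e.g.:
# import ssl, urllib.request
# ctx = ssl.create_default_context()
# ctx.load_cert_chain("device.pem.crt", "device.private.key")
# req = urllib.request.Request(url, headers=headers)
# creds = urllib.request.urlopen(req, context=ctx).read()  # JSON with temporary keys
```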

To use a presigned URL you would instead have an API or an MQTT service that generates the URL when the device needs it.
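On the device side, uploading with a presigned URL is just an HTTP PUT, so no AWS SDK or credentials are needed there; only the service generating the URL holds IAM permissions. A minimal sketch (the URL below is a placeholder; the real one would come from your API or MQTT response):

```python
import urllib.request

def put_to_presigned_url(url: str, payload: bytes) -> urllib.request.Request:
    """Prepare an HTTP PUT of the payload to a presigned S3 URL."""
    return urllib.request.Request(url, data=payload, method="PUT")

req = put_to_presigned_url(
    "https://solar-sensor-data.s3.amazonaws.com/raw/2023/05/01/panel-7.json"
    "?X-Amz-Signature=placeholder",  # placeholder presigned URL
    b'{"watts": 412}',
)
# To actually send it: urllib.request.urlopen(req)  # needs network access
```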

Finally, using AWS Greengrass would give you a lot of this functionality out of the box and more, such as automatic management of IAM credentials via the Token Exchange Service (TES), Stream Manager (https://docs.aws.amazon.com/greengrass/latest/developerguide/stream-export-configurations.html), and communication with AWS IoT Core.

My suggestion would then be (in order of preference):

  1. Greengrass + Stream Manager + AWS IoT Analytics
  2. Greengrass + custom Lambda to create the file + Stream Manager S3 exporter
  3. Greengrass + custom Lambda to create the file + custom Lambda to upload to S3
answered 3 years ago
