[S3] Kinesis, File Gateway, or direct S3 writing?

Hi,

I have a customer who wants to write a solar power generator's sensor data to S3. The data stream mostly happens during the daytime, with almost no data at night. It will likely be about 1 MB/second during the daytime, and may grow to 5 MB/second or more depending on how many solar panels are deployed in the generator area.

There may be network outages from time to time, since solar power generators are usually placed in mountain areas.

They want to save the sensor data to S3 since it is all read-only data there. They will also use SageMaker for a complex machine learning process. The ML process, combined with weather information, will eventually predict how much power will be generated for the next month after a power generation commitment is made.

There is no control data going back to the edge side, so I filtered IoT Core out of the data ingestion options. There was a similar previous project in Korea using IoT Core, but it had trouble streaming data to the cloud and found that Kinesis was a better approach. However, at a later stage, when there will be control data going back to the edge side, Greengrass or IoT Core will be considered for the non-stream data.

The customer and I would like to know which of the following (or some other method) would be the best approach.

  • Directly writing to S3 using the CLI (or another method) seems worthwhile to the customer, since they believe writing directly to S3 is free. I have never seen any project or architecture diagram write to S3 directly, so I told the customer this is unlikely to be the right approach, but they asked why, and I cannot answer that at the moment.
  • Writing to S3 via a Kinesis Data Stream and turning the stream shards off at night. Currently this is my best bet, but I would like to hear your opinion.
  • Using AWS File Gateway to write to S3. But I think this is not worthwhile, since the local gateway does not need to access the cached files; it is one-way traffic from the sensors to S3.
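Both the direct-S3 and Kinesis options hinge on batching, because S3 and Kinesis each bill per request while the sensors emit a steady trickle of small records. A minimal sketch of that buffering (the `RecordBuffer` class is hypothetical; in practice the flush callable would wrap an S3 PUT or a Kinesis `PutRecord`):

```python
import time
from typing import Callable, List


class RecordBuffer:
    """Buffers small sensor records and flushes them as one larger
    payload, either when enough bytes accumulate or the batch gets old.
    The flush callable could perform an S3 PUT or a Kinesis PutRecord."""

    def __init__(self, flush: Callable[[bytes], None],
                 max_bytes: int = 5 * 1024 * 1024, max_age_s: float = 60.0):
        self._flush = flush
        self.max_bytes = max_bytes
        self.max_age_s = max_age_s
        self._records: List[bytes] = []
        self._size = 0
        self._first_ts = None

    def add(self, record: bytes) -> None:
        if self._first_ts is None:
            self._first_ts = time.monotonic()
        self._records.append(record)
        self._size += len(record)
        # Flush on size or age, whichever threshold is hit first.
        if (self._size >= self.max_bytes or
                time.monotonic() - self._first_ts >= self.max_age_s):
            self.flush()

    def flush(self) -> None:
        if self._records:
            self._flush(b"\n".join(self._records))
        self._records, self._size, self._first_ts = [], 0, None
```

At the ~1 MB/s daytime rate described above, a 5 MB threshold would flush roughly every five seconds; the age threshold keeps night-time stragglers from sitting in memory indefinitely.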

Could you please share your opinion? Thank you!

AWS
Asked 3 years ago · 550 views

1 Answer
Accepted Answer

I would consider IoT Analytics - https://aws.amazon.com/iot-analytics/ - you pay only for what you send; it provides automatic data retention management for both raw and transformed data, can use service-managed as well as customer-managed buckets, and provides transformation pipelines to filter or enrich the data.

S3 writes are not free; S3 is actually more expensive than other methods for small files, since you pay for each PUT request. If the files have already been assembled client side (e.g. compressed, or formatted in queryable formats such as Parquet), writing directly to S3 could be a good choice.
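To make the small-file point concrete, here is a rough cost sketch, assuming roughly $0.005 per 1,000 PUT requests (S3 Standard pricing in us-east-1 at the time of writing; check the current price list) and the ~1 MB/s daytime rate from the question:

```python
PUT_PRICE_PER_1000 = 0.005  # USD; assumed us-east-1 S3 Standard pricing


def daily_put_cost(bytes_per_day: int, object_size: int) -> float:
    """Cost of the PUT requests alone for one day of data at a given object size."""
    puts = -(-bytes_per_day // object_size)  # ceiling division
    return puts / 1000 * PUT_PRICE_PER_1000


day_bytes = 1_000_000 * 12 * 3600  # ~1 MB/s over a 12-hour solar day

small = daily_put_cost(day_bytes, 1_000)       # one PUT per 1 KB sensor record
large = daily_put_cost(day_bytes, 5_000_000)   # batched into 5 MB objects
```

In this scenario, batching into 5 MB objects cuts the daily PUT cost from about $216 to about $0.04; storage and data volume are identical either way, so the difference is purely the per-request charge.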

For authenticating to S3 you can use pre-signed URLs or IAM credentials. For IAM/STS tokens I would suggest using the AWS IoT Credential Provider - https://docs.aws.amazon.com/iot/latest/developerguide/authorizing-direct-aws.html - to exchange the device certificate for temporary tokens.
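The credential-exchange flow can be sketched with just the standard library. The URL shape and the `x-amzn-iot-thingname` header follow the linked documentation; the endpoint, role alias, and file paths below are placeholders you would substitute for your account:

```python
import json
import ssl
import urllib.request


def credentials_url(endpoint: str, role_alias: str) -> str:
    # endpoint is the account-specific credentials endpoint, e.g. the value
    # returned by `aws iot describe-endpoint --endpoint-type iot:CredentialProvider`
    return f"https://{endpoint}/role-aliases/{role_alias}/credentials"


def fetch_temporary_credentials(endpoint, role_alias, thing_name,
                                cert_path, key_path, ca_path):
    """Exchange the device's X.509 certificate for temporary IAM credentials
    via the AWS IoT Credential Provider (a mutual-TLS HTTPS request)."""
    ctx = ssl.create_default_context(cafile=ca_path)
    ctx.load_cert_chain(certfile=cert_path, keyfile=key_path)
    req = urllib.request.Request(
        credentials_url(endpoint, role_alias),
        headers={"x-amzn-iot-thingname": thing_name},
    )
    with urllib.request.urlopen(req, context=ctx) as resp:
        body = json.load(resp)
    # Contains accessKeyId, secretAccessKey, sessionToken, expiration
    return body["credentials"]
```

The returned keys can then be handed to any S3 client; the device never stores long-lived IAM credentials.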

To use a pre-signed URL you would instead have an API or an MQTT service that generates the URL when the device needs it.
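A sketch of both halves of the pre-signed URL approach (the bucket and key names are hypothetical, and the minting side assumes boto3 is available in the API/Lambda environment that issues URLs):

```python
import urllib.request


def make_upload_url(bucket: str, key: str, expires: int = 900) -> str:
    """Server/Lambda side: mint a pre-signed PUT URL so the device can
    upload without holding any AWS credentials itself."""
    import boto3  # imported lazily; only the URL-minting side needs it
    s3 = boto3.client("s3")
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires,
    )


def upload_via_presigned(url: str, payload: bytes) -> None:
    """Device side: a plain HTTPS PUT; no AWS SDK required on the sensor."""
    req = urllib.request.Request(url, data=payload, method="PUT")
    with urllib.request.urlopen(req) as resp:
        resp.read()
```

The short expiry (15 minutes here) limits the damage if a URL leaks, which matters for devices sitting on remote, intermittently connected networks.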

Finally, using AWS Greengrass would give you a lot of this functionality out of the box and more, such as automatic management of IAM credentials via the Token Exchange Service (TES), Stream Manager (https://docs.aws.amazon.com/greengrass/latest/developerguide/stream-export-configurations.html), and communication with AWS IoT Core.

My suggestion would then be (in order of preference):

  1. Greengrass + Stream Manager + AWS IoT Analytics
  2. Greengrass + custom lambda to create the file + Stream Manager S3 exporter
  3. Greengrass + custom lambda to create the file + custom lambda to upload to S3
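For options 2 and 3, the "custom lambda to create the file" part might look like this sketch: pack a batch of newline-delimited records into one gzip blob and upload it under a date-partitioned key. The key layout is an assumption, not anything Greengrass mandates, and boto3 is imported lazily so the packing logic runs without AWS dependencies:

```python
import gzip
import io
import time


def build_object(records, *, compresslevel=6) -> bytes:
    """Pack a batch of newline-delimited sensor records into one gzip
    blob, ready to land in S3 as a single object."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb",
                       compresslevel=compresslevel) as gz:
        for rec in records:
            gz.write(rec + b"\n")
    return buf.getvalue()


def upload_batch(bucket: str, records) -> str:
    """Upload one packed batch; the key encodes the UTC timestamp so
    objects partition naturally by date for later Athena/SageMaker reads."""
    import boto3  # lazy import: build_object() is testable without AWS
    key = time.strftime("sensor-data/%Y/%m/%d/%H%M%S.ndjson.gz", time.gmtime())
    boto3.client("s3").put_object(Bucket=bucket, Key=key,
                                  Body=build_object(records))
    return key
```

Compressing before upload also helps with the intermittent mountain-area links the question mentions, since less data has to cross the network per batch.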
AWS
EXPERT
Answered 3 years ago
Reviewed 1 month ago
