[S3] Kinesis, File Gateway, or direct S3 writing?


Hi,

I have a customer who wants to write sensor data from solar power generators to S3. The data streams mostly during the daytime, with almost no data at night. Throughput will likely be about 1 MB/s during the day, and may rise to 5 MB/s or more depending on how many solar panels are deployed in the generator area.

There may be network outages from time to time, since solar power generators are usually placed in mountain areas.

They want to store the sensor data in S3, since it is all read-only once written. They will also use SageMaker for complex machine learning. The ML process, combined with weather information, will eventually predict how much power will be generated in the month following a power-generation commitment.

There is no control data going back to the edge side, so I filtered IoT Core out of the data-ingestion consideration. A similar previous project in Korea used IoT Core, had trouble streaming data to the cloud, and found Kinesis to be the better approach. However, at a later stage, when control data does flow back to the edge, Greengrass or IoT Core will be considered for the non-streaming data.

The customer and I would like to know which of the following (or some new method) would be the best approach.

  • Writing directly to S3 using the CLI (or another method), since direct S3 writes are free. I have never seen a project or architecture diagram that writes to S3 directly, so I told the customer this was unlikely to be the right approach, but they asked why and I could not answer.
  • Writing to S3 via a Kinesis Data Stream, and scaling the stream's shards down at night. This is currently my best bet, but I would like to know your opinion.
  • Using AWS File Gateway to write to S3. I don't think this is worthwhile, though, since the local gateway does not need to access the cached files; the data flows one way, from the sensors to S3.

Could you please share your opinion? Thank you!

1 Answer

Accepted Answer

I would consider IoT Analytics (https://aws.amazon.com/iot-analytics/): you pay only for what you send; it provides automatic retention management of both raw and transformed data; it can use service-managed as well as customer-managed buckets; and it provides transformation pipelines to filter or enrich the data.

S3 writes are not free. In fact, S3 is more expensive than the other methods for small files, since you pay for each PUT request. If the files are already assembled client-side (e.g., compressed, or formatted in a queryable format such as Parquet), writing to S3 directly could be a good choice.
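To illustrate that point, the device can buffer readings and upload them as larger compressed objects instead of one PUT per reading, which amortizes the per-request cost. A minimal sketch, where the batch thresholds, site ID, and key layout are my own assumptions, not anything from this thread:

```python
import gzip
import io
import json
import time

# Hypothetical thresholds; tune for your actual data rate.
BATCH_MAX_RECORDS = 1000
BATCH_MAX_SECONDS = 60

def compress_batch(records):
    """Serialize a batch of sensor readings as gzip-compressed JSON Lines."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
        for rec in records:
            gz.write((json.dumps(rec) + "\n").encode("utf-8"))
    return buf.getvalue()

def object_key(site_id, epoch_seconds):
    """Time-partitioned key so downstream tools can prune by site and date."""
    t = time.gmtime(epoch_seconds)
    return (f"raw/site={site_id}/year={t.tm_year:04d}/month={t.tm_mon:02d}/"
            f"day={t.tm_mday:02d}/batch-{epoch_seconds}.jsonl.gz")

# Upload would then be a single PUT per batch (requires credentials on the
# device), e.g.:
#   import boto3
#   boto3.client("s3").put_object(Bucket="my-bucket", Key=key, Body=body)
```

With 1 MB/s of raw data, a 60-second gzip-compressed batch turns thousands of tiny PUTs into roughly one request per minute per site.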

For authenticating to S3 you can use presigned URLs or IAM credentials. For IAM/STS tokens, I would suggest using the AWS IoT Credentials Provider (https://docs.aws.amazon.com/iot/latest/developerguide/authorizing-direct-aws.html) to exchange the device certificate for temporary tokens.
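As a sketch of that exchange: the device calls its account-specific credentials endpoint over mutual TLS with its device certificate. The endpoint, role alias, thing name, and file paths below are placeholders; the URL path and the `x-amzn-iot-thingname` header come from the documentation linked above.

```python
import http.client
import json
import ssl

def credentials_url_path(role_alias):
    """Path of the AWS IoT Credentials Provider endpoint for a role alias."""
    return f"/role-aliases/{role_alias}/credentials"

def fetch_credentials(endpoint, role_alias, cert_path, key_path, thing_name):
    """Exchange the device certificate (mutual TLS) for temporary AWS credentials.

    endpoint: the account-specific credentials endpoint, e.g.
    "xxxxxxxx.credentials.iot.ap-northeast-2.amazonaws.com" (placeholder).
    """
    ctx = ssl.create_default_context()
    ctx.load_cert_chain(certfile=cert_path, keyfile=key_path)
    conn = http.client.HTTPSConnection(endpoint, context=ctx)
    conn.request("GET", credentials_url_path(role_alias),
                 headers={"x-amzn-iot-thingname": thing_name})
    resp = conn.getresponse()
    body = json.loads(resp.read())
    # Contains accessKeyId, secretAccessKey, sessionToken, expiration.
    return body["credentials"]
```

The returned temporary credentials can then be passed to any S3 client on the device; when they expire, the device simply repeats the exchange.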

To use a presigned URL, you would instead have an API or an MQTT service that generates the URL when the device needs it.
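On the device side this keeps the footprint very small: once your API or MQTT service has returned a presigned PUT URL (generated cloud-side, e.g. with boto3's `generate_presigned_url("put_object", ...)`), the upload needs only the standard library. A sketch, assuming the URL is whatever your service hands back:

```python
import urllib.request

def build_put_request(url, data, content_type="application/octet-stream"):
    """Build an HTTP PUT for a presigned S3 URL; no AWS SDK needed on the device."""
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": content_type},
    )

def upload_via_presigned_url(url, data):
    """Send the PUT; S3 responds with status 200 on success."""
    with urllib.request.urlopen(build_put_request(url, data)) as resp:
        return resp.status
```

The trade-off versus IAM credentials is that each object needs a fresh URL, so this fits best when uploads are infrequent or batched.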

Finally, AWS Greengrass would give you much of this functionality out of the box and more, such as automatic management of IAM credentials via the Token Exchange Service (TES), Stream Manager (https://docs.aws.amazon.com/greengrass/latest/developerguide/stream-export-configurations.html), and communication with AWS IoT Core.

My suggestion would then be (in order of preference):

  1. Greengrass + Stream Manager + AWS IoT Analytics
  2. Greengrass + custom Lambda to create the file + Stream Manager S3 exporter
  3. Greengrass + custom Lambda to create the file + custom Lambda to upload to S3
AWS
EXPERT
answered 3 years ago
reviewed a month ago
