Regarding sorting in S3 objects


Hi, I am sending data from AWS IoT Core to Amazon Kinesis Data Firehose and on to S3 using an AWS IoT rule. All of the data arrives in the S3 bucket. The problem I am facing is that the data in the S3 objects is stored in random order. How can I solve this, or is there another way to do it?

kushal
asked a year ago, 262 views
3 Answers

To collect your data in a meaningful way, you can use the dynamic partitioning capability in Kinesis Data Firehose. Typically, you'd use a time-based partition key for sensor data ingested from IoT devices. When your data is delivered to S3, each object is written under a prefix in your bucket, and that prefix can include the year, month, day, and even hour.

https://docs.aws.amazon.com/firehose/latest/dev/dynamic-partitioning.html

"ExtendedS3DestinationConfiguration": {
  "BucketARN": "arn:aws:s3:::my-logs-prod",
  "Prefix": "customer_id=!{partitionKeyFromQuery:customer_id}/device=!{partitionKeyFromQuery:device}/year=!{partitionKeyFromQuery:year}/month=!{partitionKeyFromQuery:month}/day=!{partitionKeyFromQuery:day}/hour=!{partitionKeyFromQuery:hour}/"
}
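
To make the prefix template concrete, here is a minimal Python sketch of how the `!{partitionKeyFromQuery:...}` placeholders expand into an S3 key prefix. It only imitates the substitution locally and is not Firehose code; the `expand_prefix` helper and the sample key values are hypothetical.

```python
import re

# Prefix template from the configuration above, joined into a single string.
PREFIX_TEMPLATE = (
    "customer_id=!{partitionKeyFromQuery:customer_id}/"
    "device=!{partitionKeyFromQuery:device}/"
    "year=!{partitionKeyFromQuery:year}/"
    "month=!{partitionKeyFromQuery:month}/"
    "day=!{partitionKeyFromQuery:day}/"
    "hour=!{partitionKeyFromQuery:hour}/"
)

def expand_prefix(template, keys):
    """Replace each !{partitionKeyFromQuery:name} placeholder with keys[name]."""
    def substitute(match):
        return str(keys[match.group(1)])
    return re.sub(r"!\{partitionKeyFromQuery:(\w+)\}", substitute, template)

# Hypothetical partition key values extracted from one record.
example_keys = {
    "customer_id": "42",
    "device": "sensor-7",
    "year": "2023",
    "month": "05",
    "day": "17",
    "hour": "09",
}

print(expand_prefix(PREFIX_TEMPLATE, example_keys))
```

Every record with the same key values lands under the same prefix. The prefix decides where a record is stored, not the order of records inside an object.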
AWS
Jason_W
answered a year ago
  • Hi @Jason_W,

    Thanks for your comments.

    I am already using dynamic partitioning for the S3 bucket, and that part works. The problem is that the data inside the bucket's objects arrives in random order. For example, the 2nd packet comes first, the 9th packet comes second, and so on. Since I have to plot this data, it is important to me that it arrives in the proper sequence. I have checked the sequence in IoT Core and it is correct, but in S3 it is not.


The object names created by Firehose are random, but you can query the content of the files using Amazon Athena.

You can follow the tutorial at https://docs.aws.amazon.com/athena/latest/ug/getting-started.html to set up Athena and query your data source.

You would likely need to adapt the tutorial with a SerDe matching your file contents. Refer to https://docs.aws.amazon.com/athena/latest/ug/supported-serdes.html for a list of supported Serializers/Deserializers.
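
If each record carries a sequence number or timestamp, the order can be restored at read time, for example with an ORDER BY clause in an Athena query. The same idea as a minimal Python sketch, assuming the object holds one JSON record per line and that each record has `device` and `seq` fields (both field names are hypothetical; substitute whatever your payload contains):

```python
import json

def sort_records(ndjson_text, keys=("device", "seq")):
    """Parse newline-delimited JSON and return the records sorted by the given fields."""
    records = [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]
    return sorted(records, key=lambda r: tuple(r[k] for k in keys))

# Records in the order they might appear inside one S3 object (made-up values).
sample = "\n".join([
    '{"device": "d1", "seq": 2, "value": 20.5}',
    '{"device": "d1", "seq": 9, "value": 21.1}',
    '{"device": "d1", "seq": 1, "value": 20.1}',
])

for record in sort_records(sample):
    print(record["device"], record["seq"], record["value"])
```

Sorting at read time leaves the delivery pipeline unchanged. It only requires that every record includes a field that reflects the original order.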

As an alternative to Firehose + S3, you can use AWS IoT Analytics. See this workshop for a guided how-to: https://catalog.workshops.aws/aws-iot-immersionday-workshop/en-US/aws-iot-analytics/lab20-settingupanalytics

AWS
EXPERT
answered a year ago

Hi,

Thanks for the reply.

The problem is not that the object names are random. The data inside each object is in random order, and we want to get that data in its actual sequence or sort it in some way.

The scenario is that we have 10 devices sending data to IoT Core, and I want that data to arrive in S3 in sorted order. I am using Firehose, and the data is stored in objects by partition key, but in random order. If I use IoT Analytics, all the data is stored in one object instead of the data from the 10 devices going into 10 files.

What should I do in this scenario?

kushal
answered a year ago
