I have inserted some data into a Kinesis stream. I can see it if I give the sequence number, but not if I select Latest. Why? This statement from the docs has me scratching my head: "Latest: show records just after the most recent record in the shard, so that you always read the most recent data in the shard." How can there be data after the MOST RECENT data? And why is it not showing the most recent data I inserted?
I am trying to follow this: https://aws.amazon.com/blogs/security/get-custom-data-into-amazon-security-lake-through-ingesting-azure-activity-logs/ Trim Horizon is also not showing the data as described in this article.
- Why is the data not coming with Latest? Should I change anything in my put_record API call for that?
It is currently just:
response = kinesis_client.put_record(
    StreamARN=SECURITY_LAKE_AZURE_STREAM_ARN,
    Data=json.dumps(record),
    PartitionKey="time",
)
- To read this data into a DataFrame, what should I give in the connection options? "startingPosition": "earliest" is not fetching anything.
How it looks in data viewer:
Dataframe code:
dataframe_KinesisStream_node1 = glueContext.create_data_frame.from_options(
    connection_type="kinesis",
    connection_options={
        "typeOfData": "kinesis",
        "streamARN": SECURITY_LAKE_AZURE_STREAM_ARN,
        "classification": "json",
        "startingPosition": "earliest",
        "inferSchema": "true",
    },
    transformation_ctx="dataframe_KinesisStream_node1",
)
What connection_option for startingPosition should be given to fetch this data properly?
Normally TRIM_HORIZON, so that you also process the data that may already be in the stream.
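On the Latest question: a LATEST iterator points just after the newest record at the moment the iterator is created, so it only returns records written afterwards. TRIM_HORIZON starts at the oldest record still retained in the shard. To check this outside Glue, here is a minimal sketch, assuming a boto3 Kinesis client and JSON payloads like the ones written by the put_record call in the question. The helper names `read_shard` and `read_stream` and the `max_calls` limit are my own, not part of any AWS API.

```python
import json


def read_shard(client, stream_arn, shard_id, iterator_type="TRIM_HORIZON", max_calls=5):
    """Read JSON records from one shard, starting at the given iterator type."""
    iterator = client.get_shard_iterator(
        StreamARN=stream_arn,
        ShardId=shard_id,
        ShardIteratorType=iterator_type,  # "TRIM_HORIZON" or "LATEST"
    )["ShardIterator"]

    records = []
    # A single get_records call can come back empty even when data exists
    # further along the shard, so poll a few times (max_calls is arbitrary).
    for _ in range(max_calls):
        if iterator is None:
            break
        response = client.get_records(ShardIterator=iterator, Limit=100)
        for record in response["Records"]:
            records.append(json.loads(record["Data"]))
        iterator = response.get("NextShardIterator")
    return records


def read_stream(client, stream_arn, iterator_type="TRIM_HORIZON"):
    """Read every shard of the stream and return the decoded records."""
    records = []
    for shard in client.list_shards(StreamARN=stream_arn)["Shards"]:
        records.extend(
            read_shard(client, stream_arn, shard["ShardId"], iterator_type)
        )
    return records


# Usage (needs AWS credentials and the boto3 package):
#   import boto3
#   client = boto3.client("kinesis")
#   print(read_stream(client, SECURITY_LAKE_AZURE_STREAM_ARN))            # existing data
#   print(read_stream(client, SECURITY_LAKE_AZURE_STREAM_ARN, "LATEST"))  # usually empty
```

If the TRIM_HORIZON read returns your records here but the Glue job still fetches nothing, the problem is more likely in the job setup than in the put_record call.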