Why so many shardIds when I've only configured 3 in my Kinesis Stream?

0

I have Kinesis consumer code that does a DescribeStream and then spins up a new Java thread per shardId to consume off each shard.

I get 8 shardIds when I've only configured 3 in my Stream. Why is that? I don't want to have 5 extra threads consuming constantly and getting zero records. Below, you can see I'm logging the total # of records processed on each shard.

2020-11-19 08:59:49 INFO  GetRecords:109 - # Kinesis consumers: 8
2020-11-19 08:59:49 INFO  GetRecords:112 - Kinesis - ShardId: 'shardId-000000000000', Total Records: 0
2020-11-19 08:59:49 INFO  GetRecords:112 - Kinesis - ShardId: 'shardId-000000000001', Total Records: 0
2020-11-19 08:59:49 INFO  GetRecords:112 - Kinesis - ShardId: 'shardId-000000000002', Total Records: 0
2020-11-19 08:59:49 INFO  GetRecords:112 - Kinesis - ShardId: 'shardId-000000000003', Total Records: 19110
2020-11-19 08:59:49 INFO  GetRecords:112 - Kinesis - ShardId: 'shardId-000000000004', Total Records: 0
2020-11-19 08:59:49 INFO  GetRecords:112 - Kinesis - ShardId: 'shardId-000000000005', Total Records: 0
2020-11-19 08:59:49 INFO  GetRecords:112 - Kinesis - ShardId: 'shardId-000000000006', Total Records: 18981
2020-11-19 08:59:49 INFO  GetRecords:112 - Kinesis - ShardId: 'shardId-000000000007', Total Records: 16195

Background: I started with 1, then configured 2, then, 3. Does this have something to do with the other shardIds that have 0 records? If so, what is the recommended code/practice to ignore a certain type of shard?

AWS
Jason_W
질문됨 3년 전607회 조회
1개 답변
0
수락된 답변

When you change the number of shards, presumably with the update-shard-count api, Kinesis is taking care of merging and splitting individual shards to get to your desired number of shards. Records in kinesis are immutable, outside of aging off the stream, and need to be able to be read in order. So you end up with a lineage of shards. A parent shard that contained all records within a given hash key range, and then two child shards that split that range and take new records going forward. When those child shards are created the parent shard doesn't go away, the records stay in place in that parent shard and the child shards start taking new records. This also works in reverse when you merge two shards.

When you do the describeStream call each shard in the shards list will list a ParentShardId if it has one. This allows you to contruct a map of the lineage to start reading in the correct place. This is also some of the complexity that is handled for you if you're using the KCL or a Lambda consumer

AWS
전문가
Adam_W
답변함 3년 전

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인

관련 콘텐츠