Skip to content

Best practices for distributing Kinesis shards across multiple pollers - handling resharding and balanced CPU load

0

I'm working on a Node.js application that polls from AWS Kinesis and pushes to Event Hubs. We have multiple poller instances where each poller needs to handle certain shards. Currently, we distribute shards using a simple hash modulo approach:

// Assign shards to pollers'const hashFn = require('string-hash')
const hash = require('string-hash')
return allShards.filter(
    (shard) => hash(shard.shardId) % pollerCount === instanceId
);

Issues we're facing:

  1. Uneven distribution - some pollers get 8 shards while others get none
  2. Since we're using Node.js (single-threaded), pollers with too many shards are hitting CPU limits
  3. We need to handle resharding scenarios efficiently

Requirements:

  1. Even distribution of shards across pollers for balanced CPU utilization
  2. Handle resharding without major redistribution of existing shards
  3. Simple and maintainable solution
  4. Works well with Node.js limitations (single-threaded)

We've explored:

  1. Simple modulo hashing (current approach) - leads to uneven distribution
  2. Consistent hashing with buckets - still shows uneven distribution
  3. Range-based distribution - handles even distribution but moves too many shards during resharding

What are the industry best practices for handling this shard distribution problem? Is there a solution that can provide both even distribution and minimal shard movement during resharding?

Any insights or experiences would be greatly appreciated.

2 Answers
1

To address your shard distribution challenges with Amazon Kinesis Data Streams, consider implementing the following best practices:

  1. Use the Kinesis Client Library (KCL): The KCL is designed to handle shard distribution and load balancing automatically. It manages the distribution of shards across multiple worker instances, ensuring a more even distribution and handling resharding scenarios efficiently.

  2. Implement a dynamic shard assignment strategy: Instead of using a static hash-based approach, implement a dynamic assignment strategy that periodically reassesses the shard distribution. This can help maintain a more balanced load across your pollers.

  3. Leverage the new Kinesis Streams Source connector: If you're using Apache Flink, consider using the new Kinesis Streams Source connector. It comes with a default UniformShardAssigner that maintains a uniform distribution of stream partitionIds across parallel subtasks, even during resharding operations.

  4. Monitor and adjust: Use Amazon CloudWatch metrics to monitor the performance of your stream and individual shards. Identify hot or cold shards and adjust your distribution strategy accordingly.

  5. Implement a buffer: To handle potential spikes in data flow and reduce the risk of hitting CPU limits, implement a buffer system that can temporarily store records before processing.

  6. Consider scaling options: Instead of focusing solely on shard distribution, also consider scaling your application by increasing instance size or the number of instances up to the maximum number of open shards.

  7. Implement graceful resharding handling: When resharding occurs, ensure your application continues to process data from parent shards until they're exhausted before moving to child shards. This maintains data order and allows for smoother transitions during resharding events.

  8. Use Auto Scaling: Implement Auto Scaling for your poller instances. This can help automatically adjust the number of instances based on the load, which is particularly useful when the number of shards increases due to resharding.

By implementing these practices, you can achieve a more even distribution of shards, handle resharding scenarios more efficiently, and maintain a balanced CPU load across your Node.js pollers. Remember to test different combinations of these approaches to find the optimal solution for your specific use case.

Sources
Introducing the new Amazon Kinesis source connector for Apache Flink | AWS Big Data Blog
Use resharding, scaling, and parallel processing to change the number of shards - Amazon Kinesis Data Streams
Decide on a strategy for resharding - Amazon Kinesis Data Streams
Complete the resharding action - Amazon Kinesis Data Streams

answered a year ago
0

Hi,

A best practice is to try to distribute the input records evenly across all shards for various reasons (cost, performances, etc.).

It is well explained in this article: https://medium.com/onebyte-llc/uniform-data-distribution-among-kinesis-data-stream-shards-7d350bca4a99

Best,

Didier

EXPERT
answered a year ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.