Amazon MSK questions

A customer has the following questions about the Amazon MSK service.

  1. What is the good way to manage topics and their settings? (command line tool / UI)
  2. Can we adjust settings for existing topic? Can we change replication factor or number of partitions?
  3. How to determine the numbers of partitions for the topic? Does it depend on how many brokers do we have?
  4. What will happen if we increase the number of brokers? Is it even possible without recreating cluster?
  5. We fill-up the Timestamp field in kafka message with timestamps from our devices. Is it ok that these timestamps will not be ordered in time ( we can get timestamps from the past)?
  6. Is using protobuf for encode/decode messages a good choice?
  7. What do you think about segmentio/kafka-go golang client? Can you advise us on choosing the "best" client?
  8. Is it possible to manage commits ourself while using consumer groups?
  9. Is there a way to add more disks to a broker instance?
AWS
asked 3 years ago, 1732 views
1 Answer
Accepted Answer

Below are my answers:

1. What is the good way to manage topics and their settings? (command line tool / UI) You can use CMAK to manage topics and their settings (AFAIK T3 brokers are not supported). You can also use Cruise Control for dynamic rebalancing.

2. Can we adjust settings for existing topic? Can we change replication factor or number of partitions? Yes, you can use the Kafka APIs to do things like change the retention period and add partitions. Changing the replication factor after topic creation is not a straightforward process, though; it is usually done at the partition level. Let me know if you need more details. CMAK can also be used to add partitions to topics.

3. How to determine the numbers of partitions for the topic? Does it depend on how many brokers do we have? This is a massive topic. Normally you scale Kafka's throughput through partitions. On the producer side, each partition can ingest tens of MB/s of data, so the number of partitions depends on the ingestion rate of the topic and the throughput limit of each partition. For example, if we assume that each partition can support 2 MB/s and the topic ingestion rate is 50 MB/s, then you need 25 partitions for this example topic. On the consumer side, each partition is assigned to one and only one consumer in a consumer group, but each consumer can have more than one partition. Therefore, on the consumer side you need at least as many partitions as there are consumers in your largest consumer group; any consumers beyond the partition count sit idle.

Moreover, each broker type has a limit on the number of partitions it can host per broker.

Finally, it is important to note that over-partitioning carries a penalty: it may increase unavailability or latency. This is well documented here.
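The sizing reasoning for question 3 can be sketched as a small calculation. This is only a rough starting point under the assumptions stated above; the function name and parameters are made up for illustration, and the per-partition throughput has to come from your own measurements.

```python
import math

def estimate_partitions(topic_mb_per_s, partition_mb_per_s, largest_group_consumers=1):
    """Rough partition estimate from producer throughput and consumer parallelism.

    All inputs are assumptions you supply; nothing here is measured by Kafka or MSK.
    """
    if partition_mb_per_s <= 0:
        raise ValueError("partition_mb_per_s must be positive")
    # Producer side: enough partitions to absorb the topic's ingestion rate.
    producer_side = math.ceil(topic_mb_per_s / partition_mb_per_s)
    # Consumer side: at least one partition per consumer in the largest group.
    return max(producer_side, largest_group_consumers, 1)

print(estimate_partitions(50, 2))       # 25, the example from the answer
print(estimate_partitions(50, 2, 40))   # 40, consumer parallelism dominates
```

The result still has to be checked against the per-broker partition limit for your broker type, which this sketch does not model.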

4. What will happen if we increase the number of brokers? Is it even possible without recreating cluster? Yes, you can scale the cluster out horizontally with no impact on cluster availability, but you cannot scale back in at the moment. Also, the total number of brokers must be a multiple of the number of AZs you are using. MSK currently supports 2 or 3 AZs, so you will need to scale in multiples of 2 or 3.

One thing to note: after scaling the cluster you need to reassign partitions, otherwise the new brokers will sit idle.
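As a quick illustration of the multiples-of-AZs rule, here is a minimal sketch that rounds a desired broker count up to the next valid value. The helper name is hypothetical, and the 2-or-3 AZ restriction is taken from the answer above, not queried from MSK.

```python
import math

def round_up_broker_count(desired_brokers, az_count):
    """Round a desired broker count up to the next multiple of the AZ count."""
    if az_count not in (2, 3):
        raise ValueError("this sketch assumes 2 or 3 AZs, as stated in the answer")
    if desired_brokers < 1:
        raise ValueError("desired_brokers must be at least 1")
    return math.ceil(desired_brokers / az_count) * az_count

print(round_up_broker_count(7, 3))  # 9
print(round_up_broker_count(6, 3))  # 6
print(round_up_broker_count(5, 2))  # 6
```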
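To make the reassignment step more concrete, the sketch below builds a simple round-robin plan in the JSON layout that the kafka-reassign-partitions tool reads. The topic name and broker IDs are placeholders, and the placement logic is deliberately naive: it ignores rack (AZ) awareness and does not try to minimize data movement, which Cruise Control handles much better.

```python
import json

def build_reassignment(topic, num_partitions, broker_ids, replication_factor):
    """Spread each partition's replicas round-robin over the given brokers."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    n = len(broker_ids)
    partitions = []
    for p in range(num_partitions):
        # Start each partition one broker further along so leaders are spread out.
        replicas = [broker_ids[(p + i) % n] for i in range(replication_factor)]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}

# Placeholder values: a 6-partition topic moved onto brokers 1..6 after a scale-out.
plan = build_reassignment("device-events", 6, [1, 2, 3, 4, 5, 6], 3)
print(json.dumps(plan, indent=2))
```

The same per-partition replica lists are also how a replication factor change is usually expressed: list more (or fewer) brokers per partition in the plan.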

5. We fill-up the Timestamp field in kafka message with timestamps from our devices. Is it ok that these timestamps will not be ordered in time ( we can get timestamps from the past)? Kafka guarantees order at the partition level. So if a message arrives with an out-of-order timestamp, Kafka will accept the write but will store the message in the order it was received. Therefore, you need some logic in your consumer to handle out-of-order messages.
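One possible shape for that consumer-side logic is a small reorder buffer that holds records for a bounded time and releases them in timestamp order. This is a sketch, not a Kafka feature: the class, its parameter, and the lateness value are all assumptions chosen for illustration.

```python
import heapq

class ReorderBuffer:
    """Release records in timestamp order, tolerating a bounded amount of lateness."""

    def __init__(self, max_lateness_ms):
        self.max_lateness_ms = max_lateness_ms
        self._heap = []        # min-heap of (timestamp, sequence, payload)
        self._max_seen = None  # highest device timestamp seen so far
        self._seq = 0          # tie-breaker so payloads are never compared

    def add(self, timestamp_ms, payload):
        """Buffer one record and return the records that are now safe to emit."""
        heapq.heappush(self._heap, (timestamp_ms, self._seq, payload))
        self._seq += 1
        if self._max_seen is None or timestamp_ms > self._max_seen:
            self._max_seen = timestamp_ms
        watermark = self._max_seen - self.max_lateness_ms
        ready = []
        # Anything at or below the watermark is released. A record that arrives
        # later than the allowed lateness is released immediately, out of order.
        while self._heap and self._heap[0][0] <= watermark:
            ts, _, item = heapq.heappop(self._heap)
            ready.append((ts, item))
        return ready

    def flush(self):
        """Emit everything still buffered, in timestamp order."""
        ready = []
        while self._heap:
            ts, _, item = heapq.heappop(self._heap)
            ready.append((ts, item))
        return ready

buf = ReorderBuffer(max_lateness_ms=10)
out = []
for ts, payload in [(100, "a"), (95, "b"), (105, "c"), (120, "d")]:
    out.extend(buf.add(ts, payload))
out.extend(buf.flush())
print(out)  # [(95, 'b'), (100, 'a'), (105, 'c'), (120, 'd')]
```

The trade-off is latency: a larger lateness window tolerates older device timestamps but delays every record by that amount.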

6. Is using protobuf for encode/decode messages a good choice?

Can you please elaborate? Both protobuf and Avro are very popular with Kafka. However, Glue Schema Registry supports Avro at the moment.

7. What do you think about segmentio/kafka-go golang client? Can you advise us on choosing the "best" client?

I will have to look into this.

8. Is it possible to manage commits ourself while using consumer groups?

Offsets in Kafka are managed by the consumers, which can commit them either manually or automatically. Committed offsets are stored in a special internal Kafka topic (`__consumer_offsets`). There are many strategies for committing offsets; please reach out to me if you need more details.
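As one example of a manual strategy, the sketch below shows "process first, then commit", which gives at-least-once delivery. It uses an in-memory stand-in instead of a real Kafka client, so every class and method name here is hypothetical; the one Kafka convention it mirrors is that the committed offset is the offset of the next record to read.

```python
class InMemoryPartition:
    """Hypothetical stand-in for one partition plus its committed offset."""

    def __init__(self, records):
        self.records = list(records)
        self.committed = 0  # offset of the next record to read after a restart

    def poll(self, max_records):
        # Simplification: always read from the committed offset.
        end = self.committed + max_records
        return list(enumerate(self.records))[self.committed:end]

    def commit(self, next_offset):
        self.committed = next_offset

def consume_at_least_once(partition, handler, batch_size=2):
    """Process each batch fully, then commit; a crash replays the uncommitted batch."""
    while True:
        batch = partition.poll(batch_size)
        if not batch:
            return
        for _offset, value in batch:
            handler(value)
        partition.commit(batch[-1][0] + 1)  # last processed offset + 1

seen = []
failed_once = []

def flaky_handler(value):
    seen.append(value)
    if value == "d" and not failed_once:
        failed_once.append(True)
        raise RuntimeError("simulated crash before commit")

part = InMemoryPartition(["a", "b", "c", "d"])
try:
    consume_at_least_once(part, flaky_handler)
except RuntimeError:
    pass  # the consumer "restarts" and resumes from the committed offset
consume_at_least_once(part, flaky_handler)
print(seen)            # ['a', 'b', 'c', 'd', 'c', 'd']
print(part.committed)  # 4
```

The replayed records after the simulated crash are why handlers in an at-least-once setup generally need to be idempotent.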

9. Is there a way to add more disks to a broker instance?

You can increase storage, but you cannot decrease it or add more disks. There is a PFR for this; please add your CI. But why would you want to add more disks?

answered 3 years ago
