How to use event filtering with a self-managed Apache Kafka event source for AVRO-encoded messages in AWS Lambda?


I'm using AWS Lambda with an event source mapping connected to a self-managed Apache Kafka cluster. The Kafka topic I'm subscribing to has very high volume, and I want to filter events based on the message value to reduce the load on my Lambda.

However, the messages are AVRO-encoded, and I haven't found a way to perform filtering before the Lambda is invoked. It seems that event filtering only works with JSON-based payloads.

Is there any built-in support for filtering Kafka messages that use AVRO schemas? If not, what are the recommended strategies for handling filtering in this case? I'd prefer to avoid decoding inside the Lambda due to performance concerns.

1 Answer

Unfortunately, AWS Lambda's built-in event filtering for self-managed Apache Kafka event sources does not directly support AVRO-encoded messages. The event filtering feature is designed to work with UTF-8 encoded strings, either as plain strings or in JSON format.

For AVRO-encoded messages, the filter cannot parse the binary record value as a plain string or as JSON, so Lambda can only filter on the record's metadata properties, not on the actual message content.
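
To see why, it helps to look at what the built-in filter actually expects. Below is a rough sketch (the mapping UUID, field name, and value are placeholders) of attaching a filter pattern with boto3; it assumes the Kafka record value is a UTF-8 JSON string. An AVRO-encoded value arrives as binary and will never match a pattern like this.

    import json
    import boto3

    lambda_client = boto3.client("lambda")

    # Hypothetical filter: only invoke the function for records whose JSON
    # value contains status == "active". This only works when the record
    # value is a UTF-8 JSON string, not AVRO binary.
    pattern = {"value": {"status": ["active"]}}

    lambda_client.update_event_source_mapping(
        UUID="your-esm-uuid",  # placeholder event source mapping ID
        FilterCriteria={"Filters": [{"Pattern": json.dumps(pattern)}]},
    )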

Given your situation, here are some potential strategies you could consider:

  1. Pre-processing: If possible, you could set up a pre-processing step that converts the AVRO messages to JSON before they reach Lambda. This could be done using a service like Amazon MSK Connect or a custom solution running on EC2 or ECS.

  2. Custom Consumer: You could create a custom Kafka consumer application that reads the AVRO messages, decodes them, applies the filtering logic, and then sends only the relevant messages to a new Kafka topic or directly to your Lambda function.

  3. Lambda Pre-processing: If the volume after filtering is expected to be significantly lower, you could create an initial Lambda function that decodes the AVRO messages and applies the filtering logic, then invokes a second Lambda function with only the relevant messages (see the sketch after this list).

  4. Use Amazon EventBridge Pipes: While not a direct solution for Lambda, you could consider using Amazon EventBridge Pipes. It supports Apache Kafka as a source and can potentially handle more complex transformations before invoking your Lambda function.

  5. Optimize Lambda Processing: If none of the above options are feasible, you might need to handle the decoding and filtering within your Lambda function. To optimize this, you could:

    • Use a compiled language such as Java or C# (.NET) for faster AVRO decoding
    • Cache parsed AVRO schemas across invocations instead of re-parsing them on every request (as in the sketch after this list)
    • Consider using provisioned concurrency to reduce cold starts
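
As a rough illustration of options 3 and 5, here is a minimal sketch of a Lambda handler that base64-decodes each Kafka record value, deserializes it with fastavro against a schema parsed once and cached at module level, applies a filter, and forwards only the matching messages to a downstream function. The schema, field names, filter condition, and function name are all placeholders, and the sketch assumes plain AVRO binary values (if your producers use a schema registry wire format, strip the framing bytes first).

    import base64
    import io
    import json

    import boto3
    from fastavro import parse_schema, schemaless_reader

    lambda_client = boto3.client("lambda")

    # Hypothetical writer schema; bundle it with the function or fetch it once
    # and cache the parsed form at module level so it survives warm invocations.
    SCHEMA = parse_schema({
        "type": "record",
        "name": "Order",
        "fields": [
            {"name": "status", "type": "string"},
            {"name": "amount", "type": "double"},
        ],
    })

    DOWNSTREAM_FUNCTION = "my-downstream-function"  # placeholder name


    def handler(event, context):
        matched = []
        # Self-managed Kafka events group records under "topic-partition" keys.
        for records in event["records"].values():
            for record in records:
                raw = base64.b64decode(record["value"])
                # If producers use the Confluent wire format, the payload is
                # prefixed with a magic byte and schema ID; strip it first,
                # e.g. raw = raw[5:]
                message = schemaless_reader(io.BytesIO(raw), SCHEMA)

                # Hypothetical filter condition; replace with your own logic.
                if message["status"] == "active":
                    matched.append(message)

        if matched:
            # Option 3: forward only the relevant, now JSON-encoded messages.
            lambda_client.invoke(
                FunctionName=DOWNSTREAM_FUNCTION,
                InvocationType="Event",  # asynchronous invocation
                Payload=json.dumps(matched).encode("utf-8"),
            )

        return {"forwarded": len(matched)}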

Remember that each of these approaches has its own trade-offs in terms of complexity, cost, and latency. The best solution will depend on your specific requirements, expected message volume, and acceptable trade-offs.
Sources
Using event filtering with a self-managed Apache Kafka event source - AWS Lambda
Event filtering in Amazon EventBridge Pipes - Amazon EventBridge
Control which events Lambda sends to your function - AWS Lambda

