SQS FIFO Deduplication


Is there any way to know which items were thrown out during the deduplication?

I know it's working because I have some custom metrics between 2 processes and it's always off by 10-20 items. It would just be nice to know for sure that they were thrown out due to the deduplication.

  • please accept the answer if it was useful

asked 4 months ago · 214 views
1 Answer

Possible Solutions to Track Deduplicated Messages:

  • Custom Logging: Implement custom logging on the sender side to track the MessageDeduplicationId for each message before sending it to SQS. This can help you identify which messages might have been discarded by comparing the logs with the messages successfully received by the consumer.

  • Use Unique Message Attributes: Attach a unique attribute to each message, such as a timestamp or a UUID, which can help in identifying duplicates when comparing the sent and received messages.

  • Custom Deduplication Handling: Handle deduplication manually by maintaining a cache or a database that tracks the MessageDeduplicationId of recently sent messages. This way, you can keep a record of all messages and check which ones might have been discarded.
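The custom deduplication handling in the last bullet can be sketched as a small sender-side TTL cache. This is a hypothetical helper, not an AWS feature; the TTL matches the 5-minute deduplication interval that SQS FIFO queues use:

```python
import time

DEDUP_WINDOW_SECONDS = 300  # SQS FIFO deduplication interval is 5 minutes

class DedupTracker:
    """Tracks recently sent deduplication IDs so duplicates can be logged
    before SQS silently drops them."""

    def __init__(self, ttl=DEDUP_WINDOW_SECONDS):
        self.ttl = ttl
        self._seen = {}  # deduplication_id -> time it was first sent

    def _evict_expired(self, now):
        expired = [k for k, t in self._seen.items() if now - t > self.ttl]
        for k in expired:
            del self._seen[k]

    def check_and_record(self, deduplication_id, now=None):
        """Return True if this ID is new (safe to send); False if it is a
        duplicate that SQS would discard within the dedup window."""
        now = time.time() if now is None else now
        self._evict_expired(now)
        if deduplication_id in self._seen:
            return False
        self._seen[deduplication_id] = now
        return True
```

Calling `check_and_record` just before `sqs.send_message` lets you log (or skip) exactly the messages SQS would deduplicate, instead of only inferring the count after the fact.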

Example of Custom Logging and Unique Attributes:

  • Sender Side:
import boto3
import uuid
import time

sqs = boto3.client('sqs')
queue_url = 'https://sqs.your-region.amazonaws.com/your-account-id/your-queue-name'

def send_message(message_body):
    deduplication_id = str(uuid.uuid4())
    message_group_id = 'your-group-id'  # Replace with your group ID logic
    timestamp = str(int(time.time()))
    
    response = sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=message_body,
        MessageDeduplicationId=deduplication_id,
        MessageGroupId=message_group_id,
        MessageAttributes={
            'Timestamp': {
                'StringValue': timestamp,
                'DataType': 'String'
            }
        }
    )
    
    print(f'Sent message with DeduplicationId: {deduplication_id}, Timestamp: {timestamp}')
    return response

# Example usage
send_message('Your message body')

  • Receiver Side:
import boto3

sqs = boto3.client('sqs')
queue_url = 'https://sqs.your-region.amazonaws.com/your-account-id/your-queue-name'

def receive_messages():
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=10,
        AttributeNames=['All'],  # required to get MessageDeduplicationId back
        MessageAttributeNames=['All']
    )

    for message in response.get('Messages', []):
        # MessageDeduplicationId is returned as a system attribute,
        # not as a top-level key on the message
        deduplication_id = message['Attributes']['MessageDeduplicationId']
        timestamp = message['MessageAttributes'].get('Timestamp', {}).get('StringValue')
        
        print(f'Received message with DeduplicationId: {deduplication_id}, Timestamp: {timestamp}')
        # Process the message
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message['ReceiptHandle']
        )

# Example usage
receive_messages()

While there is no built-in feature to directly identify which messages were discarded due to deduplication in SQS FIFO queues, implementing custom logging and using unique message attributes can help track and identify duplicates. Additionally, monitoring CloudWatch metrics can provide further insights into message processing discrepancies.
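The log comparison described above can be automated once both sides record their deduplication IDs. Note that a discarded duplicate shares its ID with a delivered message, so a plain set difference misses it; counting occurrences per ID is needed. This is a hypothetical helper for comparing the two logs:

```python
from collections import Counter

def find_discarded(sent_ids, received_ids):
    """Return a dict mapping each deduplication ID to the number of times
    it was sent but not received -- the messages SQS likely dropped as
    duplicates within the dedup window."""
    # Counter subtraction keeps only positive counts
    gap = Counter(sent_ids) - Counter(received_ids)
    return dict(gap)

# Example: IDs collected from sender and receiver logs
sent = ['id-1', 'id-2', 'id-2', 'id-3']
received = ['id-1', 'id-2', 'id-3']
print(find_discarded(sent, received))  # prints {'id-2': 1}
```

Running this periodically against your sender and receiver logs would tell you not just that your metrics are off by 10-20 items, but exactly which deduplication IDs account for the gap.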

answered 4 months ago
reviewed 4 months ago
  • That's somewhat similar to what I'm doing today. In theory I could implement deduplication in my code using a cache with a TTL, but I didn't want to go to that effort. Just wasn't sure if AWS ever considered some sort of callback to the sender, or re-routing of deduped messages to a dead-letter-like queue.
