How do I troubleshoot issues with the "ApproximateAgeOfOldestMessage" metric for my Amazon SQS queue?


I want to troubleshoot issues with the Amazon CloudWatch metric "ApproximateAgeOfOldestMessage" for my Amazon Simple Queue Service (Amazon SQS) queue.

Resolution

To troubleshoot the ApproximateAgeOfOldestMessage metric, complete the following tasks.

Identify whether a poison pill message causes a spike

Your queue metrics are usually equal

Compare the NumberOfMessagesSent, NumberOfMessagesReceived, and NumberOfMessagesDeleted metrics. If the metrics don't match, then your queue's consumer might have received a poison pill message, which is a message that's received repeatedly but never successfully processed and deleted. To identify whether a poison pill message caused a spike, check whether the NumberOfMessagesReceived and NumberOfMessagesDeleted metrics are greater than the NumberOfMessagesSent metric. Also, check whether there's a drop in deleted messages at the time of the spike. To identify whether your consumer processed older messages at the time of the spike, it's a best practice to check the consumer logs.
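The comparison above can be sketched in Python. This is a minimal illustration, not an official AWS tool: it assumes that you already have per-window Sum values for the three metrics, for example from the CloudWatch GetMetricStatistics API. The function names, the queue name in the usage note, and the baseline_deleted parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def metric_request(queue_name, metric_name, end, minutes=60, period=60):
    """Build keyword arguments for a CloudWatch GetMetricStatistics call.

    The returned dict requests the Sum of one SQS metric per period.
    It only builds the request; it doesn't call AWS.
    """
    return {
        "Namespace": "AWS/SQS",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": period,
        "Statistics": ["Sum"],
    }


def poison_pill_checks(sent, received, deleted, baseline_deleted):
    """Run the two checks described above on sums for the spike window.

    baseline_deleted (assumption): the NumberOfMessagesDeleted sum for a
    comparable window before the spike, used to detect a drop in deletes.
    """
    return {
        "received_and_deleted_exceed_sent": received > sent and deleted > sent,
        "deleted_dropped": deleted < baseline_deleted,
    }
```

For example, if you use boto3, you might pass the result of metric_request("my-queue", "NumberOfMessagesSent", datetime.now(timezone.utc)) as keyword arguments to a CloudWatch client's get_metric_statistics method, and then pass the summed datapoints to poison_pill_checks. Treat the two results as signals to investigate in the consumer logs, not as proof of a poison pill message.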

Your queue metrics aren't usually equal

Compare the NumberOfMessagesSent, NumberOfMessagesReceived, and NumberOfMessagesDeleted metrics. If the metrics aren't equal, then check the consumer logs to see whether the consumer processed any older messages at the time of the spike. Also, check whether any messages were processed multiple times because this might indicate that the consumer received a poison pill message.
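To check whether any messages were processed multiple times, you can count how often each message ID appears in the consumer logs. The following sketch assumes a hypothetical log format in which each processed message is logged as "MessageId=<id>". Adjust the pattern to match what your consumer actually writes.

```python
import re
from collections import Counter

# Assumption: the consumer logs each processed message as "MessageId=<id>".
MESSAGE_ID = re.compile(r"MessageId=([0-9a-fA-F-]+)")


def repeated_message_ids(log_lines, min_count=2):
    """Return {message_id: count} for IDs logged at least min_count times."""
    counts = Counter(
        match.group(1)
        for line in log_lines
        for match in MESSAGE_ID.finditer(line)
    )
    return {mid: n for mid, n in counts.items() if n >= min_count}
```

A message ID that appears many times around the spike is a candidate poison pill message that's worth inspecting.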

Troubleshoot spikes in the ApproximateAgeOfOldestMessage metric

The queue regularly spikes, or the spikes are more frequent

If the queue has regular spikes in ApproximateAgeOfOldestMessage, then the consumer application might be taking too long to poll messages from the queue. The consumer application might also be taking longer to process and delete received messages, so messages remain in flight longer.

If there are also spikes in the ApproximateNumberOfMessagesVisible metric at the same time, then there's a backlog in the queue. If NumberOfMessagesReceived is usually lower than NumberOfMessagesSent, then the consumer application must scale out to keep up with the backlog.

If NumberOfMessagesReceived matches NumberOfMessagesSent but NumberOfMessagesDeleted is lower, then the consumer application takes longer to delete messages and causes your metrics to spike. Check if the consumer application has any errors when it processes or deletes messages from the queue.
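The checks in this section can be combined into one small helper. This is a sketch under stated assumptions: the inputs are Sum values for the same time window, visible_spiking is your own judgment of whether ApproximateNumberOfMessagesVisible spiked in that window, and the 5% tolerance that decides whether two metrics "match" is an arbitrary illustrative value. The function name and finding strings are hypothetical.

```python
def diagnose_regular_spikes(sent, received, deleted, visible_spiking, tolerance=0.05):
    """Map the metric comparisons described above to likely causes."""
    findings = []
    if visible_spiking:
        # ApproximateNumberOfMessagesVisible spiked with the age metric.
        findings.append("backlog in queue")
    if received < sent * (1 - tolerance):
        # Fewer messages received than sent: consumers aren't keeping up.
        findings.append("consumers not keeping up: scale consumers")
    elif abs(received - sent) <= sent * tolerance and deleted < received * (1 - tolerance):
        # Received matches sent, but deletes lag behind.
        findings.append("slow or failing deletes: check consumer errors")
    return findings
```

For example, diagnose_regular_spikes(1000, 1000, 800, False) reports only the delete-related finding, which points you to the consumer's error logs instead of to scaling.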

The queue has a single short spike

If there's a spike in the metric that lasts for only a minute with no other activity in the queue, then it's likely a metric issue. Because Amazon SQS is distributed, there are times when the metric might be higher than expected. Contact AWS Support to determine whether there are reported issues with the Amazon SQS service.
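Before you contact AWS Support, you can confirm that the spike matches this pattern. The following sketch assumes two per-minute series of equal length: ages (the Maximum of ApproximateAgeOfOldestMessage) and activity (the combined count of messages sent, received, and deleted). The function name and the factor threshold are hypothetical, illustrative choices.

```python
def single_minute_spikes(ages, activity, factor=5.0):
    """Return indexes of one-datapoint age spikes with no queue activity.

    A datapoint counts as a spike when it exceeds factor times the larger
    of its two neighbors (with a floor of 1 second to avoid comparing
    against zero).
    """
    spikes = []
    for i in range(1, len(ages) - 1):
        neighbors = max(ages[i - 1], ages[i + 1])
        is_spike = ages[i] > factor * max(neighbors, 1)
        # No messages sent, received, or deleted around the spike.
        quiet = activity[i - 1] == activity[i] == activity[i + 1] == 0
        if is_spike and quiet:
            spikes.append(i)
    return spikes
```

If the function returns an index for the minute that you're investigating, then the spike is isolated and coincides with an idle queue, which is consistent with a metric issue and not a consumer problem.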

Related information

Available CloudWatch metrics for Amazon SQS

Features of message queues

AWS OFFICIAL. Updated 7 months ago