Why are aggregate results in a Log Insights query nonsensical (count < count_distinct for the same variable)?
The following Log Insights query on a single log group returns negative numbers for the variable @distinct_unique_keys_delta:
parse @message /(?<@unique_key>Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+)/
| filter @message like /Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+/
| stats count(@unique_key) - count_distinct(@unique_key) as @distinct_unique_keys_delta by datefloor(@timestamp, 1d) as @_datefloor
| sort @_datefloor asc
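For comparison, here is a minimal local emulation of what the query above should return if every count were exact. It assumes the log events are available as (epoch milliseconds, message) pairs and that days are bucketed in UTC; the sample log lines and the function name `delta_by_day` are hypothetical. With exact counting, count minus distinct count per day is the number of repeated occurrences, so it can never be negative.

```python
import re
from collections import defaultdict
from datetime import date, datetime, timezone

# The query's regex in Python syntax: named groups use (?P<name>...) and
# cannot start with "@", so the group is called unique_key here.
KEY_RE = re.compile(
    r"(?P<unique_key>Processing key: \w+/[\w=_-]+/\w+"
    r"\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+)"
)


def delta_by_day(events) -> dict[date, int]:
    """Exact count(key) - count_distinct(key) per UTC day.

    `events` is an iterable of (epoch_millis, message) pairs, standing in
    for @timestamp and @message (hypothetical input format).
    """
    keys_by_day = defaultdict(list)
    for ts_ms, message in events:
        match = KEY_RE.search(message)
        if match is None:  # plays the role of the `filter @message like` stage
            continue
        day = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).date()
        keys_by_day[day].append(match.group("unique_key"))
    # len(keys) >= len(set(keys)) always, so every delta is >= 0.
    return {day: len(keys) - len(set(keys)) for day, keys in sorted(keys_by_day.items())}


# Hypothetical log lines: two occurrences of one key, one of another, plus noise.
events = [
    (1672531200000, "Processing key: bucket/dt=2023-01-01/events.2023-01-01-00.abc-1.json.gz"),
    (1672534800000, "Processing key: bucket/dt=2023-01-01/events.2023-01-01-00.abc-1.json.gz"),
    (1672538400000, "Processing key: bucket/dt=2023-01-01/events.2023-01-01-01.abc-2.json.gz"),
    (1672621200000, "unrelated message"),
]
print(delta_by_day(events))  # {datetime.date(2023, 1, 1): 1}
```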
My understanding is that the number of unique values of a variable can never be more than the total number of values of that variable. When I ran this query, I was concerned that I might be misusing datefloor, so I tried this query without it:
parse @message /(?<@unique_key>Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+)/
| filter @message like /Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+/
| stats count(@unique_key) - count_distinct(@unique_key) as @distinct_unique_keys_delta
For the time range I chose (a whole day), this query returned -20,347 for the @distinct_unique_keys_delta variable.
To me this result seems nonsensical. Am I doing something wrong, misinterpreting the results, or is there a bug in the code that runs Log Insights queries?
I have discovered that the count_distinct function in AWS Log Insights queries doesn't always return an exact distinct count. As per the documentation:
"Returns the number of unique values for the field. If the field has very high cardinality (contains many unique values), the value returned by count_distinct is just an approximation."
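Apparently I can't assume that a function returns an exact result. Because count_distinct is only an approximation on high-cardinality fields, it can overestimate the number of unique values, which is how count(@unique_key) - count_distinct(@unique_key) ended up negative.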
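AWS doesn't document which estimator count_distinct uses. Purely as an illustrative assumption, the sketch below implements a minimal HyperLogLog-style estimator (a common approximate distinct-count algorithm, not confirmed to be what Log Insights runs) to show how an estimate can land above the true total. When every value is unique, the exact count equals the exact distinct count, so any overshoot by the estimator makes count minus estimate negative. The function name `hll_estimate` and the precision `p=12` are hypothetical choices.

```python
import hashlib
import math


def hll_estimate(values, p=12):
    """Minimal HyperLogLog distinct-count estimate using 2**p registers."""
    m = 1 << p
    registers = [0] * m
    for v in values:
        # Deterministic 64-bit hash taken from the first 8 bytes of SHA-1.
        h = int.from_bytes(hashlib.sha1(str(v).encode()).digest()[:8], "big")
        idx = h >> (64 - p)                      # top p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)         # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1  # position of leftmost 1-bit
        registers[idx] = max(registers[idx], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)           # linear-counting correction for small sets
    return raw


# Every value is unique, so exact count == exact distinct count.
keys = [f"key-{i}" for i in range(100_000)]
estimate = round(hll_estimate(keys))
# The estimate is usually off by a percent or two; when it overshoots,
# len(keys) - estimate is negative, just like the query's delta.
print(len(keys), estimate, len(keys) - estimate)
```

With 4,096 registers, HyperLogLog's standard error is roughly 1.04/sqrt(4096), about 1.6%, so on large inputs an overshoot of thousands is expected rather than a bug in the arithmetic.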