Why are aggregate results in a Log Insights query nonsensical (count < count_distinct for the same variable)?

The following Logs Insights query on a single log group returns negative numbers for the variable @distinct_unique_keys_delta:

parse @message /(?<@unique_key>Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+)/
| filter @message like /Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+/
| stats count(@unique_key) - count_distinct(@unique_key) as @distinct_unique_keys_delta
        by datefloor(@timestamp, 1d) as @_datefloor 
| sort @_datefloor asc

My understanding is that the number of unique values of a variable can never exceed the total number of values of that variable. When I ran this query I was concerned that I might be misunderstanding the correct usage of datefloor, so I tried this query without the grouping:

parse @message /(?<@unique_key>Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+)/
| filter @message like /Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+/
| stats count(@unique_key) - count_distinct(@unique_key) as @distinct_unique_keys_delta

The result of this query for the time range I chose (a whole day) was -20,347 for the @distinct_unique_keys_delta variable.

To me this result seems completely nonsensical. Am I doing something wrong, interpreting the results wrong, or is there a bug in the code running this Logs Insights query?

1 Answer

I have discovered that the count_distinct function in Logs Insights queries doesn't always return an exact distinct count! As per the documentation:

Returns the number of unique values for the field. If the field has very high cardinality (contains many unique values), the value returned by count_distinct is just an approximation.

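Apparently I can't just assume that a function returns an accurate result. With a high-cardinality field like @unique_key, the approximation from count_distinct can evidently come out higher than the exact count, which would explain the negative delta.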

The documentation page.
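
If an exact figure is needed, one option is to compute it outside Logs Insights. The following is only a minimal sketch, assuming the matching log messages have already been exported into a list of strings; the function name count_keys and the sample messages are hypothetical and not part of any AWS API. It reuses the regex from the question and counts with a Python set, so the distinct count is exact and can never exceed the total.

```python
import re

# Same pattern as in the Logs Insights query, written as a Python regex.
KEY_PATTERN = re.compile(
    r"Processing key: \w+/[\w=_-]+/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+"
)


def count_keys(messages):
    """Return (total, distinct) counts of matched keys, both exact."""
    keys = [m.group(0) for m in map(KEY_PATTERN.search, messages) if m]
    return len(keys), len(set(keys))


# Hypothetical sample messages, for illustration only.
sample_messages = [
    "Processing key: bucket/dt=2023-01-01/file.2023-01-01-05.part-0.json.gz",
    "Processing key: bucket/dt=2023-01-01/file.2023-01-01-05.part-0.json.gz",
    "Processing key: bucket/dt=2023-01-01/file.2023-01-01-06.part-1.json.gz",
    "unrelated log line",
]

total, distinct = count_keys(sample_messages)
delta = total - distinct  # exact counts, so this is never negative
print(total, distinct, delta)
```

Holding every key in a set costs memory proportional to the number of distinct keys, which is presumably the trade-off an approximate count is meant to avoid.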

answered a month ago
