The following log insights query on a single log group returns negative numbers for the variable @distinct_unique_keys_delta
:
parse @message /(?<@unique_key>Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+)/
| filter @message like /Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+/
| stats count(@unique_key) - count_distinct(@unique_key) as @distinct_unique_keys_delta
by datefloor(@timestamp, 1d) as @_datefloor
| sort @_datefloor asc
My understanding is that the number of unique values of a variable can never be more than the total number of values of a variable. When I ran this query I was concerned that I might be misunderstanding the correct usage of datefloor
, so I tried this query:
parse @message /(?<@unique_key>Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+)/
| filter @message like /Processing key: \w+\/[\w=_-]+\/\w+\.\d{4}-\d{2}-\d{2}-\d{2}\.[\w-]+\.\w+\.\w+/
| stats count(@unique_key) - count_distinct(@unique_key) as @distinct_unique_keys_delta
The result of this query for the time range I chose (a whole day), was -20,347 for the @distinct_unique_keys_delta
variable.
To me this result seems completely nonsensical. Am I doing something wrong, interpreting the results wrong or is there a bug in the code running this log insights query?