There are a few potential reasons why you might be experiencing missing information when querying AWS CloudTrail logs using Athena. Let's explore some possible causes and solutions:
- Table Creation Method: You mentioned that you created a non-partitioned table using the CloudTrail console. This approach may not be optimal for querying large volumes of CloudTrail data. Instead, consider creating a partitioned table manually or using partition projection. Partitioning can significantly improve query performance and ensure all data is accessible.
- S3 Location Path: Verify that the S3 location path specified in your Athena table is correct. An incorrect path can lead to Athena not returning all records or missing information.
- Partitions: If you're using a partitioned table, ensure that all necessary partitions are loaded. You can use the SHOW PARTITIONS statement to view loaded partitions and the ALTER TABLE ADD PARTITION command to add missing partitions.
- Data Types and Nested Structures: CloudTrail logs often contain nested structures, especially in fields like userIdentity and resources. Make sure your table definition correctly represents these nested structures. For example, userIdentity is typically a STRUCT type, and its fields can be queried using dot notation (e.g., useridentity.accountid).
- Query Syntax: Double-check your query syntax, especially when dealing with nested fields. For instance, to query the accessKeyId within userIdentity, you might need to use a syntax like:
  SELECT useridentity.accesskeyid, * FROM cloudtrail_table_name WHERE useridentity.accesskeyid IS NOT NULL;
- Storage Classes: Ensure that your CloudTrail logs are not stored in unsupported S3 storage classes. Athena doesn't support querying data in S3 Glacier Flexible Retrieval or S3 Glacier Deep Archive storage classes.
- File Formats: Verify that Athena can read your CloudTrail log files. While you mentioned they are in json.gz format, which should be supported, ensure there are no issues with file compression or formatting.
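The location and partition checks in the list above could be sketched roughly as follows. This is only an illustration: the table name cloudtrail_logs, the partition keys (region, year, month, day), the bucket amzn-s3-demo-bucket and the account ID 111122223333 are placeholders, so substitute whatever your own table and bucket use.

```sql
-- Show the table definition, including the LOCATION Athena reads from.
SHOW CREATE TABLE cloudtrail_logs;

-- List some of the S3 objects Athena actually reads for this table.
-- "$path" is an Athena pseudo-column holding the source file of each row.
SELECT DISTINCT "$path" AS source_file
FROM cloudtrail_logs
LIMIT 20;

-- For a manually partitioned table only: list the registered partitions.
SHOW PARTITIONS cloudtrail_logs;

-- Register one missing partition by hand.
-- Partition keys and the S3 path below are hypothetical placeholders.
ALTER TABLE cloudtrail_logs ADD IF NOT EXISTS
  PARTITION (region = 'us-east-1', year = '2024', month = '01', day = '15')
  LOCATION 's3://amzn-s3-demo-bucket/AWSLogs/111122223333/CloudTrail/us-east-1/2024/01/15/';
```

Note that SHOW PARTITIONS only lists partitions registered in the catalog, so it will not show partitions that come from partition projection.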
To troubleshoot further:
- Start with a basic query to verify the table structure and data accessibility.
- Gradually add more complex elements to your query to identify where the issue occurs.
- Check Athena query results for any error messages or warnings that might provide additional insight.
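As a rough sketch of these troubleshooting steps, you could build the query up in stages. The table name cloudtrail_logs and the eventtime column are assumptions based on the standard CloudTrail DDL, so adjust them to your table.

```sql
-- Step 1: confirm the table is readable and returns rows at all.
SELECT *
FROM cloudtrail_logs
LIMIT 10;

-- Step 2: add nested fields from the useridentity struct.
SELECT eventtime,
       eventsource,
       eventname,
       useridentity.type,
       useridentity.accesskeyid
FROM cloudtrail_logs
LIMIT 10;

-- Step 3: per identity type, count how many events carry a
-- non-empty access key ID.
SELECT useridentity.type AS identity_type,
       count(*) AS events,
       count_if(useridentity.accesskeyid IS NOT NULL
                AND useridentity.accesskeyid <> '') AS events_with_key
FROM cloudtrail_logs
GROUP BY useridentity.type
ORDER BY events DESC;
```

If step 3 shows the key present for some identity types and absent for others, the gaps are more likely to come from the logs themselves than from the table definition.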
If the issue persists after checking these points, you may want to recreate your table with a partitioned structure, ensuring all nested fields are properly defined. This approach often resolves issues with missing data in CloudTrail queries.
Sources
Troubleshoot Athena SELECT query issues for CloudTrail logs | AWS re:Post
Example CloudTrail log queries - Amazon Athena
Troubleshoot issues in Athena - Amazon Athena
If I understand it correctly, you're querying CloudTrail logs using Athena and you are getting values for all the columns except userIdentity.accessKeyId.
Let's start from scratch, assuming you have logs in the S3 bucket. You may already be on step 3, but it is important that you follow the best practices:
- Create the table in Athena using the prescribed DDL: https://docs.aws.amazon.com/athena/latest/ug/create-cloudtrail-table.html. If possible, use the partition projection feature: https://docs.aws.amazon.com/athena/latest/ug/create-cloudtrail-table-partition-projection.html
- Based on the DDL statement, userIdentity is defined as a struct, as shown in this excerpt:
CREATE EXTERNAL TABLE cloudtrail_logs_pp(
    eventversion STRING,
    useridentity STRUCT<
        type: STRING,
        principalid: STRING,
        arn: STRING,
        accountid: STRING,
        invokedby: STRING,
        accesskeyid: STRING,
        username: STRING,
        onbehalfof: STRUCT<
            userid: STRING,
            identitystorearn: STRING>,
        sessioncontext: STRUCT<
            attributes: STRUCT<
                mfaauthenticated: STRING,
                creationdate: STRING>,
            sessionissuer: STRUCT<
                type: STRING,
                principalid: STRING,
                arn: STRING,
                accountid: STRING,
                username: STRING>,
            ec2roledelivery: STRING,
            webidfederationdata: STRUCT<
                federatedprovider: STRING,
                attributes: MAP<STRING, STRING>>
        >
    >,
- You can query useridentity.accesskeyid as below; for reference, see https://docs.aws.amazon.com/athena/latest/ug/query-examples-cloudtrail-logs.html#cloudtrail-logs-nested-fields
SELECT
    eventsource,
    eventname,
    useridentity.sessioncontext.attributes.creationdate,
    useridentity.sessioncontext.sessionissuer.arn,
    useridentity.accesskeyid
FROM cloudtrail_logs
WHERE useridentity.sessioncontext.sessionissuer.arn IS NOT NULL
ORDER BY eventsource, eventname
LIMIT 10
- As for useridentity.accesskeyid being empty: this is expected in some cases. The AWS documentation for the CloudTrail userIdentity element says:
"accessKeyId: The access key ID that was used to sign the request. If the request was made with temporary security credentials, this is the access key ID of the temporary credentials. For security reasons, accessKeyId might not be present, or might be displayed as an empty string."
- To validate, check the raw log files in the S3 bucket for a record that has a value for accessKeyId, then query that same record in Athena and confirm that useridentity.accesskeyid returns the value.
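One way to do that comparison, as a minimal sketch: take the eventID of a raw record that does contain accessKeyId and look the same event up in Athena. The table name cloudtrail_logs, the eventid and eventtime columns (from the standard CloudTrail DDL) and the ID value below are assumptions or placeholders.

```sql
-- Look up a single event whose raw JSON record contains accessKeyId.
-- Replace the placeholder with the eventID copied from the raw log file.
SELECT eventid,
       eventtime,
       eventname,
       useridentity.type,
       useridentity.accesskeyid
FROM cloudtrail_logs
WHERE eventid = 'REPLACE-WITH-EVENT-ID-FROM-RAW-LOG';
```

If the raw record has the key but this query returns NULL or an empty string for it, the table definition is the likely culprit. If both are empty, the value was never written to the log.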
Hope this explains the issue.
