How to get the ObjectKey from a 'DeleteObjects' event

0

The requestparameters of a 'DeleteObjects' event looks like '{"bucketName":"demo-bucket","Host":"s3.us-east-2.amazonaws.com","delete":""}'. How can I get the deleted ObjectKey?

  • Can you be more specific about what kind of data you are querying with Athena? How do you accumulate data in the data lake? Does your data lake actually store information about deleted object keys?

chunhui
asked 6 months ago, 246 views
3 Answers
0

Hello. Create a table in Athena that's mapped to the location of your CloudTrail logs in S3.

The table can be set up with a command like this (although the actual columns and types might differ based on the exact structure and version of your CloudTrail logs):

CREATE EXTERNAL TABLE cloudtrail_logs (
   eventversion STRING,
   useridentity STRUCT<
      type: STRING,
      principalid: STRING,
      arn: STRING,
      accountid: STRING,
      invokedby: STRING,
      accesskeyid: STRING,
      userName: STRING,
      sessioncontext: STRUCT<
         attributes: STRUCT<
            mfaauthenticated: STRING,
            creationdate: STRING
         >,
         sessionIssuer: STRUCT<
            type: STRING,
            principalId: STRING,
            arn: STRING, 
            accountId: STRING,
            userName: STRING
         >
      >
   >,
   eventtime STRING,
   eventsource STRING,
   eventname STRING,
   awsregion STRING,
   sourceipaddress STRING,
   useragent STRING,
   errorcode STRING,
   errormessage STRING,
   requestparameters STRING,
   responseelements STRING,
   additionaleventdata STRING,
   eventid STRING,
   eventtype STRING,
   apiVersion STRING,
   readOnly STRING,
   resources ARRAY<STRUCT<
      ARN: STRING,
      accountId: STRING,
      type: STRING
   >>,
   serviceeventdetails STRING,
   sharedeventid STRING,
   vpcendpointid STRING
)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://YOUR_CLOUDTRAIL_BUCKET/AWSLogs/YOUR_ACCOUNT_ID/CloudTrail/';

Replace YOUR_CLOUDTRAIL_BUCKET and YOUR_ACCOUNT_ID with your specific S3 bucket and AWS account ID where CloudTrail logs are stored.

Query Deleted Object Keys:

SELECT 
   json_extract(requestparameters, '$.delete.objects') AS deleted_objects
FROM 
   cloudtrail_logs
WHERE 
   eventname = 'DeleteObjects' 
   AND eventsource = 's3.amazonaws.com'
   -- json_extract_scalar returns a VARCHAR, so it can be compared with a string literal
   AND json_extract_scalar(requestparameters, '$.bucketName') = 'demo-bucket';

This will give you the list of deleted objects for the demo-bucket bucket.
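To check whether a returned requestparameters value carries any keys at all, the string can be parsed outside Athena. The following Node.js sketch is only an illustration: extractDeletedKeys is a hypothetical helper, and the delete.objects[].key shape is an assumption based on the JSON path used in the query above, not a confirmed CloudTrail format.

```javascript
// Hypothetical helper: pull object keys out of a CloudTrail requestparameters string.
// ASSUMPTION: keys, if present, live under delete.objects[].key.
function extractDeletedKeys(requestParameters) {
  let parsed;
  try {
    parsed = JSON.parse(requestParameters);
  } catch (err) {
    return []; // not valid JSON, nothing to extract
  }
  const objects = parsed && parsed.delete && parsed.delete.objects;
  if (!Array.isArray(objects)) {
    return []; // covers "delete": "" as in the record quoted in the question
  }
  return objects
    .map((o) => (o && typeof o.key === "string" ? o.key : null))
    .filter((k) => k !== null);
}

// The exact value quoted in the question.
const fromQuestion =
  '{"bucketName":"demo-bucket","Host":"s3.us-east-2.amazonaws.com","delete":""}';
console.log(extractDeletedKeys(fromQuestion)); // prints []
```

For the record quoted in the question the result is an empty list, which means no key can be read from that field for that event.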

Regards, Andrii

EXPERT
answered 6 months ago
  • Yes, I've created the CloudTrail table. However, after deleting the object through the console, the data I get in Athena from the query "select requestparameters FROM table where eventname = 'DeleteObjects' limit 1" is '{"bucketName":"demo_bucket","Host":"s3.us-east-2.amazonaws.com","delete":""}'. The 'delete' field is empty even though the file has been deleted.

  • If you're not seeing the expected content within the delete field for the DeleteObjects event in CloudTrail logs, there could be several reasons:

    Single Object Delete vs. Multiple Object Delete: When you delete an object directly in the S3 console, it may not use the DeleteObjects API action, which is designed to delete multiple objects in a single request. Instead, it might use the DeleteObject (singular) API action. If you've deleted just a single object, try searching for DeleteObject in the CloudTrail logs.

    Delayed Logging: CloudTrail logs can sometimes have a slight delay. Ensure you've waited long enough for the log to appear.

    CloudTrail Configuration: Double-check to make sure the CloudTrail is correctly configured to capture all S3 bucket events and that the logs are stored in the expected location.

    Log Rotation or Overwrite: Ensure that the logs are not getting overwritten or deleted due to any lifecycle policies on the S3 bucket.

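    Athena Query Freshness: If you're using a partitioned CloudTrail table, make sure new partitions have been loaded. CloudTrail log paths are not in Hive key=value format, so MSCK REPAIR TABLE <your_table_name> will not discover them; add them with ALTER TABLE <your_table_name> ADD PARTITION, or use partition projection, so that Athena recognizes logs delivered since the table was last updated.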

    For your current situation:

    Since you've deleted the object from the console, try checking for the DeleteObject event:

    SELECT requestparameters FROM table WHERE eventname = 'DeleteObject' LIMIT 1;

    This should provide details on the individual object deleted. If you find the

  • Yes, I deleted it from the console. I selected only one file, but it still triggered the DeleteObjects event. The configuration was done according to the documentation, and there is definitely no issue with the storage location; if there were, Athena wouldn't be able to query it. The current issue is that after deleting the file, I can't retrieve the specific object I deleted through Athena. Searching for 'DeleteObject' events returns no results. The deletion is confirmed to be recorded under DeleteObjects, but the record is incomplete.

  • Alternative Workarounds:

    Versioning: If your S3 bucket has versioning enabled, you could use S3 Inventory reports to list all objects and their current versions. By comparing reports before and after deletions, you can identify which objects were deleted.

    S3 Access Logs: While CloudTrail provides logs related to API events, enabling server access logging for the S3 bucket itself will give you detailed records of requests made to the bucket. This is another route to get insights on deleted objects.

  • OK, thanks. I want to achieve this through Athena. I'm still exploring.
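The inventory comparison suggested in the workaround above comes down to a set difference between two key listings. A minimal Node.js sketch, assuming the object keys from the earlier and later inventory reports have already been loaded into arrays (report download and parsing are left out); findDeletedKeys and the sample keys are hypothetical:

```javascript
// Keys present in the earlier listing but missing from the later one.
function findDeletedKeys(beforeKeys, afterKeys) {
  const stillPresent = new Set(afterKeys);
  return beforeKeys.filter((key) => !stillPresent.has(key));
}

// Illustrative data only; real lists would come from two inventory reports.
const before = ["logs/a.txt", "logs/b.txt", "img/c.png"];
const after = ["logs/a.txt", "img/c.png", "img/d.png"];
console.log(findDeletedKeys(before, after)); // prints [ 'logs/b.txt' ]
```

This only narrows a deletion down to the window between the two reports; it does not say who deleted the object or which API call was used.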

0

With the AWS SDK, DeleteObjects requires the Bucket and Delete parameters. In the Delete structure, you specify the object keys that you want to delete in the Objects property. Here's a sample DeleteObjects request in Node.js.

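// AWS SDK for JavaScript v3
import { S3Client, DeleteObjectsCommand } from "@aws-sdk/client-s3";

const client = new S3Client({});
const command = new DeleteObjectsCommand({
  Bucket: "my-bucket",
  Delete: {
    // Each entry names one object key to delete
    Objects: [{ Key: "test.txt" }, { Key: "test2.txt" }],
  },
});
const response = await client.send(command);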


HS
answered 6 months ago
  • I want to retrieve information about deleted ObjectKeys using Athena, not by making API calls.

0

To find out how an S3 object was deleted, you can review either server access logs or AWS CloudTrail logs: https://docs.aws.amazon.com/AmazonS3/latest/userguide/enable-cloudtrail-logging-for-s3.html

Once you have the information in CloudTrail logs, you can run an Athena query to fetch it: https://docs.aws.amazon.com/athena/latest/ug/cloudtrail-logs.html
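For the server access log route, delete requests can be picked out by the operation field of each record. The Node.js sketch below is a rough illustration under several assumptions: records start with owner, bucket, [time], remote IP, requester, request ID, operation and key, in that order, and deletions appear as REST.DELETE.OBJECT or BATCH.DELETE.OBJECT. Both should be verified against your own logs. The sample lines are made up, and deletedKeysFromAccessLog is a hypothetical helper.

```javascript
// Leading fields of a server access log record (ASSUMED order):
// owner, bucket, [time], remote IP, requester, request ID, operation, key.
const LEADING_FIELDS =
  /^(\S+) (\S+) \[([^\]]+)\] (\S+) (\S+) (\S+) (\S+) (\S+)/;

// ASSUMPTION: these operation names mark single and multi-object deletes.
const DELETE_OPERATIONS = new Set(["REST.DELETE.OBJECT", "BATCH.DELETE.OBJECT"]);

function deletedKeysFromAccessLog(logText) {
  const hits = [];
  for (const line of logText.split("\n")) {
    const m = LEADING_FIELDS.exec(line);
    if (m && DELETE_OPERATIONS.has(m[7])) {
      hits.push({ time: m[3], bucket: m[2], operation: m[7], key: m[8] });
    }
  }
  return hits;
}

// Made-up records for illustration only; identifiers are placeholders.
const sampleLog = [
  'OWNERID demo-bucket [06/Feb/2019:00:00:38 +0000] 192.0.2.3 REQUESTER REQID1 REST.GET.OBJECT test.txt "GET /test.txt HTTP/1.1" 200 -',
  "OWNERID demo-bucket [06/Feb/2019:00:01:00 +0000] 192.0.2.3 REQUESTER REQID2 BATCH.DELETE.OBJECT test.txt - 204 -",
].join("\n");

console.log(deletedKeysFromAccessLog(sampleLog).map((hit) => hit.key)); // prints [ 'test.txt' ]
```

Keys may appear URL-encoded in the logs, so they may need decoding before being compared with object names.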

AWS
answered 6 months ago
  • Yes, I am querying through Athena, but the Athena information is incomplete.
