Find referrers using TLSv1


I am trying to point my Athena table at our tflmedia-new CloudWatch access logs, so that I can find out who is still accessing our bucket objects using TLSv1/1.1. I understand those TLS versions will no longer be allowed by AWS very soon, so we need to determine which clients are still using them.

However, those CloudWatch access logs are .gz files, so evidently Athena can't read them?

Here is the query I used to create the Athena table:

CREATE EXTERNAL TABLE s3_access_logs_db.cloudtrail_logs(
  bucketowner STRING,
  bucket_name STRING,
  requestdatetime STRING,
  remoteip STRING,
  requester STRING,
  requestid STRING,
  operation STRING,
  key STRING,
  request_uri STRING,
  httpstatus STRING,
  errorcode STRING,
  bytessent BIGINT,
  objectsize BIGINT,
  totaltime STRING,
  turnaroundtime STRING,
  referrer STRING,
  useragent STRING,
  versionid STRING,
  hostid STRING,
  sigv STRING,
  ciphersuite STRING,
  authtype STRING,
  endpoint STRING,
  tlsversion STRING,
  accesspointarn STRING,
  aclrequired STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'input.regex'='([^ ]*) ([^ ]*) \\[(.*?)\\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ("[^"]*"|-) (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ("[^"]*"|-) ([^ ]*)(?: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*))?.*$')
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://tflmedia-new/cloudtrail/AWSLogs/837411345956/CloudTrail/us-east-1/2023/02/13/'
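
For context, once the table is readable, this is roughly the kind of query I plan to run against it to find the old-TLS clients. It is only a sketch, and the column names assume the S3 access log schema from the DDL above:

SELECT remoteip,
       requester,
       useragent,
       tlsversion,
       COUNT(*) AS request_count
FROM s3_access_logs_db.cloudtrail_logs
-- only requests made over the TLS versions being retired
WHERE tlsversion IN ('TLSv1', 'TLSv1.1')
GROUP BY remoteip, requester, useragent, tlsversion
ORDER BY request_count DESC;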

Any help here would be greatly appreciated!

asked a year ago · 213 views
3 Answers

Hello,

I see that the DDL you are using is for creating a table over S3 server access logs, while the S3 location in your query is where your CloudTrail logs are stored. I suspect this mismatch is the reason Athena is not able to read the files. Could you please cross-check this? You can refer to https://docs.aws.amazon.com/athena/latest/ug/cloudtrail-logs.html for creating a table in Athena over CloudTrail logs, and https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-s3-access-logs-to-identify-requests.html for creating a table in Athena over S3 access logs.

Below is the DDL for creating a table over CloudTrail logs in Athena. Please note that it is just for your reference; you can modify it according to your use case.

CREATE EXTERNAL TABLE [TABLE_NAME] (
  eventVersion STRING,
  userIdentity STRUCT<
    type: STRING,
    principalId: STRING,
    arn: STRING,
    accountId: STRING,
    invokedBy: STRING,
    accessKeyId: STRING,
    userName: STRING,
    sessionContext: STRUCT<
      attributes: STRUCT<
        mfaAuthenticated: STRING,
        creationDate: STRING>,
      sessionIssuer: STRUCT<
        type: STRING,
        principalId: STRING,
        arn: STRING,
        accountId: STRING,
        username: STRING>,
      ec2RoleDelivery: STRING,
      webIdFederationData: MAP<STRING,STRING>>>,
  eventTime STRING,
  eventSource STRING,
  eventName STRING,
  awsRegion STRING,
  sourceIpAddress STRING,
  userAgent STRING,
  errorCode STRING,
  errorMessage STRING,
  requestParameters STRING,
  responseElements STRING,
  additionalEventData STRING,
  requestId STRING,
  eventId STRING,
  resources ARRAY<STRUCT<
    arn: STRING,
    accountId: STRING,
    type: STRING>>,
  eventType STRING,
  apiVersion STRING,
  readOnly STRING,
  recipientAccountId STRING,
  serviceEventDetails STRING,
  sharedEventID STRING,
  vpcEndpointId STRING,
  tlsDetails STRUCT<
    tlsVersion: STRING,
    cipherSuite: STRING,
    clientProvidedHostHeader: STRING>
)
COMMENT 'CloudTrail table for [S3_BUCKET_NAME] bucket'
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '[S3_BUCKET_URL]'
TBLPROPERTIES ('classification'='cloudtrail');
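
Once the table is created, a query along the following lines should surface who is still calling with the older TLS versions. This is only a sketch for reference: [TABLE_NAME] is the placeholder from the DDL above, and tlsDetails is not populated for every event, so you may also want to filter out NULL values.

SELECT useridentity.arn      AS caller_arn,
       sourceipaddress,
       useragent,
       tlsdetails.tlsversion AS tls_version,
       COUNT(*)              AS call_count
FROM [TABLE_NAME]
-- restrict to the TLS versions being retired
WHERE tlsdetails.tlsversion IN ('TLSv1', 'TLSv1.1')
GROUP BY 1, 2, 3, 4
ORDER BY call_count DESC;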

AWS
answered a year ago

+1 to the above answer. Another option for you to try is the CloudTrail Lake service. You could then use the sample TLS query to pull a report of who is making calls using outdated TLS versions.

Please refer to the CloudTrail Lake pricing, and consider cleaning up or deleting the event data store after you pull this report if you don't want to incur ongoing costs.
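
For reference, a CloudTrail Lake query along these lines (similar to the sample TLS query in the console) would pull that report. The event data store ID and the date filter below are placeholders you would replace with your own values:

SELECT eventSource,
       eventName,
       sourceIPAddress,
       userAgent,
       COUNT(*) AS outdated_tls_calls
FROM <event-data-store-ID>
WHERE tlsDetails.tlsVersion IN ('TLSv1', 'TLSv1.1')
  AND eventTime > '2023-01-01 00:00:00'
GROUP BY eventSource, eventName, sourceIPAddress, userAgent
ORDER BY outdated_tls_calls DESC;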


AWS
EXPERT
Gokul
answered a year ago

Thanks to you both! This URL in particular was very valuable:

https://docs.aws.amazon.com/athena/latest/ug/cloudtrail-logs.html

This especially was helpful:

The example uses a LOCATION value of logs for a particular account, but you can use the degree of specificity that suits your application.

For example:

To analyze data from multiple accounts, you can roll back the LOCATION specifier to indicate all AWSLogs by using LOCATION 's3://MyLogFiles/AWSLogs/'.

To analyze data from a specific date, account, and Region, use LOCATION 's3://MyLogFiles/123456789012/CloudTrail/us-east-1/2016/03/14/'.

Knowing that I could point the table at a specific month of CloudTrail logs helped me skip a lot of unnecessary data and processing time when I finally ran my query in Athena.
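
In practice, that just means using a month-level prefix in the LOCATION, either in the CREATE EXTERNAL TABLE statement or afterwards with ALTER TABLE ... SET LOCATION. A rough example using the bucket and account ID from my original post, where [TABLE_NAME] stands in for whatever you named the CloudTrail table:

ALTER TABLE [TABLE_NAME]
SET LOCATION 's3://tflmedia-new/cloudtrail/AWSLogs/837411345956/CloudTrail/us-east-1/2023/02/';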

answered a year ago
