I'm trying to create an app/stack/solution which, when deployed, sets up the infrastructure needed to query CloudTrail logs programmatically: in particular, to find resource creation requests made to certain services by a given execution role.
It seemed (e.g. from this Querying CloudTrail Logs page in the Athena developer guide) like Athena would be a good solution here, but I'm struggling to get the setup automated properly.
Setting up the Trail is pretty straightforward. However, my current attempt at translating the Athena manual partitioning instructions into a CDK-generated Glue table produces a table with 0 partitions... And I don't really understand how the partition projection instructions would translate to CDK.
There are definitely CloudTrail events in the source bucket/prefix - does anybody know how to make this work?
I'm not that deep on either Glue or Athena yet. Current draft CDK for the Glue table below:
const cloudTrailTable = new glue.Table(this, "CloudTrailGlueTable", {
  columns: [
    { name: "eventversion", type: glue.Schema.STRING },
    {
      name: "useridentity",
      type: glue.Schema.struct([
        { name: "type", type: glue.Schema.STRING },
        { name: "principalid", type: glue.Schema.STRING },
        { name: "arn", type: glue.Schema.STRING },
        { name: "accountid", type: glue.Schema.STRING },
        { name: "invokedby", type: glue.Schema.STRING },
        { name: "accesskeyid", type: glue.Schema.STRING },
        { name: "userName", type: glue.Schema.STRING },
        {
          name: "sessioncontext",
          type: glue.Schema.struct([
            {
              name: "attributes",
              type: glue.Schema.struct([
                { name: "mfaauthenticated", type: glue.Schema.STRING },
                { name: "creationdate", type: glue.Schema.STRING },
              ]),
            },
            {
              name: "sessionissuer",
              type: glue.Schema.struct([
                { name: "type", type: glue.Schema.STRING },
                { name: "principalId", type: glue.Schema.STRING },
                { name: "arn", type: glue.Schema.STRING },
                { name: "accountId", type: glue.Schema.STRING },
                { name: "userName", type: glue.Schema.STRING },
              ]),
            },
          ]),
        },
      ]),
    },
    { name: "eventtime", type: glue.Schema.STRING },
    { name: "eventsource", type: glue.Schema.STRING },
    { name: "eventname", type: glue.Schema.STRING },
    { name: "awsregion", type: glue.Schema.STRING },
    { name: "sourceipaddress", type: glue.Schema.STRING },
    { name: "useragent", type: glue.Schema.STRING },
    { name: "errorcode", type: glue.Schema.STRING },
    { name: "errormessage", type: glue.Schema.STRING },
    { name: "requestparameters", type: glue.Schema.STRING },
    { name: "responseelements", type: glue.Schema.STRING },
    { name: "additionaleventdata", type: glue.Schema.STRING },
    { name: "requestid", type: glue.Schema.STRING },
    { name: "eventid", type: glue.Schema.STRING },
    {
      name: "resources",
      type: glue.Schema.array(
        glue.Schema.struct([
          { name: "ARN", type: glue.Schema.STRING },
          { name: "accountId", type: glue.Schema.STRING },
          { name: "type", type: glue.Schema.STRING },
        ])
      ),
    },
    { name: "eventtype", type: glue.Schema.STRING },
    { name: "apiversion", type: glue.Schema.STRING },
    { name: "readonly", type: glue.Schema.STRING },
    { name: "recipientaccountid", type: glue.Schema.STRING },
    { name: "serviceeventdetails", type: glue.Schema.STRING },
    { name: "sharedeventid", type: glue.Schema.STRING },
    { name: "vpcendpointid", type: glue.Schema.STRING },
  ],
  dataFormat: glue.DataFormat.CLOUDTRAIL_LOGS,
  database: myGlueDatabase,
  tableName: "cloudtrail_table",
  bucket: myCloudTrailBucket,
  description: "CloudTrail Glue table",
  s3Prefix: `AWSLogs/${cdk.Stack.of(this).account}/CloudTrail/`,
  partitionKeys: [
    { name: "region", type: glue.Schema.STRING },
    { name: "year", type: glue.Schema.STRING },
    { name: "month", type: glue.Schema.STRING },
    { name: "day", type: glue.Schema.STRING },
  ],
});
Yes, following this method worked great, thanks! (I actually went with cfnTable.addOverride("Properties.TableInput.Parameters", {...}), but it's functionally the same.) Now the data is correctly discovered by the table :D
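For anyone landing here later, this is roughly what the {...} in that override could look like. It's only a sketch, not necessarily what I deployed: the helper name cloudTrailProjectionParams and its arguments are made up for illustration, and it assumes the same region/year/month/day partition keys as the table in the question. The property names are the standard Athena partition projection table properties; using the integer projection type on string-typed partition columns is an assumption you should check against your own table.

```typescript
// Hypothetical helper (not part of CDK): builds the Glue table parameters
// that switch on Athena partition projection for a CloudTrail prefix laid
// out as AWSLogs/<account>/CloudTrail/<region>/<yyyy>/<MM>/<dd>/.
type ProjectionParams = Record<string, string>;

function cloudTrailProjectionParams(
  bucketName: string,
  accountId: string,
  regions: string[],
  firstYear: number,
  lastYear: number
): ProjectionParams {
  const base = "s3://" + bucketName + "/AWSLogs/" + accountId + "/CloudTrail";
  return {
    "projection.enabled": "true",
    // region: fixed list of values
    "projection.region.type": "enum",
    "projection.region.values": regions.join(","),
    // year/month/day: integer ranges, zero-padded to match the S3 key layout
    "projection.year.type": "integer",
    "projection.year.range": `${firstYear},${lastYear}`,
    "projection.month.type": "integer",
    "projection.month.range": "1,12",
    "projection.month.digits": "2",
    "projection.day.type": "integer",
    "projection.day.range": "1,31",
    "projection.day.digits": "2",
    // Plain (non-template) string on purpose: the ${...} placeholders must
    // reach Athena literally, so they are not TypeScript interpolations.
    "storage.location.template": base + "/${region}/${year}/${month}/${day}",
  };
}

// Usage inside the stack (names taken from the question; region list and
// year range are placeholders):
//   const cfnTable = cloudTrailTable.node.defaultChild as glue.CfnTable;
//   cfnTable.addOverride(
//     "Properties.TableInput.Parameters",
//     cloudTrailProjectionParams(
//       myCloudTrailBucket.bucketName,
//       cdk.Stack.of(this).account,
//       ["us-east-1", "eu-west-1"],
//       2020,
//       2030
//     )
//   );
```

With projection enabled, Athena works out the partition locations from these parameters at query time, so no partitions need to be registered in the Glue catalog.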