Need assistance with parsing data and creating Athena table

1

Hi I am trying to create an Athena table from S3 bucket that contains access logs from API Gw. The format of the data is as follows:

2022-06-24T17:09:01.037Z { 'requestId’:’341122-4312-4311-9989-b837438748', 'ip': ’127.0.0.1, 'caller':'-', 'user':'-','requestTime':'24/Jun/2022:17:09:01 +0000', 'xrayTraceId':'-', 'wafResponseCode':'-', 'httpMethod':'GET','resourcePath’:’/items/list’, 'status':'200','protocol':'HTTP/1.1', 'responseLength':'238' }

I tried with and without row input serde to identify or process the data but nothing seems to be working. Without any input configuration I do the content in the table but its formatted incorrectly. I used RegEx, and tested my results on online resources and though the regex looks good online I cannot seem achieve the same result once the table is created. Either I will get a empty response or following message:

HIVE_CURSOR_ERROR: Number of matching groups doesn't match the number of columns

Any help is appreciated.

I am expecting the result to be in this format:

timestamprequestIdipcalleruserrequestTimexrayTraceIdwafResponseCodehttpMethodresourcePathstatusprotocolresponseLength
2022-06-24T17:09:01.037Z341122-4312-4311-9989-b837438748127.0.0.1----------
jg_dh
asked 2 years ago440 views
1 Answer
0

After checking the article mentioned by @Tasio I found out that the data that's coming to cloud watch is not formated properly (un-even spaces). Further digging I found the root cause to be cloudformation template, the format string for API Gw access logs had un-even spaces. While sorting out the un-even space issue I stumble upon another option to have API GW send access logs as csv. SO, I made that change and on Athena I set up the table with following properties:

 InputFormat: "org.apache.hadoop.mapred.TextInputFormat"
  OutputFormat: "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
  SerdeInfo:
  Parameters: { "separatorChar" : "," }
  SerializationLibrary: "org.apache.hadoop.hive.serde2.OpenCSVSerde"

Now everything is working and I can query the logs using Athena

jg_dh
answered 2 years ago
  • Glad you could fix it.

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions