Need assistance with parsing data and creating Athena table

1

Hi I am trying to create an Athena table from S3 bucket that contains access logs from API Gw. The format of the data is as follows:

2022-06-24T17:09:01.037Z { 'requestId’:’341122-4312-4311-9989-b837438748', 'ip': ’127.0.0.1, 'caller':'-', 'user':'-','requestTime':'24/Jun/2022:17:09:01 +0000', 'xrayTraceId':'-', 'wafResponseCode':'-', 'httpMethod':'GET','resourcePath’:’/items/list’, 'status':'200','protocol':'HTTP/1.1', 'responseLength':'238' }

I tried with and without row input serde to identify or process the data but nothing seems to be working. Without any input configuration I do the content in the table but its formatted incorrectly. I used RegEx, and tested my results on online resources and though the regex looks good online I cannot seem achieve the same result once the table is created. Either I will get a empty response or following message:

HIVE_CURSOR_ERROR: Number of matching groups doesn't match the number of columns

Any help is appreciated.

I am expecting the result to be in this format:

timestamprequestIdipcalleruserrequestTimexrayTraceIdwafResponseCodehttpMethodresourcePathstatusprotocolresponseLength
2022-06-24T17:09:01.037Z341122-4312-4311-9989-b837438748127.0.0.1----------
jg_dh
질문됨 2년 전449회 조회
1개 답변
0

After checking the article mentioned by @Tasio I found out that the data that's coming to cloud watch is not formated properly (un-even spaces). Further digging I found the root cause to be cloudformation template, the format string for API Gw access logs had un-even spaces. While sorting out the un-even space issue I stumble upon another option to have API GW send access logs as csv. SO, I made that change and on Athena I set up the table with following properties:

 InputFormat: "org.apache.hadoop.mapred.TextInputFormat"
  OutputFormat: "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
  SerdeInfo:
  Parameters: { "separatorChar" : "," }
  SerializationLibrary: "org.apache.hadoop.hive.serde2.OpenCSVSerde"

Now everything is working and I can query the logs using Athena

jg_dh
답변함 2년 전
  • Glad you could fix it.

로그인하지 않았습니다. 로그인해야 답변을 게시할 수 있습니다.

좋은 답변은 질문에 명확하게 답하고 건설적인 피드백을 제공하며 질문자의 전문적인 성장을 장려합니다.

질문 답변하기에 대한 가이드라인

관련 콘텐츠