Need assistance with parsing data and creating Athena table

1

Hi I am trying to create an Athena table from S3 bucket that contains access logs from API Gw. The format of the data is as follows:

2022-06-24T17:09:01.037Z { 'requestId’:’341122-4312-4311-9989-b837438748', 'ip': ’127.0.0.1, 'caller':'-', 'user':'-','requestTime':'24/Jun/2022:17:09:01 +0000', 'xrayTraceId':'-', 'wafResponseCode':'-', 'httpMethod':'GET','resourcePath’:’/items/list’, 'status':'200','protocol':'HTTP/1.1', 'responseLength':'238' }

I tried with and without row input serde to identify or process the data but nothing seems to be working. Without any input configuration I do the content in the table but its formatted incorrectly. I used RegEx, and tested my results on online resources and though the regex looks good online I cannot seem achieve the same result once the table is created. Either I will get a empty response or following message:

HIVE_CURSOR_ERROR: Number of matching groups doesn't match the number of columns

Any help is appreciated.

I am expecting the result to be in this format:

timestamprequestIdipcalleruserrequestTimexrayTraceIdwafResponseCodehttpMethodresourcePathstatusprotocolresponseLength
2022-06-24T17:09:01.037Z341122-4312-4311-9989-b837438748127.0.0.1----------
jg_dh
gefragt vor 2 Jahren449 Aufrufe
1 Antwort
0

After checking the article mentioned by @Tasio I found out that the data that's coming to cloud watch is not formated properly (un-even spaces). Further digging I found the root cause to be cloudformation template, the format string for API Gw access logs had un-even spaces. While sorting out the un-even space issue I stumble upon another option to have API GW send access logs as csv. SO, I made that change and on Athena I set up the table with following properties:

 InputFormat: "org.apache.hadoop.mapred.TextInputFormat"
  OutputFormat: "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
  SerdeInfo:
  Parameters: { "separatorChar" : "," }
  SerializationLibrary: "org.apache.hadoop.hive.serde2.OpenCSVSerde"

Now everything is working and I can query the logs using Athena

jg_dh
beantwortet vor 2 Jahren
  • Glad you could fix it.

Du bist nicht angemeldet. Anmelden um eine Antwort zu veröffentlichen.

Eine gute Antwort beantwortet die Frage klar, gibt konstruktives Feedback und fördert die berufliche Weiterentwicklung des Fragenstellers.

Richtlinien für die Beantwortung von Fragen