Need assistance with parsing data and creating Athena table
Hi, I am trying to create an Athena table from an S3 bucket that contains access logs from API Gateway. The data has the following format:
```
2022-06-24T17:09:01.037Z { 'requestId':'341122-4312-4311-9989-b837438748', 'ip': '127.0.0.1', 'caller':'-', 'user':'-','requestTime':'24/Jun/2022:17:09:01 +0000', 'xrayTraceId':'-', 'wafResponseCode':'-', 'httpMethod':'GET','resourcePath':'/items/list', 'status':'200','protocol':'HTTP/1.1', 'responseLength':'238' }
```
I tried creating the table both with and without a row-format SerDe, but nothing seems to work. Without any SerDe configuration I do see the content in the table, but it is formatted incorrectly. I also used a regex: it matches when I test it with online tools, but I cannot get the same result once the table is created. I either get an empty result or the following message:
HIVE_CURSOR_ERROR: Number of matching groups doesn't match the number of columns
Any help is appreciated.
I am expecting the result to be in this format:
| timestamp | requestId | ip | caller | user | requestTime | xrayTraceId | wafResponseCode | httpMethod | resourcePath | status | protocol | responseLength |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2022-06-24T17:09:01.037Z | 341122-4312-4311-9989-b837438748 | 127.0.0.1 | - | - | - | - | - | - | - | - | - | - |
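
To make the goal concrete, here is a minimal Python sketch of the parsing I am after. It is only an illustration, not the Athena table definition: it assumes the real log lines use plain straight single quotes, that the keys always appear in the order shown, and that no value contains a single quote. The names `COLUMNS`, `build_pattern`, `LOG_PATTERN` and `parse_line` are made up for this example. As far as I understand the error message, a regex-based SerDe needs exactly one capture group per column, so the sketch also compares the two counts. Python and Java regex syntax differ slightly, so the pattern would probably need adapting and extra escaping before it could be used in a table definition.

```python
import re

# Column names taken from the expected table; order matters because a
# regex-based mapping assigns capture group N to column N.
COLUMNS = [
    "timestamp", "requestId", "ip", "caller", "user", "requestTime",
    "xrayTraceId", "wafResponseCode", "httpMethod", "resourcePath",
    "status", "protocol", "responseLength",
]


def build_pattern(columns):
    """Build a regex with one capture group per column (assumed key order)."""
    # Group 1: the leading timestamp, i.e. everything up to the first space.
    head = r"^(\S+)\s+\{\s*"
    # One group per 'key':'value' pair; values are assumed to contain no quote.
    fields = [rf"'{re.escape(name)}'\s*:\s*'([^']*)'" for name in columns[1:]]
    tail = r"\s*\}\s*$"
    return re.compile(head + r"\s*,\s*".join(fields) + tail)


LOG_PATTERN = build_pattern(COLUMNS)


def parse_line(line):
    """Return a dict of column -> value, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return dict(zip(COLUMNS, match.groups()))


SAMPLE = (
    "2022-06-24T17:09:01.037Z { 'requestId':'341122-4312-4311-9989-b837438748', "
    "'ip': '127.0.0.1', 'caller':'-', 'user':'-',"
    "'requestTime':'24/Jun/2022:17:09:01 +0000', 'xrayTraceId':'-', "
    "'wafResponseCode':'-', 'httpMethod':'GET','resourcePath':'/items/list', "
    "'status':'200','protocol':'HTTP/1.1', 'responseLength':'238' }"
)

if __name__ == "__main__":
    # The group count and the column count must be equal.
    print(LOG_PATTERN.groups, len(COLUMNS))
    print(parse_line(SAMPLE))
```

If the two counts printed at the end differ, or `parse_line` returns `None` for a real log line, that would point to the same kind of mismatch the error describes.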