Parse application logs via Athena and Glue catalog


I have log files from my on-premises application. Each file is 100 MB (file rollover/appender defined in Log4J in the on-premises application), and about 50 such files are generated every day, roughly 5 GB in total. Today we zip these files at the end of the day and push them to an S3 bucket manually.

  1. We don't want to use the CloudWatch agent in the on-premises application to push logs to CloudWatch (the on-premises hosts have no spare CPU/memory and are already running at peak).
  2. We zip the files at the end of the day because otherwise we would have to do 50-55 S3 copies manually each day (yes, we could write a script for this, but that is not elegant either).
  3. Yes, Elasticsearch is an option, but we are still building that solution, and it will take 2-3 months before we can ingest data into Elasticsearch and use the ELK stack.
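
Regarding point 2, one thing to keep in mind is that Athena reads gzip-compressed text files but not ZIP archives, so a single daily `.zip` would have to be unpacked before it could be queried. A minimal sketch of a day-end step that gzips each rolled-over file separately and proposes a date-based S3 key is shown below. The function name `gzip_for_athena`, the `logs/` prefix, and the `dt=YYYY-MM-DD` key layout are assumptions for illustration, not part of the current setup.

```python
import gzip
import shutil
from datetime import date
from pathlib import Path
from typing import Tuple


def gzip_for_athena(src: Path, day: date, out_dir: Path,
                    prefix: str = "logs") -> Tuple[Path, str]:
    """Gzip one rolled-over log file and return (local .gz path, suggested S3 key).

    The key layout prefix/dt=YYYY-MM-DD/<file>.gz is a hypothetical convention
    that keeps each day's files under their own S3 prefix.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    gz_path = out_dir / (src.name + ".gz")
    # Stream the file through gzip so a 100 MB log is never held in memory.
    with open(src, "rb") as fin, gzip.open(gz_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    key = f"{prefix}/dt={day.isoformat()}/{gz_path.name}"
    return gz_path, key
```

The returned key only describes where the file would go; the upload itself could be done with whatever tool is already in use (for example a single `aws s3 sync` of the output directory instead of 50 separate copies).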

For up to a year afterwards, whenever we get a customer complaint we have to extract a specific file. We know the date, but unfortunately we have to download the full 5 GB for that day and then extract the required file and content.

As part of this use case, I wanted to check:

  1. Whether Athena works with Log4J log files (from JBoss, WebSphere): are there SerDes (serialization/deserialization libraries) for this format, and can it be registered in the Glue catalog?
Asked 2 years ago, 662 views
1 Answer

Have you had a look at the Grok SerDe? It may help: https://docs.aws.amazon.com/athena/latest/ug/grok-serde.html Example 2 there uses a Log4J format. Hope this helps.
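
As a rough illustration of what a Grok SerDe table could look like, the sketch below builds the Athena DDL as plain strings. It assumes a Log4J layout of `%d{ISO8601} %-5p %c - %m%n`, a table named `app_logs`, a bucket named `example-bucket`, and files stored under `dt=YYYY-MM-DD/` prefixes; all of these are hypothetical and would need to be adapted to the real appender pattern and bucket. The SerDe class name and the `input.format` property are the ones used in the Athena Grok SerDe documentation.

```python
GROK_SERDE = "com.amazonaws.glue.serde.GrokSerDe"

# Grok pattern for the ASSUMED layout "%d{ISO8601} %-5p %c - %m%n".
# " +" absorbs the padding that %-5p adds after short levels such as INFO.
GROK_FORMAT = (
    "%{TIMESTAMP_ISO8601:log_time} %{LOGLEVEL:level} +"
    "%{NOTSPACE:logger} - %{GREEDYDATA:message}"
)


def create_table_ddl(table: str, location: str) -> str:
    """Return a CREATE EXTERNAL TABLE statement using the Grok SerDe."""
    return f"""CREATE EXTERNAL TABLE IF NOT EXISTS {table} (
  log_time STRING,
  level STRING,
  logger STRING,
  message STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT SERDE '{GROK_SERDE}'
WITH SERDEPROPERTIES (
  'input.format' = '{GROK_FORMAT}'
)
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '{location}'"""


def add_partition_ddl(table: str, day: str, location: str) -> str:
    """Return an ALTER TABLE statement registering one day's prefix (hypothetical layout)."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION (dt = '{day}') "
        f"LOCATION '{location}dt={day}/'"
    )


if __name__ == "__main__":
    base = "s3://example-bucket/logs/"  # hypothetical bucket and prefix
    print(create_table_ddl("app_logs", base))
    print(add_partition_ddl("app_logs", "2024-01-15", base))
```

With a table like this, a complaint for a known date could be investigated with a query filtered on `dt`, so Athena scans only that day's files instead of anyone downloading the whole 5 GB. Two caveats: the SerDe works line by line, so multi-line entries such as stack traces may not parse cleanly, and the files would need to be plain text or gzip rather than a ZIP archive.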

Answered 2 years ago
