Parse application logs via Athena and Glue catalog


I have log files from my on-premises application. Each file is 100 MB (file rollover is defined by the Log4J appender in the on-premises application), and 50 such files are generated daily, about 5 GB in total. At the end of each day we zip these files and push them to an S3 bucket manually.

  1. We don't want to use the CloudWatch agent on the on-premises servers to push logs to CloudWatch (the servers have no spare CPU/memory and are already running at peak).
  2. We zip the files at day end because otherwise we would have to do 50-55 S3 copies manually each day (yes, we could write a script for this, but that is not elegant either).
  3. Yes, Elasticsearch is an option, but we are still building that solution, and it will take 2-3 months before we can ingest data into Elasticsearch and use the ELK stack.
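The script mentioned in point 2 could look roughly like the sketch below. It is only an illustration under assumptions: the function name `gzip_rolled_logs`, the `*.log*` file pattern, and the `dt=<day>` folder layout are hypothetical and not part of the current setup. It gzips each rolled file separately instead of building one daily archive, so that a single file can later be fetched on its own.

```python
import gzip
import shutil
from pathlib import Path


def gzip_rolled_logs(log_dir, out_dir, day):
    """Gzip every rolled log file in log_dir into out_dir/dt=<day>/.

    Assumption: rolled files match '*.log*' (e.g. server.log.1).
    Returns the list of compressed files that were written.
    """
    src = Path(log_dir)
    dest = Path(out_dir) / f"dt={day}"
    dest.mkdir(parents=True, exist_ok=True)

    written = []
    for log_file in sorted(src.glob("*.log*")):
        if not log_file.is_file():
            continue
        target = dest / (log_file.name + ".gz")
        # Stream the copy so a 100 MB file is never held fully in memory.
        with open(log_file, "rb") as fin, gzip.open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        written.append(target)
    return written
```

Each resulting `.gz` file could then be uploaded as its own S3 object under a date prefix (for example with the AWS CLI or an SDK), which is left out here. Athena can read gzip-compressed text files, so this layout would also fit the Athena question below.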

For up to 1 year afterwards, whenever we get a customer complaint, we have to extract a specific file. We know the date, but unfortunately we have to download the full 5 GB for that day in order to extract the required file and content.

As part of this use case, I wanted to check:

  1. Does Athena work with Log4J log files (from JBoss, WebSphere)? Are there SerDes (serialization/deserialization libraries) for this format, and can the Glue catalog be used with them?
asked 2 years ago · 617 views
1 Answer

Have you had a look at the Grok SerDe? It may help: https://docs.aws.amazon.com/athena/latest/ug/grok-serde.html. Example 2 there uses the Log4j format. Hope this helps.
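As a rough illustration of the Grok SerDe approach, the sketch below pairs a Grok pattern with an equivalent Python regex so the pattern can be checked against a sample line locally before creating a table. Everything specific here is an assumption rather than something from the question: the Log4J layout (`%d{ISO8601} %-5p %c - %m%n`), the table name `app_logs`, the column names, and the placeholder S3 location. The DDL has not been run against Athena and would need adjusting to the real layout used by JBoss or WebSphere.

```python
import re

# Assumed Log4J layout: %d{ISO8601} %-5p %c - %m%n
# e.g. "2021-03-04 10:15:30,123 INFO  com.example.OrderService - Order 42 created"
GROK_FORMAT = (
    "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level}%{SPACE}"
    "%{NOTSPACE:logger} - %{GREEDYDATA:message}"
)

# Hypothetical table and bucket names; the SerDe class and the
# 'input.format' property are the ones described in the linked Athena page.
ATHENA_DDL = (
    "CREATE EXTERNAL TABLE app_logs (\n"
    "  ts STRING,\n"
    "  level STRING,\n"
    "  logger STRING,\n"
    "  message STRING\n"
    ")\n"
    "ROW FORMAT SERDE 'com.amazonaws.glue.serde.GrokSerDe'\n"
    "WITH SERDEPROPERTIES (\n"
    "  'input.format' = '" + GROK_FORMAT + "'\n"
    ")\n"
    "STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'\n"
    "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'\n"
    "LOCATION 's3://amzn-s3-demo-bucket/app-logs/';"
)

# Python regex approximating the Grok pattern above, for local checks only.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\s+"
    r"(?P<logger>\S+) - "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return the captured fields as a dict, or None if the line does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groupdict() if match else None
```

Continuation lines such as stack-trace frames do not match this pattern, so `parse_line` returns `None` for them; multi-line log entries would need separate handling.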

answered 2 years ago
