I have log files in my onpremise application, Individual file size is 100 MB (file rollover/appender defined in Log4J in onpremise application) and 50 such files get generated on daily basis which is 5 GB total. We then zip these files at day end and push to S3 bucket manually today.
- We dont want to use cloudwatch agent in onpremise application to push logs to CloudWatch (as onpremise applications are not having CPU/Memory and running at peak)
- We zip the files at day end else we have to do 50-55 S3 Copy manually each day (yes we can create some script for this - but again not elegant)
- Yes Elasticsearch is an option but we are building solution which will take 2-3 months to ingest data to Elasticsearch and use ELK stack
Now till 1 year, whenever we get customer complaint we have to extract specific file. We know the date, but unfortunately we have to download all 5 GB for that day and extract the required file and content.
As part of this usecase wanted to check:
- If Athena works for Log4J log files (from JBoss, Websphere) - do we have Serde's (serialization / deserialization libraries) for the same, glue catalog for the same