Parsing Redshift Audit logs

0

We have Redshift's audit logs enabled. They are setup to write to S3. We are attempting to parse these and ingest them into Splunk. In the logs the multi-line SQL queries are getting lost in the stream as there is no way to correlate the end of an SQL query with the preceding statements. How can we work around this?

tjtoll
posta 7 mesi fa310 visualizzazioni
1 Risposta
0

Hello.

Parsing Redshift's audit logs with multiline SQL statements and ingesting them into Splunk can be challenging due to the multiline nature of SQL statements. However, you can use a combination of pre-processing and Splunk configurations to ensure that multiline SQL queries are ingested properly. Here's a suggested approach:

Log Structure Understanding: First, understand the structure of your audit logs. Typically, a Redshift log entry will have a timestamp, log level, process ID, user, database, and other metadata, followed by the actual SQL statement.

Before ingesting them into Splunk, you can write a script (Python, for example) that goes through the logs and combines multiline SQL statements into a single line.

This script can replace newline characters within SQL statements with a space or some other delimiter of your choice, ensuring that each SQL statement becomes a single line.

Splunk Configuration: Use Splunk's props.conf and transforms.conf to set up custom source type configurations for Redshift logs.

In the props.conf, specify the new source type, let's call it redshift_processed:

[redshift_processed]
SHOULD_LINEMERGE=true
LINE_BREAKER=([\r\n]+)\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{3}Z # Assuming the timestamp format YYYY-MM-DDTHH:MM:SS.sssZ, modify as per your timestamp format.
TRUNCATE=0

The SHOULD_LINEMERGE setting is used to indicate that Splunk should consider multiple lines as a single event until it encounters the LINE_BREAKER pattern, which indicates a new event.

TRUNCATE=0 ensures that very long events (like big SQL statements) aren't truncated.

Splunk Data Input:

When setting up the data input in Splunk for the pre-processed logs, make sure to specify the custom source type (redshift_processed) so that Splunk knows how to handle the multiline events.

Regards, Andrii

profile picture
ESPERTO
con risposta 7 mesi fa
profile picture
ESPERTO
verificato un mese fa

Accesso non effettuato. Accedi per postare una risposta.

Una buona risposta soddisfa chiaramente la domanda, fornisce un feedback costruttivo e incoraggia la crescita professionale del richiedente.

Linee guida per rispondere alle domande