The defaultFormat property is a boolean setting. When enabled, it makes the VPC flow logs use the legacy version 2 log format, which some third-party products that ingest VPC flow logs may still expect. The default format is explained in the documentation here: https://docs.aws.amazon.com/vpc/latest/userguide/flow-log-records.html#flow-logs-default
You shouldn't use the default format for anything new you're building unless you have a hard requirement to stay backwards compatible. The version 2 format lacks highly valuable fields added in version 3 and later. For example, the pkt-srcaddr and pkt-dstaddr fields (v3) record the IP addresses actually carried in the packets on the network, while the old v2 fields srcaddr and dstaddr in most cases show the primary IPs of the network interfaces that emitted or received the traffic, regardless of which of the ENI's IPs were actually present in the packets.
In general, it's advisable to set defaultFormat to false and use the customFields property to list all the fields that might be relevant for security, troubleshooting, and other purposes in your specific environment. All the available fields are listed and explained in this documentation section: https://docs.aws.amazon.com/vpc/latest/userguide/flow-log-records.html#flow-logs-fields. In addition to the v2 default fields, I'd recommend including at least tcp-flags, pkt-srcaddr, pkt-dstaddr, flow-direction, traffic-path, pkt-src-aws-service, pkt-dst-aws-service, and reject-reason, because they are often very useful when analysing logs.
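If it helps, here is roughly what that custom field selection corresponds to at the EC2 API level. This is a minimal boto3 sketch, not the accelerator syntax: the VPC ID and bucket ARN are placeholders, and in network-config.yaml you would express the same thing with defaultFormat set to false and the field names listed under customFields.

```python
import boto3

ec2 = boto3.client("ec2")

# Default (v2) fields followed by the newer fields recommended above;
# reject-reason is the most recently added of these, so double-check it
# against the current field list in the documentation.
LOG_FORMAT = (
    "${version} ${account-id} ${interface-id} ${srcaddr} ${dstaddr} "
    "${srcport} ${dstport} ${protocol} ${packets} ${bytes} ${start} ${end} "
    "${action} ${log-status} ${tcp-flags} ${pkt-srcaddr} ${pkt-dstaddr} "
    "${flow-direction} ${traffic-path} ${pkt-src-aws-service} "
    "${pkt-dst-aws-service} ${reject-reason}"
)

response = ec2.create_flow_logs(
    ResourceType="VPC",
    ResourceIds=["vpc-0123456789abcdef0"],  # placeholder VPC ID
    TrafficType="ALL",
    LogDestinationType="s3",
    LogDestination="arn:aws:s3:::my-flow-log-bucket/vpc-flow-logs/",  # placeholder bucket
    LogFormat=LOG_FORMAT,
    MaxAggregationInterval=600,
)
print(response["FlowLogIds"])
```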
The bot is right about compressed JSON not being an available option for VPC flow log delivery to S3. The options are Gzip-compressed plain text or Gzip-compressed Parquet. I'd generally recommend Parquet for performance and cost-efficiency, if you're using either standard AWS services or custom code you control to analyse the logs. On the AWS platform, Amazon Athena is very convenient for querying VPC flow logs stored either in Parquet or plain text format, but it's more performant and cheaper to have them in Parquet format.
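For reference, the Parquet option maps to the DestinationOptions parameter of the same CreateFlowLogs API. A short boto3 sketch with placeholder identifiers; the partitioning settings are optional but usually convenient when querying with Athena:

```python
import boto3

ec2 = boto3.client("ec2")

# Deliver flow logs to S3 as Gzip-compressed Parquet instead of plain text.
response = ec2.create_flow_logs(
    ResourceType="VPC",
    ResourceIds=["vpc-0123456789abcdef0"],  # placeholder VPC ID
    TrafficType="ALL",
    LogDestinationType="s3",
    LogDestination="arn:aws:s3:::my-flow-log-bucket/vpc-flow-logs/",  # placeholder bucket
    DestinationOptions={
        "FileFormat": "parquet",           # default is "plain-text"
        "HiveCompatiblePartitions": True,  # Hive-style S3 prefixes for partitioned queries
        "PerHourPartition": True,          # hourly instead of daily partitions
    },
)
print(response["FlowLogIds"])
```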
I apologize, but there seems to be a misunderstanding in your question. VPC Flow Logs cannot be saved directly in JSON format. The available formats for VPC Flow Logs are plain text (which is the default) and Apache Parquet.
The "defaultFormat" field you mentioned in the network-config.yaml file is likely used to specify whether to use the default format for VPC Flow Logs or not. When set to false, it might allow you to specify a custom format, but this would still be within the constraints of the available options (text or Parquet).
For VPC Flow Logs, you have two main format options:
- Text: This is the default plain text format.
- Parquet: Apache Parquet is a columnar data format that offers better query performance and reduced storage space compared to plain text.
If you're looking to optimize your log storage and query performance, you might consider using the Parquet format. Queries on data in Parquet format are typically 10 to 100 times faster compared to queries on data in plain text. Additionally, data in Parquet format with Gzip compression takes about 20 percent less storage space than plain text with Gzip compression.
While JSON is not a direct option for VPC Flow Logs, you could potentially transform the logs into JSON format after they've been collected, if that format is necessary for your specific use case. However, this would require additional processing steps outside of the VPC Flow Logs service itself.
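If JSON really is required downstream, the transformation can be a small post-processing step. A rough sketch, assuming plain-text delivery to S3 where the first line of each Gzip-compressed file lists the field names; the file name below is a placeholder for a file you have already downloaded:

```python
import gzip
import json

# Placeholder name of a flow log file copied from the S3 bucket.
SOURCE_FILE = "vpc-flow-log-sample.log.gz"

with gzip.open(SOURCE_FILE, "rt") as f:
    # Assumes the first line is the header listing the field names
    # in the order defined by the flow log format.
    fields = f.readline().split()
    records = [dict(zip(fields, line.split())) for line in f if line.strip()]

# Emit one JSON object per flow log record (JSON Lines style).
for record in records:
    print(json.dumps(record))
```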
Sources
Flow log files - Amazon Virtual Private Cloud
Create a flow log that publishes to Amazon S3 - Amazon Virtual Private Cloud