Amazon Athena can efficiently analyze complex JSON data with nested structures, which is common across many AWS services. This guide demonstrates how to parse and analyze complex JSON records using Amazon Inspector findings as a practical example.
Resolution
We'll implement a three-step approach to analyze Amazon Inspector findings:
- Export Inspector findings to Amazon S3
- Create a base table for raw findings
- Create views for parsed data analysis
Step 1: Export JSON Records to S3
- Export your JSON records from your source (e.g., Amazon Inspector console: "Findings" -> "All findings")
- Choose "Export findings" and select JSON format
- Specify your S3 bucket destination and encryption settings
Step 2: Create Raw JSON Table
Execute the following query in Athena to create a base table for the raw JSON records. Replace <path-to-records> with your S3 path:
CREATE EXTERNAL TABLE raw_json_records (
finding string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'input.regex' = '(.*)',
'input.regex.case.insensitive' = 'true'
)
STORED AS TEXTFILE
LOCATION 's3://<path-to-records>/';
Step 3: Create Parsed JSON View
Create a view that parses the JSON structure:
CREATE OR REPLACE VIEW parsed_json_records AS
WITH raw_json_data AS (
SELECT
-- Remove leading comma if present
CASE
WHEN finding LIKE ',{%' THEN SUBSTR(finding, 2)
ELSE finding
END AS finding
FROM raw_json_records
)
SELECT
JSON_EXTRACT_SCALAR(finding, '$.awsAccountId') as aws_account_id,
JSON_EXTRACT_SCALAR(finding, '$.description') as description,
JSON_EXTRACT_SCALAR(finding, '$.findingArn') as finding_arn,
JSON_EXTRACT_SCALAR(finding, '$.firstObservedAt') as first_observed_at,
JSON_EXTRACT_SCALAR(finding, '$.lastObservedAt') as last_observed_at,
JSON_FORMAT(JSON_EXTRACT(finding, '$.remediation')) as remediation,
JSON_FORMAT(JSON_EXTRACT(finding, '$.resources')) as resources,
JSON_EXTRACT_SCALAR(finding, '$.severity') as severity,
JSON_EXTRACT_SCALAR(finding, '$.status') as status,
JSON_EXTRACT_SCALAR(finding, '$.type') as type,
JSON_FORMAT(JSON_EXTRACT(finding, '$.codeVulnerabilityDetails')) as code_vulnerability_details,
CAST(JSON_EXTRACT_SCALAR(finding, '$.epss.score') as DOUBLE) as epss,
JSON_FORMAT(JSON_EXTRACT(finding, '$.exploitabilityDetails')) as exploitability_details,
JSON_EXTRACT_SCALAR(finding, '$.exploitAvailable') as exploit_available,
JSON_EXTRACT_SCALAR(finding, '$.fixAvailable') as fix_available,
CAST(JSON_EXTRACT_SCALAR(finding, '$.inspectorScore') as DOUBLE) as inspector_score,
JSON_FORMAT(JSON_EXTRACT(finding, '$.inspectorScoreDetails')) as inspector_score_details,
JSON_FORMAT(JSON_EXTRACT(finding, '$.networkReachabilityDetails')) as network_reachability_details,
JSON_FORMAT(JSON_EXTRACT(finding, '$.packageVulnerabilityDetails')) as package_vulnerability_details,
JSON_EXTRACT_SCALAR(finding, '$.title') as title,
JSON_EXTRACT_SCALAR(finding, '$.updatedAt') as updated_at
FROM
raw_json_data
where JSON_EXTRACT_SCALAR(finding, '$.awsAccountId') is not null
Optional: Create Domain-Specific Views
The nested JSON fields such as resources have different structures based on the service. You may create additional views for service specific analysis.
For EC2-specific findings:
CREATE OR REPLACE VIEW ec2_findings AS
WITH dataset AS (
SELECT
CAST(
json_parse(resources) AS
ARRAY(
ROW(
id varchar,
type varchar,
details json,
partition varchar,
region varchar,
tags json
)
)
) AS items,
*
FROM parsed_json_records
)
SELECT
-- Add fields from nested json resources field as needed
JSON_EXTRACT_SCALAR(JSON_OBJ.details, '$.awsEc2Instance.imageId') AS ec2_image_id,
JSON_OBJ.id AS resource_id,
JSON_OBJ.type AS resource_type,
JSON_OBJ.region AS resource_region,
-- Other fields from parsed_findings view
aws_account_id,
title,
description,
severity,
status,
updated_at
FROM dataset
CROSS JOIN UNNEST(items) t(JSON_OBJ)
WHERE JSON_OBJ.type = 'AWS_EC2_INSTANCE'
Common Queries
Here are some useful queries for analysis:
-- Get high-severity findings
SELECT * FROM parsed_json_records
WHERE severity = 'HIGH'
ORDER BY updated_at DESC;
-- Count findings by severity
SELECT severity, COUNT(*) as count
FROM parsed_json_records
GROUP BY severity;
-- Get findings with available exploits
SELECT title, severity, updated_at
FROM parsed_json_records
WHERE exploit_available = 'true';
Additional Information
- This approach works with any complex JSON structure, not just Inspector findings
- Filter for specific resource types or nested field values as needed
- Extend views by modifying SELECT statements to extract additional nested fields
- Use parsed views directly in Amazon QuickSight for visualization
- Handles common JSON parsing issues like leading commas and nested arrays
Related Information