The above options should work, but you could also try running a Glue Crawler, because it automatically discovers and catalogs the schema of semi-structured data like your nested JSON files, eliminating the need to manually define table structures for complex, hierarchical data. When the crawler runs, it scans your S3 bucket, analyzes the JSON structure of your files, and creates a unified table schema in the Glue Data Catalog that understands the nested relationships within your data, essentially creating a "map" that tells Athena how to interpret fields like ConversationCharacteristics.ContactSummary.PostContactSummary.Content and CustomerMetadata.ContactId. The crawler also automatically detects your folder partitioning structure (year/month/day) and creates partition metadata, which dramatically improves query performance and reduces costs when filtering by date.

Once cataloged, your previously "invisible" JSON data becomes immediately queryable through standard SQL in Athena, allowing you to perform complex analytics like searching for keywords in contact summaries and joining with your existing Amazon Connect data lake tables using the ContactId as the common key. The high-level process is simply: create crawler → point it at your S3 bucket → run crawler to catalog schema → query data in Athena using SQL. Crawlers can be run on a schedule or used in different workflows.
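For illustration, once the crawler has registered a table, a date-filtered keyword search might look like the sketch below. The table name (voice_analysis) is hypothetical, since the crawler derives it from the S3 prefix, and non-Hive folder layouts are often cataloged with partition columns named partition_0/partition_1/partition_2 rather than year/month/day, so check the Glue Data Catalog for the actual names:

SELECT
  CustomerMetadata.ContactId,
  ConversationCharacteristics.ContactSummary.PostContactSummary.Content
FROM voice_analysis            -- hypothetical name of the crawler-created table
WHERE year = '2024'            -- partition filters limit the data Athena scans
  AND month = '05'
  AND day = '01'
  AND LOWER(ConversationCharacteristics.ContactSummary.PostContactSummary.Content) LIKE '%wibble%';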
Querying Amazon Connect Call Summaries from S3 JSON Files Using Athena
Overview
The steps below explain how to query Contact Summary data from JSON files stored in S3 using Amazon Athena, allowing integration with existing Amazon Connect data lake tables.
Table Creation
=> Create an external table in Athena to read the JSON files:
CREATE EXTERNAL TABLE call_summaries (
  ConversationCharacteristics STRUCT<
    ContactSummary: STRUCT<
      PostContactSummary: STRUCT<
        Content: string
      >
    >
  >,
  CustomerMetadata STRUCT<
    ContactId: string,
    InputS3Uri: string,
    InstanceId: string
  >
)
PARTITIONED BY (
  year string,
  month string,
  day string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://your-bucket/Analysis/Voice/'
TBLPROPERTIES ('has_encrypted_data'='false');
Load Partitions
=> After table creation, load the partitions using the following command:
MSCK REPAIR TABLE call_summaries;
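Note that MSCK REPAIR TABLE only registers partitions whose S3 prefixes use Hive-style key=value naming (for example year=2024/month=05/day=01). If the bucket stores plain yyyy/mm/dd prefixes instead, a partition can be added explicitly; the date and bucket below are placeholders:

ALTER TABLE call_summaries ADD IF NOT EXISTS
  PARTITION (year = '2024', month = '05', day = '01')
  LOCATION 's3://your-bucket/Analysis/Voice/2024/05/01/';

Running SHOW PARTITIONS call_summaries; afterwards confirms what was registered.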
Example Queries:
=> Basic Query for Content Search
Find contacts where "Wibble" is mentioned in the summary:
SELECT
  CustomerMetadata.ContactId,
  CustomerMetadata.InputS3Uri,
  ConversationCharacteristics.ContactSummary.PostContactSummary.Content
FROM call_summaries
WHERE LOWER(ConversationCharacteristics.ContactSummary.PostContactSummary.Content) LIKE '%wibble%';
=> Join with Contact Records
Combine summary data with the contact_record table:
SELECT
  cs.CustomerMetadata.ContactId,
  cs.CustomerMetadata.InputS3Uri,
  cs.ConversationCharacteristics.ContactSummary.PostContactSummary.Content,
  cr.*
FROM call_summaries cs
JOIN contact_record cr
  ON cs.CustomerMetadata.ContactId = cr.contactid
WHERE LOWER(cs.ConversationCharacteristics.ContactSummary.PostContactSummary.Content) LIKE '%wibble%';
I hope the above helps. Thank you.
You can use Amazon Athena to query the JSON files containing Amazon Connect Contact Lens analytics data stored in your S3 bucket. Here's how to approach this:
- Create an external table in Athena that maps to your JSON files in S3. The table should define the nested JSON structure including the fields you're interested in: ContactId, InputS3Uri, and the PostContactSummary Content.
- When creating the table, you'll need to:
- Specify the JSON SerDe (serializer/deserializer)
- Define the schema to match your JSON structure with nested fields
- Point to the S3 location where your analysis files are stored
- Set up partitioning based on your folder structure (year/month/day)
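A minimal DDL sketch covering those points, using placeholder table and bucket names (adjust the nested fields to match your files):

CREATE EXTERNAL TABLE your_table_name (
  ConversationCharacteristics STRUCT<
    ContactSummary: STRUCT<
      PostContactSummary: STRUCT<
        Content: string
      >
    >
  >,
  CustomerMetadata STRUCT<
    ContactId: string,
    InputS3Uri: string
  >
)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://your-bucket/Analysis/Voice/';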
- Once the table is created, you can write SQL queries to extract and analyze the data. For example, to find contacts where "Wibble" was mentioned in the PostContactSummary:
SELECT
  CustomerMetadata.ContactId,
  CustomerMetadata.InputS3Uri,
  ConversationCharacteristics.ContactSummary.PostContactSummary.Content
FROM your_table_name
WHERE LOWER(ConversationCharacteristics.ContactSummary.PostContactSummary.Content) LIKE '%wibble%'
- You can join this data with other Amazon Connect data lake tables using the ContactId field:
SELECT
  a.CustomerMetadata.ContactId,
  a.ConversationCharacteristics.ContactSummary.PostContactSummary.Content,
  b.other_fields
FROM your_analysis_table a
JOIN contact_record b
  ON a.CustomerMetadata.ContactId = b.contactid
WHERE LOWER(a.ConversationCharacteristics.ContactSummary.PostContactSummary.Content) LIKE '%wibble%'
This approach allows you to query the nested JSON structure directly and extract the specific fields you need, then join them with other Connect data for comprehensive analysis.
Sources
Pull historical AWS Connect (CTR) Contract Trace Records | AWS re:Post
Query JSON data - Amazon Athena
Output file locations for files analyzed by Contact Lens conversational analytics - Amazon Connect

Creating the table using the SQL provided did not work for me: the table was created, but it was empty (0 records) even after running the repair command, and even though there were valid .json files in the call recordings S3 bucket. I have had similar issues using SQL of my own (below) and with setting up a crawler as well.

CREATE EXTERNAL TABLE sata_analysis (
  post_contact_summary_content string COMMENT 'from deserializer',
  contact_id string COMMENT 'from deserializer',
  input_s3_uri string COMMENT 'from deserializer',
  instance_id string COMMENT 'from deserializer',
  matched_categories array<string> COMMENT 'from deserializer'
)
PARTITIONED BY (
  year string COMMENT '',
  month string COMMENT '',
  day string COMMENT ''
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://S3BUCKETNAME/Analysis/Voice/'
TBLPROPERTIES ('classification'='json')

So any suggestions would be welcome.