Skip to content

Querying Amazon Connect post contact summaries from analytics JSON files

0

I'm interested in being able to query data found in .JSON files in a S3 bucket, perhaps using Athena. The data that I'm after is the summary of the Amazon Connect call and can be found within the individual .json files for each contact recording that are held in a S3 bucket e.g.

"ConversationCharacteristics": { "ContactSummary": { "PostContactSummary": { "Content": "Summary of the contact." } }, }

However these are not available to be queried e.g. in Athena as they are (they are not exposed in the Amazon Connect data lake like other contact and agent data is). The data structure is as follows – • S3 bucket containing call recordings and SATA transcriptions o Analysis  Voice • Year (4 digits) o Month (2 digits)  Day (2 digits) • Individual .json files named contactID_analysis_year-month-dayThh:mm:ssZ.json There will be several 1000 of these per day

Explicitly the data that I'm after is the value of Content within the PostContactSummary (see above) and the contactID and InputS3Uri from the CustomerMetadata section within each JSON file -

"CustomerMetadata": {
	"ContactId": "**CONTACTID**",
	"InputS3Uri": "**S3_URI_FOR_RECORDING_LOCATION**",
	"InstanceId": "**AMAZON_CONNECT_INSTANCE_ID**"
},

Would appreciate any steer that can be provided on how to get this data so that it can be joined (e.g. to other Amazon Connect data lake tables like contact_record that has a key field of contactid) and queried to be able to answer ad hoc queries such as e.g. how many times was the word Wibble mentioned in PostContactSummary Content and what were the contact IDs of those calls :-) Thanks in anticipation !

3 Answers
1

The above options should work, but you could also try running a Glue Crawler because it automatically discovers and catalogs the schema of semi-structured data like your nested JSON files, eliminating the need to manually define table structures for complex, hierarchical data. When the crawler runs, it scans your S3 bucket, analyzes the JSON structure of your files, and creates a unified table schema in the Glue Data Catalog that understands the nested relationships within your data - essentially creating a "map" that tells Athena how to interpret fields like ConversationCharacteristics.ContactSummary.PostContactSummary.Content and CustomerMetadata.ContactId. The crawler also automatically detects your folder partitioning structure (year/month/day) and creates partition metadata, which dramatically improves query performance and reduces costs when filtering by date. Once cataloged, your previously "invisible" JSON data becomes immediately queryable through standard SQL in Athena, allowing you to perform complex analytics like searching for keywords in contact summaries and joining with your existing Amazon Connect data lake tables using the ContactId as the common key. The high-level process is simply: create crawler → point it at your S3 bucket → run crawler to catalog schema → query data in Athena using SQL. Crawlers can be run on a schedule or used in different workflows.

AWS
answered a month ago
AWS
EXPERT
reviewed a month ago
0

Querying Amazon Connect Call Summaries from S3 JSON Files Using Athena

Overview

The below steps explains how to query Contact Summary data from JSON files stored in S3 using Amazon Athena, allowing integration with existing Amazon Connect data lake tables.

Table Creation

=> Create an external table in Athena to read the JSON files:

CREATE EXTERNAL TABLE call_summaries ( ConversationCharacteristics STRUCT< ContactSummary: STRUCT< PostContactSummary: STRUCT< Content: string > > >, CustomerMetadata STRUCT< ContactId: string, InputS3Uri: string, InstanceId: string > ) PARTITIONED BY ( year string, month string, day string ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' LOCATION 's3://your-bucket/Analysis/Voice/' TBLPROPERTIES ('has_encrypted_data'='false');

Load Partitions

=> After table creation, load the partitions using the folllowing command:

MSCK REPAIR TABLE call_summaries;

Example Queries:

=> Basic Query for Content Search

Find contacts where "Wibble" is mentioned in the summary:

SELECT CustomerMetadata.ContactId, CustomerMetadata.InputS3Uri, ConversationCharacteristics.ContactSummary.PostContactSummary.Content FROM call_summaries WHERE LOWER(ConversationCharacteristics.ContactSummary.PostContactSummary.Content) LIKE '%wibble%';

=>Join with Contact Records

Combine summary data with contact_record table:

SELECT cs.CustomerMetadata.ContactId, cs.CustomerMetadata.InputS3Uri, cs.ConversationCharacteristics.ContactSummary.PostContactSummary.Content, cr.* FROM call_summaries cs JOIN contact_record cr ON cs.CustomerMetadata.ContactId = cr.contactid WHERE LOWER(cs.ConversationCharacteristics.ContactSummary.PostContactSummary.Content) LIKE '%wibble%';

I hope the above helps. Thank You

AWS
answered a month ago
AWS
EXPERT
reviewed a month ago
  • Creating the table using the SQL provided did not work for me - sure the table was created, however it was empty (0 records) even after running the repair command and even though there were valid .json files in the call recordings S3 bucket. I have had similar issues using SQL of my own of

    CREATE EXTERNAL TABLE sata_analysis( post_contact_summary_content string COMMENT 'from deserializer', contact_id string COMMENT 'from deserializer', input_s3_uri string COMMENT 'from deserializer', instance_id string COMMENT 'from deserializer', matched_categories array<string> COMMENT 'from deserializer') PARTITIONED BY ( year string COMMENT '', month string COMMENT '', day string COMMENT '') ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://S3BUCKETNAME/Analysis/Voice/' TBLPROPERTIES ( 'classification'='json')

    and setting up a crawler as well. So any suggestions would be welcome.

0

You can use Amazon Athena to query the JSON files containing Amazon Connect Contact Lens analytics data stored in your S3 bucket. Here's how to approach this:

  1. Create an external table in Athena that maps to your JSON files in S3. The table should define the nested JSON structure including the fields you're interested in: ContactId, InputS3Uri, and the PostContactSummary Content.

  2. When creating the table, you'll need to:

  • Specify the JSON SerDe (serializer/deserializer)
  • Define the schema to match your JSON structure with nested fields
  • Point to the S3 location where your analysis files are stored
  • Set up partitioning based on your folder structure (year/month/day)
  1. Once the table is created, you can write SQL queries to extract and analyze the data. For example, to find contacts where "Wibble" was mentioned in the PostContactSummary:
SELECT 
CustomerMetadata.ContactId, 
CustomerMetadata.InputS3Uri,
ConversationCharacteristics.ContactSummary.PostContactSummary.Content
FROM your_table_name
WHERE LOWER(ConversationCharacteristics.ContactSummary.PostContactSummary.Content) LIKE '%wibble%'
  1. You can join this data with other Amazon Connect data lake tables using the ContactId field:
SELECT 
a.CustomerMetadata.ContactId,
a.ConversationCharacteristics.ContactSummary.PostContactSummary.Content,
b.other_fields
FROM your_analysis_table a
JOIN contact_record b ON a.CustomerMetadata.ContactId = b.contactid
WHERE LOWER(a.ConversationCharacteristics.ContactSummary.PostContactSummary.Content) LIKE '%wibble%'

This approach allows you to query the nested JSON structure directly and extract the specific fields you need, then join them with other Connect data for comprehensive analysis.
Sources
Pull historical AWS Connect (CTR) Contract Trace Records | AWS re:Post
Query JSON data - Amazon Athena
Output file locations for files analyzed by Contact Lens conversational analytics - Amazon Connect

answered a month ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.