mediaConnect-support-playbook
This document offers general guidance on troubleshooting issues related to MediaConnect and on the data to collect before opening a support case.
MediaConnect – Support Checklist
- Complete description of the issue
- Timeframe of the issue (whether it happened during a specific timeframe or is currently ongoing)
- Frequency of occurrence (whether it has happened before or first time, intermittent or a single continued occurrence, etc.)
- Complete flow architecture (like Src → Zixi → MediaConnect → MediaLive → MediaPackage → ..)
- ARNs of all the resources involved (MediaConnect flow ARN, MediaLive/MediaPackage channel ARN, etc.)
- Is this a new or existing workflow?
- Confirmation on source health (whether you have checked if the source had any issues during the concerned time)
- Investigation performed on your end (if any)
MediaConnect – Common Issue Investigation and Case Creation
MediaConnect flow maintenance schedule
Description(s) of the issue:
- Request to delay or reschedule upcoming scheduled MediaConnect flow maintenance
Why the MediaConnect service undergoes maintenance
- MediaConnect flows undergo routine maintenance for service updates
- This involves stopping and restarting the flow, impacting traffic
- Maintenance windows are scheduled by MediaConnect and notified in advance
- Times can vary, so monitor schedules via CloudWatch and Personal Health Dashboard (PHD)
- You can also use CloudWatch metrics and EventBridge notifications to monitor your maintenance schedules
- At any point, if a maintenance schedule coincides with a live event, please reach out to AWS Support well in advance to get the schedule shifted or skipped for the duration of the event.
Data required from your end to delay the maintenance:
- Original maintenance date/time from notification
- Date, time and duration of the conflicting live event
- Proposed new maintenance date/time that avoids the live event
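Before sending these three items to AWS Support, it can help to confirm that the proposed time really does clear the event. The following is a minimal sketch using only the Python standard library. The two-hour window length, the one-hour buffer, and all dates are illustrative assumptions, so substitute the values from your own notification.

```python
from datetime import datetime, timedelta


def overlaps(start_a, end_a, start_b, end_b):
    """Return True when the half-open intervals [start_a, end_a) and [start_b, end_b) intersect."""
    return start_a < end_b and start_b < end_a


def avoids_event(maintenance_start, window, event_start, event_duration,
                 buffer=timedelta(0)):
    """Return True when the maintenance window stays clear of the live event plus a safety buffer."""
    maintenance_end = maintenance_start + window
    guarded_start = event_start - buffer
    guarded_end = event_start + event_duration + buffer
    return not overlaps(maintenance_start, maintenance_end, guarded_start, guarded_end)


# Illustrative values only (assumptions, not taken from a real notification).
window = timedelta(hours=2)            # assumed maintenance window length
event_start = datetime(2024, 6, 1, 18, 0)
event_duration = timedelta(hours=3)
original = datetime(2024, 6, 1, 19, 0)  # original maintenance time from the notification
proposed = datetime(2024, 6, 2, 3, 0)   # proposed replacement time

original_is_safe = avoids_event(original, window, event_start, event_duration,
                                buffer=timedelta(hours=1))
proposed_is_safe = avoids_event(proposed, window, event_start, event_duration,
                                buffer=timedelta(hours=1))
print(original_is_safe, proposed_is_safe)
```

Keeping all timestamps in one time zone (for example UTC) avoids off-by-hours mistakes when comparing the notification time with the event schedule.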
Monitoring for Schedule Changes
- Check notification sources like email or Personal Health Dashboard
- Continue monitoring CloudWatch maintenance metrics
- Confirm adjusted schedule in MediaConnect console
MediaConnect and Downstream data flow issue
Description(s) of the issue:
- MediaLive channel input from MediaConnect goes blank/black for a period of time
- MediaConnect stops delivering content to the MediaLive channel
- Outage is observed between MediaConnect and the MediaLive channel
Troubleshooting steps to be performed
Inspect the MediaLive Channel
- Verify the channel configuration and pipeline status
- Confirm normal operation aside from the missing MediaConnect input
- Rule out issues upstream of the MediaConnect source in the MediaLive workflow
Inspect Critical Source Metrics
- Check SourceContinuityCounter - non-zero indicates source/network issues
- Monitor SourceBitRate for drops from expected rate
- Watch SourceNotRecoveredPackets for sustained non-zero values
- Trend SourceRecoveredPackets for early warnings of packet loss
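The four checks above can be sketched as a small function that takes per-period values you have already exported from CloudWatch. The thresholds (`bitrate_tolerance`, `sustained_periods`) and the simple "second half versus first half" trend test are assumptions chosen for illustration, not values recommended by MediaConnect.

```python
def evaluate_source_metrics(samples, expected_bitrate,
                            bitrate_tolerance=0.9, sustained_periods=3):
    """Apply the four critical-source checks to per-period values (oldest first).

    samples maps a metric name to a list of numbers, one per period.
    Returns a dict of metric name -> short finding for every check that fired.
    """
    findings = {}

    # Any non-zero continuity counter value points at source/network issues.
    if any(v > 0 for v in samples.get("SourceContinuityCounter", [])):
        findings["SourceContinuityCounter"] = "non-zero value seen"

    # Bitrate below the tolerated fraction of the expected rate.
    floor = expected_bitrate * bitrate_tolerance
    if any(v < floor for v in samples.get("SourceBitRate", [])):
        findings["SourceBitRate"] = "dropped below expected rate"

    # Longest run of consecutive periods with unrecovered packets.
    run = longest = 0
    for v in samples.get("SourceNotRecoveredPackets", []):
        run = run + 1 if v > 0 else 0
        longest = max(longest, run)
    if longest >= sustained_periods:
        findings["SourceNotRecoveredPackets"] = "sustained non-zero values"

    # Rising recovered-packet count: compare the later half with the earlier half.
    recovered = samples.get("SourceRecoveredPackets", [])
    half = len(recovered) // 2
    if half > 0:
        early = sum(recovered[:half]) / half
        late = sum(recovered[half:]) / len(recovered[half:])
        if late > early:
            findings["SourceRecoveredPackets"] = "trending up (early warning)"

    return findings


# Illustrative data (assumed values, one entry per one-minute period).
degraded = {
    "SourceContinuityCounter": [0, 2, 0, 0],
    "SourceBitRate": [8_000_000, 5_000_000, 8_000_000, 8_000_000],
    "SourceNotRecoveredPackets": [1, 3, 2, 0],
    "SourceRecoveredPackets": [0, 1, 4, 9],
}
report = evaluate_source_metrics(degraded, expected_bitrate=8_000_000)
for metric, finding in report.items():
    print(f"{metric}: {finding}")
```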
Monitor the MediaConnect Source
Analyse Source and Flow Metrics
- Compare source health data like packets and throughput over time
- Look for anomalies in metrics that could indicate problems
- Assess flow health metrics for errors, delays or dropped frames
- Check output health metrics downstream of the source if issues persist
- Inspect media and encoding/decoding metrics for problems
- Review gateway health metrics when using Availability Zones
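To compare these metrics over time outside the console, one option is to pull them with the CloudWatch `GetMetricData` API. The sketch below only builds the query list, so it runs without AWS credentials. It assumes the `AWS/MediaConnect` namespace and a `FlowARN` dimension, and the flow ARN shown is a hypothetical placeholder. Verify the names against the metrics listed for your flow in CloudWatch.

```python
def build_metric_queries(flow_arn, metric_stats, period_seconds=60):
    """Build a MetricDataQueries list for CloudWatch GetMetricData.

    metric_stats maps a metric name to the statistic to request for it.
    """
    queries = []
    for index, (name, stat) in enumerate(metric_stats.items()):
        queries.append({
            "Id": f"m{index}",  # query IDs must start with a lowercase letter
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/MediaConnect",  # assumed namespace
                    "MetricName": name,
                    "Dimensions": [{"Name": "FlowARN", "Value": flow_arn}],  # assumed dimension
                },
                "Period": period_seconds,
                "Stat": stat,
            },
            "ReturnData": True,
        })
    return queries


# Hypothetical flow ARN, for illustration only.
FLOW_ARN = "arn:aws:mediaconnect:us-east-1:111122223333:flow:EXAMPLE:example-flow"

queries = build_metric_queries(FLOW_ARN, {
    "SourceBitRate": "Average",
    "SourceContinuityCounter": "Sum",
    "SourceNotRecoveredPackets": "Sum",
    "SourceRecoveredPackets": "Sum",
})

# With boto3 installed and credentials configured, the list could be passed as:
#   boto3.client("cloudwatch").get_metric_data(
#       MetricDataQueries=queries, StartTime=start, EndTime=end)
print(len(queries), queries[0]["MetricStat"]["Metric"]["MetricName"])
```

Requesting the same period for every metric keeps the returned series aligned, which makes it easier to line up a bitrate drop with a burst of unrecovered packets.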
We also have a GitHub sample solution named "MediaConnect Easy Dashboard Maker", which quickly creates a CloudWatch dashboard for one, tagged, or all MediaConnect flows in your current AWS account and Region. The sample is available on GitHub.
Data required for a support case:
- Same as the data mentioned in MediaConnect – Support Checklist
Source transport protocol (SRT, Zixi, RTP-FEC, etc.) and MediaConnect data flow issue
Issue Description
- SRT/Zixi source to MediaConnect flow intermittently breaking or failing
Troubleshooting steps
Step 1: Check Critical Source Metrics
- Monitor the 4 key source metrics already listed in the "MediaConnect and Downstream data flow issue" scenario
Step 2: Monitor Protocol-Specific Metrics
- SRT
- Zixi Push
Step 3: Analyse Source Performance Issues
- SRT and Zixi support error correction, which helps pinpoint problems
- RIST and RTP lack error correction
- Monitoring protocol-specific metrics in addition to critical source stats helps isolate issues
- Sources with error correction, like SRT and Zixi, are preferred over RIST/RTP
Data required to escalate to the support team:
The general advice is to escalate to the Networking Support Team when packet drops are seen at a flow's source, following the instructions in the article mediaconnect-network-troubleshooting.
Because MediaConnect (EMX) reports packet drops, continuity counter (CC) errors, and similar errors only at the source, a case where the customer sees them is almost certainly caused by a network path issue upstream of the flow (either in the customer's own network or in the AWS network). Such issues are outside the control and visibility of MediaConnect support to diagnose.
MediaConnect state change event notifications
Description(s) of the issue:
- Configuring logging and notifications for MediaConnect flow alerts
Monitoring Options in MediaConnect
- MediaConnect does not provide advanced customer logging
- However, CloudWatch Events can be used to trigger notifications
Setting Up Notifications
- MediaConnect can generate events for state changes, alerts, etc.
- You can set up CloudWatch Events rules to get notified of any change in the state of your MediaConnect resources, and process these notifications to perform further remedial actions
- These events can trigger the following actions:
- Invoking an AWS Lambda function
- Invoking Amazon EC2 Run Command
- Relaying the event to Amazon Kinesis Data Streams
- Activating an AWS Step Functions state machine
- Notifying an Amazon SNS topic or an Amazon SQS queue
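A rule that routes MediaConnect events to one of these targets needs an event pattern. The sketch below builds one and checks it locally against a sample event. The `aws.mediaconnect` source value is the standard form for AWS service events. The detail-type string, the flow ARN, and the sample event are assumptions for illustration, and `matches` implements only a small subset of real event-pattern matching (top-level fields, exact values).

```python
import json


def build_event_pattern(flow_arns=None, detail_types=None):
    """Build an event pattern dict that selects MediaConnect events."""
    pattern = {"source": ["aws.mediaconnect"]}
    if detail_types:
        pattern["detail-type"] = list(detail_types)
    if flow_arns:
        pattern["resources"] = list(flow_arns)
    return pattern


def matches(pattern, event):
    """Simplified local check: every pattern key must match a top-level event field exactly."""
    for key, allowed in pattern.items():
        value = event.get(key)
        if isinstance(value, list):
            # List-valued fields (such as "resources") match when any element is allowed.
            if not any(item in allowed for item in value):
                return False
        elif value not in allowed:
            return False
    return True


# Hypothetical ARN and assumed detail-type, for illustration only.
FLOW_ARN = "arn:aws:mediaconnect:us-east-1:111122223333:flow:EXAMPLE:example-flow"
pattern = build_event_pattern(flow_arns=[FLOW_ARN], detail_types=["MediaConnect Alert"])

sample_event = {
    "source": "aws.mediaconnect",
    "detail-type": "MediaConnect Alert",
    "resources": [FLOW_ARN],
    "detail": {},
}

# The JSON string is what a rule would take as its event pattern.
pattern_json = json.dumps(pattern, indent=2)
print(pattern_json)
print(matches(pattern, sample_event))
```

Starting with only the `source` filter and narrowing by detail-type once you have seen real events in your account avoids silently filtering out notifications because of a mistyped string.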
Some important notifications that could be configured
MediaConnect failover
Description(s) of the issue:
- Understanding and managing source failovers in MediaConnect flows
How failovers work
- MediaConnect randomly uses one of the sources to provide content for the flow if no primary source is specified
- The flow switches to the other source if the primary source does not send data for 500 milliseconds, and switches back to the primary source as soon as data returns
- However, if both sources go down simultaneously, or fluctuate intermittently in quick succession, there can be a point in time with no usable source. This causes a stream failure and complete loss of data downstream, which is reported in CloudWatch critical alert events
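The switching rule described above can be illustrated with a simplified model. This is a sketch of the behaviour as stated in the text (primary preferred, 500 ms without data triggers a switch, immediate switch back), not MediaConnect's actual implementation, and the packet arrival times are made-up values in milliseconds.

```python
from bisect import bisect_right

FAILOVER_THRESHOLD_MS = 500  # threshold stated in the text above


def has_recent_data(arrivals_ms, now_ms, threshold_ms=FAILOVER_THRESHOLD_MS):
    """True when the sorted arrival list contains a packet less than threshold_ms before now_ms."""
    index = bisect_right(arrivals_ms, now_ms)
    return index > 0 and now_ms - arrivals_ms[index - 1] < threshold_ms


def active_source(primary_ms, secondary_ms, now_ms):
    """Return which source feeds the flow at now_ms, or None when both are silent."""
    if has_recent_data(primary_ms, now_ms):
        return "primary"    # primary is preferred whenever it has data
    if has_recent_data(secondary_ms, now_ms):
        return "secondary"  # primary silent for 500 ms or more
    return None             # both sources silent: stream failure downstream


# Illustrative arrivals: primary drops out after 1000 ms and returns at 3000 ms,
# secondary stops at 2000 ms.
primary = list(range(0, 1001, 100)) + list(range(3000, 3501, 100))
secondary = list(range(0, 2001, 100))

timeline = [(t, active_source(primary, secondary, t)) for t in (1200, 1500, 2600, 3000)]
for t, source in timeline:
    print(t, source)
```

The `None` result at 2600 ms corresponds to the dual-failure case: neither source has delivered data within the threshold, so nothing is available to send downstream.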
Best Practices
- Maintain replication across sources to avoid dual failure
- Failovers can cause brief outage risks if sources fluctuate
Monitoring Failovers
- CloudWatch critical alerts notify of stream failures
- Source Health metric "FailoverSwitches" can help in tracking switches
MediaConnect quotas and limit increases
Description(s) of the issue:
- Request to increase a MediaConnect quota limit
Default Quota Limits
- Number of flows per Region can be increased after review
- All other quotas, such as API limits, are fixed
Increasing Flow Quota Limit
- Contact AWS Support
- Provide a detailed use case requiring more than 20 flows
- Support will review and may approve a higher limit
Handling API Limit Quotas
- API request limits are "steady state" 5/min and "burst" 30
- These limits cannot be increased
- Optimize workflows to avoid breaching limits
- Implement exponential backoff for API retries
If API limits are consistently exceeded
- Analyse workflows and API usage patterns
- Explore alternative architectures or optimizations
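Exponential backoff for API retries can be sketched as follows. This is a generic illustration using "full jitter" (a random delay up to an exponentially growing ceiling). `ThrottledError` and `flaky_describe_flow` are hypothetical stand-ins for a real throttling error and a real API call, and the delay values are assumptions to tune for your workload.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error raised by an API call."""


def call_with_backoff(operation, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call operation(), retrying on ThrottledError with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * ceiling)  # wait a random time in [0, ceiling)


# Usage with a fake operation that is throttled twice and then succeeds.
attempts = {"count": 0}


def flaky_describe_flow():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ThrottledError("rate exceeded")
    return "ok"


delays = []  # record delays instead of sleeping so the example runs instantly
result = call_with_backoff(flaky_describe_flow, sleep=delays.append, rng=lambda: 1.0)
print(result, delays)
```

Injecting `sleep` and `rng` keeps the helper testable. In production the defaults (`time.sleep` and `random.random`) spread retries from multiple clients so they do not hit the API at the same instant.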
Special mention to our MediaConnect SMEs Naveen Kumar Jindal, Kartik Kapoor and Ruhisar Tikoo for putting this content together.