Integrating an External Voicebot (PolyAI / Volubile) with Amazon Connect Without PSTN

0

Hello AWS Community,

I am planning to integrate an external voicebot (e.g., PolyAI or Volubile) with Amazon Connect, and I would like guidance on the best practices and architecture for a PSTN-free integration.

Here is my use case:

  • Amazon Connect receives incoming customer calls.
  • Instead of using Amazon Lex, the call should be handled by an external voicebot.
  • The voicebot should manage the conversation in real-time (speech-to-text, NLU, text-to-speech).
  • If the bot cannot resolve the issue, the call should be transferred back to an Amazon Connect agent with the conversation context preserved.
  • I prefer to avoid PSTN-based transfers due to latency, call quality, and the additional cost associated with PSTN.

My main questions:

  1. Is it possible to connect an external voicebot to Amazon Connect without using PSTN?
  2. If a custom development / integration is required, which AWS services are recommended for this architecture? (For example: Kinesis Video Streams, Lambda, Transcribe, Chime SDK, or others.)
  3. What is the recommended architecture for such a real-time integration, including handling of audio streaming, STT, bot processing, TTS, and transfer back to agents?
  4. Are there any production reference architectures or best practices for integrating an external voicebot with Amazon Connect in a fully digital / SIP-like setup?
  5. Is there an officially supported service for “external SIP integration” with Amazon Connect, or is PSTN still the primary method?

Any guidance, architecture examples, or references would be greatly appreciated.

Thank you in advance!

  • If my answer helped solve your problem, I would appreciate it if you click on “accepted answer”.

4 Answers
6
Accepted Answer

Integrating an external voicebot while avoiding PSTN is the "gold standard" for latency, cost, and audio fidelity. The re:Post Agent's earlier response correctly identified Amazon Chime SDK Voice Connectors; in my opinion, the following additions are crucial for a production-ready architecture.

1. The Core Architecture: SIP-Based Handover

Instead of streaming audio via Kinesis Video Streams (which is complex to manage), consider a SIP-based transfer for providers like PolyAI or Volubile.

Inbound Path: The call arrives at Amazon Connect.

Handover: You use the Amazon Connect External Voice Transfer Connector. This allows you to "transfer" the call to a SIP URI (e.g., sip:bot@provider.com) instead of a traditional E.164 phone number.

Return Path: If the bot needs to escalate to a human, it sends a SIP REFER or initiates a new SIP call back to a specific Amazon Connect entry point.

2. Context Preservation via SIP Headers (UUI)

To ensure the agent knows what the bot and customer discussed, you must pass metadata back and forth.

Outbound: Use Custom SIP Headers (prefixed with X-Amazon-Connect-*) or the standard User-to-User (UUI) header. You can set these attributes in your Amazon Connect Contact Flow before the transfer.

Inbound: When the bot transfers the call back to an agent, it must include the ContactID or a SessionID in the SIP headers. Amazon Connect can then use a Lambda function to retrieve the conversation transcript from the bot's API and display it on the Agent’s Desktop (CCP).
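
The Lambda step described above could look roughly like the following. This is a minimal sketch, not a definitive implementation: the attribute name `botSessionId`, the helper `fetch_bot_summary`, and the returned keys are hypothetical, and the call to the bot provider's API is stubbed out because its shape depends on the provider. The sketch assumes the standard Amazon Connect Lambda event layout (`Details.Parameters` and `Details.ContactData.Attributes`) and returns a flat map of strings that a flow could store as contact attributes.

```python
from typing import Callable, Dict


def fetch_bot_summary(session_id: str) -> Dict[str, str]:
    """Hypothetical stand-in for a call to the bot provider's API."""
    # A real implementation would issue an authenticated HTTPS request here.
    return {"summary": f"No summary available for session {session_id}",
            "reason": "unknown"}


def make_handler(fetch: Callable[[str], Dict[str, str]] = fetch_bot_summary):
    """Build a Lambda handler; the fetch function is injectable for testing."""

    def handler(event, context):
        details = event.get("Details", {})
        params = details.get("Parameters", {})
        attrs = details.get("ContactData", {}).get("Attributes", {})
        # Prefer an explicit flow parameter, then fall back to a contact attribute.
        session_id = params.get("botSessionId") or attrs.get("botSessionId", "")
        if not session_id:
            return {"botSummary": "", "transferReason": "",
                    "lookupStatus": "missing_session_id"}
        result = fetch(session_id)
        # Keep the response flat and string-valued so a flow can read it.
        return {
            "botSummary": str(result.get("summary", ""))[:1000],
            "transferReason": str(result.get("reason", "")),
            "lookupStatus": "ok",
        }

    return handler


lambda_handler = make_handler()
```

In the flow, the returned values would then be copied into contact attributes so the agent desktop can display them.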

3. Key Services & Configuration

Component | Service / Feature | Purpose
Signaling | Amazon Chime SDK Voice Connector | Acts as the SIP gateway between Connect and the Bot
Connectivity | AWS Resource Access Manager (RAM) | Required to share the Voice Connector with your Amazon Connect instance
Latency | AWS Direct Connect / BYOIP | Ensures the "Media Path" between the Bot's SBC and AWS stays on a private, high-speed backbone
Orchestration | Contact Flow (Transfer to Queue) | Use the "Transfer to phone number" block, but target the SIP URI associated with the Voice Connector
  • Orchestration Detail: For the Orchestration row, note that you use the "Transfer to phone number" block in the UI but enter the SIP URI (e.g., sip:bot@provider.com) instead of a phone number. This is a common point of confusion for new users.

  • SIP REFER Support: Just a heads-up that while SIP REFER is the cleanest way to "hand back" a call, it requires the external bot's SBC to support it correctly. If it doesn't, the provider can instead "hairpin" the call back to a Connect-monitored DID/SIP URI.

4. Critical "Pro-Tips" for Production

  • Avoid the Analytics Connector: The CONNECT_ANALYTICS_CONNECTOR (SIPREC) is primarily for "passive listening" (recording/transcription). For an interactive bot that controls call flow, you strictly need the Transfer Connector.
  • Codec Negotiation: Ensure your bot provider supports G.711 or Opus. Mismatched codecs can cause "dead air" or increased latency due to transcoding.
  • Regionality: Deploy your Voice Connector in the same AWS Region as your Connect Instance to minimize "tromboning" (unnecessary audio travel), which adds milliseconds of delay.
  • SIP REFER Support: While SIP REFER is the cleanest way to "hand back" a call, verify that your external bot's SBC supports it. If not, they may need to "hairpin" the call back to a Connect-monitored SIP URI.
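
The codec check mentioned above can be sketched as a small SDP inspection. This is an illustrative sketch only: it assumes you can capture the bot provider's SDP offer as text, treats G.711 as the static RTP payload types 0 (PCMU) and 8 (PCMA), and finds Opus through its `a=rtpmap` line. The function names and the sample SDP are made up for the example.

```python
import re
from typing import Set

# Static RTP payload types for G.711 (PCMU = mu-law, PCMA = A-law).
STATIC_PAYLOAD_TYPES = {"0": "PCMU", "8": "PCMA"}
# Codecs named in the tip above: G.711 (both variants) and Opus.
ACCEPTED = {"PCMU", "PCMA", "OPUS"}
RTPMAP = re.compile(r"a=rtpmap:\d+ ([^/\s]+)/")


def offered_audio_codecs(sdp: str) -> Set[str]:
    """Return the codec names offered in the audio sections of an SDP body."""
    codecs: Set[str] = set()
    in_audio = False
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            in_audio = line.startswith("m=audio ")
            if in_audio:
                # m=audio <port> <proto> <payload types...>
                for payload_type in line.split()[3:]:
                    if payload_type in STATIC_PAYLOAD_TYPES:
                        codecs.add(STATIC_PAYLOAD_TYPES[payload_type])
            continue
        if in_audio:
            match = RTPMAP.match(line)
            if match:
                codecs.add(match.group(1).upper())
    return codecs


def usable_codecs(sdp: str) -> Set[str]:
    """Codecs in the offer that are also on the accepted list."""
    return offered_audio_codecs(sdp) & ACCEPTED


SAMPLE_OFFER = """v=0
o=- 0 0 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 111 101
a=rtpmap:111 opus/48000/2
a=rtpmap:101 telephone-event/8000
"""
```

An empty result from `usable_codecs` would suggest that transcoding or a configuration change is needed before going live.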

5. Summary of the Logic Flow

Call Start: Customer calls Connect.

  • Attribute Set: Connect Flow sets Customer_ID.
  • SIP Invite: Connect sends SIP INVITE + X-Amazon-Connect-CustomerID to PolyAI.
  • Bot Interaction: Bot processes audio in real-time.
  • Escalation: Bot sends SIP REFER back to Connect with a Transfer_Reason.
  • Agent Delivery: Connect routes to a human agent; the agent's screen pops with the bot's summary.
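
The flow above can be modeled as a small state machine, which may help when writing test cases for the contact flow. This is only a sketch: the `BOT_RESOLVED` stage is an assumption added for calls the bot handles without escalation, and the names are illustrative rather than anything Amazon Connect defines.

```python
from enum import Enum
from typing import List


class Stage(Enum):
    CALL_START = "Customer calls Connect"
    ATTRIBUTE_SET = "Connect flow sets Customer_ID"
    SIP_INVITE = "Connect sends SIP INVITE to the bot"
    BOT_INTERACTION = "Bot processes audio in real time"
    ESCALATION = "Bot sends SIP REFER back to Connect"
    AGENT_DELIVERY = "Connect routes to a human agent"
    BOT_RESOLVED = "Bot resolves the call (assumed terminal stage)"


# Stages with exactly one successor.
LINEAR_NEXT = {
    Stage.CALL_START: Stage.ATTRIBUTE_SET,
    Stage.ATTRIBUTE_SET: Stage.SIP_INVITE,
    Stage.SIP_INVITE: Stage.BOT_INTERACTION,
    Stage.ESCALATION: Stage.AGENT_DELIVERY,
}


def trace_call(needs_agent: bool) -> List[Stage]:
    """Return the ordered stages a call passes through."""
    stage = Stage.CALL_START
    path = [stage]
    while stage not in (Stage.AGENT_DELIVERY, Stage.BOT_RESOLVED):
        if stage is Stage.BOT_INTERACTION:
            # The only branch: escalate to an agent or finish in the bot.
            stage = Stage.ESCALATION if needs_agent else Stage.BOT_RESOLVED
        else:
            stage = LINEAR_NEXT[stage]
        path.append(stage)
    return path
```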

EXPERT
answered 2 months ago
  • Thank you for the detailed explanation. The SIP-based architecture using Amazon Connect and Amazon Chime SDK Voice Connector to integrate an external voicebot (e.g., PolyAI or Volubile) looks very promising, especially to avoid PSTN transfers.

    To evaluate this architecture for production, I would like to better understand the cost implications.

    In this scenario, it seems the main AWS services involved would be:

    Amazon Connect (call handling)

    Amazon Chime SDK Voice Connector (SIP media/signaling)

    Possibly AWS Lambda to retrieve conversation context when the call is handed back to an agent.

    For a typical call, for example:

    1 inbound call

    3 minutes handled by the voicebot

    2 minutes with a human agent after escalation

    SIP transfer between Amazon Connect and the external bot via Voice Connector

    Could you provide an approximate breakdown of the AWS costs per minute or per call for this architecture?

    I am particularly interested in understanding:

    the cost per minute for Amazon Connect during the bot interaction

    the cost per minute for Chime Voice Connector media

    any additional cost components (Lambda, data transfer, etc.)

    This would help compare this architecture with PSTN-based transfers.

    Thank you!

3

By using the Voice Connector as a SIP gateway, the call remains on-net within the AWS backbone as much as possible, which is exactly how you avoid the "overhead of the public telephone network". Regarding the cost implications for your 5-minute scenario (3 min Bot / 2 min Agent): compared to PSTN, you are, as I understand it, essentially replacing high per-minute carrier tolls with much lower media processing and data transfer fees. This architecture is almost always significantly more cost-effective for high-volume centers.

EXPERT
answered 2 months ago
  • It’s certainly less expensive, that’s clear. But the key question is: how much cheaper in practice?

    For example, in a 5-minute call scenario (3 minutes handled by the bot and 2 minutes by an agent), could you provide an approximate cost breakdown based on the services involved (e.g., Chime SDK Voice Connector, media processing, data transfer, etc.)?

    Having a rough per-call estimate would help better understand the real cost difference compared to a PSTN-based approach.

3

Hi Ragheb, I took some time and tried to answer your question from your last comment, or at least work out a response. I think that to do it properly, you really need to create your own calculation in https://calculator.aws/. While there isn't a single "SIP-Bot-Integration" template in the AWS Calculator yet, you can accurately model this by adding Amazon Connect (for total talk time) and Amazon Chime SDK Voice Connector (for the bot interaction time) as separate line items.

Accordingly, the following is provided without any warranty!

While exact pricing can vary slightly by AWS Region, let’s break down the approximate AWS infrastructure costs for your 5-minute scenario (3 min Bot / 2 min Agent) using the SIP-based architecture we discussed.

The main advantage here is that by staying "on-net" via the Amazon Chime SDK Voice Connector, you eliminate traditional PSTN carrier surcharges.

Approximate Cost Breakdown (e.g., US-East-1 / Europe):

1. Amazon Connect Service Fee:

  • Basis: $0.018 per minute (standard voice usage).
  • Calculation: 5 minutes × $0.018 = $0.090

2. Amazon Chime SDK Voice Connector (Media Processing):

  • Basis: Approx. $0.0028 per minute for the duration the media flows through the connector to the bot.
  • Calculation: 3 minutes × $0.0028 = $0.0084

3. SIP Signaling & Data Transfer:

  • Signaling is generally bundled, and data transfer (audio bits) within the AWS backbone is negligible for a single call (typically falling under the Free Tier or costing less than $0.0001).
  • Calculation: ~$0.00

4. AWS Lambda (Context Retrieval):

  • Basis: One-time trigger to pull the transcript when the call returns to the agent.
  • Calculation: ~$0.00 (well within the Lambda Free Tier).

Total Estimated AWS Infrastructure Cost per Call: ~$0.0984
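
To re-run the arithmetic above with your own durations, a small sketch like this may help. The per-minute rates are the approximate figures quoted in this answer, not authoritative pricing, and the function name is illustrative. `Decimal` is used so the small amounts add up exactly.

```python
from decimal import Decimal

# Approximate rates quoted above; check current regional pricing before relying on them.
CONNECT_PER_MIN = Decimal("0.018")
VOICE_CONNECTOR_PER_MIN = Decimal("0.0028")


def per_call_cost(bot_minutes: int, agent_minutes: int) -> dict:
    """Estimate AWS infrastructure cost for one call, in USD."""
    # Connect is billed for the whole call; the Voice Connector only while
    # media flows to the bot. Signaling, data transfer, and Lambda are
    # treated as roughly zero, as in the breakdown above.
    connect = CONNECT_PER_MIN * (bot_minutes + agent_minutes)
    voice_connector = VOICE_CONNECTOR_PER_MIN * bot_minutes
    return {
        "connect": connect,
        "voice_connector": voice_connector,
        "total": connect + voice_connector,
    }


breakdown = per_call_cost(bot_minutes=3, agent_minutes=2)
print(breakdown["total"])  # 0.0984
```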

Key Comparison vs. PSTN:

  • PSTN Approach: In a PSTN-based transfer, you would typically pay for the "Inbound Leg" to Connect, the "Outbound Leg" to the bot provider, and potentially another "Inbound Leg" back. This can easily add $0.03 to $0.06 extra per call in carrier tolls alone.
  • SIP Approach: You effectively pay only for the Connect service time and a tiny fraction for the media gateway (Chime).

Component | Service / Feature | Calculation Basis | Estimated Cost (USD)
Call Handling | Amazon Connect Service Fee | 5 min x 0.018 USD/min | 0.090 USD
SIP Signaling | Chime SDK Voice Connector | Flat rate per session (Signaling) | ~0.00 USD
Media Path | Voice Connector Media | 3 min bot time x 0.0028 USD/min | 0.0084 USD
Data Transfer | AWS Data Transfer (Out) | Negligible for < 1 GB/month | 0.000 USD
Metadata | AWS Lambda | 1 invocation for context retrieval | ~0.00 USD
Total (per call) | | | ~0.098 USD
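
For a rough answer to "how much cheaper in practice", the comparison can be sketched as below. It rests on an assumption that should be checked against real carrier quotes: the PSTN path is taken to pay the same Amazon Connect service fee plus the $0.03 to $0.06 in carrier tolls mentioned above, and no Voice Connector fee. All names are illustrative.

```python
from decimal import Decimal

# Figures quoted in this answer (approximate, per 5-minute call).
CONNECT_FEE = Decimal("0.090")           # 5 min x 0.018 USD/min
VOICE_CONNECTOR_FEE = Decimal("0.0084")  # 3 min x 0.0028 USD/min
PSTN_TOLL_LOW = Decimal("0.03")          # assumed extra carrier tolls, low end
PSTN_TOLL_HIGH = Decimal("0.06")         # assumed extra carrier tolls, high end


def saving_per_call(pstn_toll: Decimal) -> Decimal:
    """SIP saving versus PSTN for one call, under the stated assumption."""
    sip_total = CONNECT_FEE + VOICE_CONNECTOR_FEE
    pstn_total = CONNECT_FEE + pstn_toll
    return pstn_total - sip_total


low = saving_per_call(PSTN_TOLL_LOW)
high = saving_per_call(PSTN_TOLL_HIGH)
print(f"Estimated saving per call: {low} to {high} USD")
```

Multiplying the result by your monthly call volume gives a first-order estimate to compare with the provider's own fees.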

Important Note: These are strictly the AWS infrastructure fees. You would still have your separate licensing/usage agreement with PolyAI or Volubile for the AI services (STT/NLU/TTS) itself. For your technical team, I recommend checking out the AWS documentation on "External Voice Transfer Connectors" and the "Chime SDK Voice Connector" admin guide https://docs.aws.amazon.com/connect/latest/adminguide/what-is-amazon-connect.html for the specific SIP configuration.

I hope this provides the clarity you need for your production evaluation!

EXPERT
answered 2 months ago
2

Based on the available AWS documentation, here's guidance on integrating an external voicebot with Amazon Connect:

1. PSTN-Free Integration Options:

Yes, it is possible to integrate external voice systems with Amazon Connect without using PSTN. Amazon Connect supports integration with external voice systems through the Amazon Chime SDK Voice Connectors, which can facilitate SIP-based connections.

2. Recommended AWS Services:

For your architecture, consider these AWS services:

  • Amazon Chime SDK Voice Connectors: These connectors enable integration with external voice systems. There are two integration types available:

      • CONNECT_CALL_TRANSFER_CONNECTOR: Enables direct transfer of voice calls and metadata between Amazon Connect and other voice systems without using the public telephone network

      • CONNECT_ANALYTICS_CONNECTOR: Provides real-time and post-call analytics capabilities including call recordings and conversational analytics

  • Contact Lens Connectors: These allow integration with external voice systems using SIPREC (Session Initiation Protocol Recording) for real-time audio streaming and analytics

3. Architecture Considerations:

For real-time integration, you'll need to:

  • Create a Voice Connector or Contact Lens connector in the Amazon Connect console
  • Configure your external voicebot system to communicate with the connector using the fully qualified host name provided by AWS
  • Enable encryption for secure audio transmission
  • Set up proper network connectivity, considering whether to use Direct Connect for dedicated, low-latency connections versus routing over the public internet

4. Network and Quality Considerations:

For optimal call quality and low latency:

  • Consider using AWS Direct Connect to avoid latency-sensitive voice traffic competing with general internet traffic
  • Ensure a healthy end-to-end network connection from the external voicebot through to Amazon Connect
  • Use wired network connections where possible for stable, high-quality audio

5. Implementation Steps:

To set up the integration:

  • Create the appropriate connector (Voice Connector or Contact Lens connector) in the Amazon Connect console
  • Note the connector's fully qualified host name
  • Configure your external voicebot system to point to this connector
  • Refer to the Amazon Chime SDK resources page for SIPREC/NBR Configuration Guides specific to your external system
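
If you prefer to script the first step instead of using the console, a sketch along these lines may work. It assumes the `CreateVoiceConnector` API listed in the sources below accepts an `IntegrationType` parameter with the values named earlier in this answer; verify the parameter names against the current API reference before use. The connector name and Region passed in are placeholders, and `boto3` is imported only inside the function that actually calls AWS.

```python
from typing import Any, Dict


def build_voice_connector_request(name: str, region: str) -> Dict[str, Any]:
    """Assemble request parameters for a call-transfer Voice Connector."""
    return {
        "Name": name,
        "AwsRegion": region,
        # Matches the "Enable encryption" step above.
        "RequireEncryption": True,
        # Assumed parameter; value taken from the integration types listed above.
        "IntegrationType": "CONNECT_CALL_TRANSFER_CONNECTOR",
    }


def create_voice_connector(name: str, region: str) -> Dict[str, Any]:
    """Create the connector; requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the request builder works without it

    client = boto3.client("chime-sdk-voice", region_name=region)
    response = client.create_voice_connector(
        **build_voice_connector_request(name, region)
    )
    # The returned description includes the host name to give the bot provider.
    return response["VoiceConnector"]
```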

While these services provide the foundation for PSTN-free integration, the specific implementation details for real-time STT, NLU processing, TTS, and context preservation during agent transfers will depend on your external voicebot provider's capabilities and how they implement the SIP/SIPREC integration with the Amazon Chime SDK connectors.
Sources
CreateVoiceConnector - Amazon Chime SDK
Configure your external voice system for integration with Contact Lens - Amazon Connect
Create a Contact Lens connector to integrate with your external voice system - Amazon Connect
Technical considerations - AWS Prescriptive Guidance

answered 2 months ago
EXPERT
reviewed 2 months ago
