How to capture voice of customer on Amazon Connect and make an AI respond (through Lambda)?

0

Hello! We are working on developing an agent-less contact center service by leveraging an external transcription API and the ChatGPT API.

We have some questions about the feasibility of our project. Here are the details:

  • We plan to utilize Amazon Connect for inbound phone calls.
  • Customers who call in will be able to interact with an AI using their voice.
  • The conversation flow will alternate between the AI and the customer (AI -> Customer -> AI -> Customer...).
  • Our intention is to use an external service for speech-to-text transcription accessible via an API. This service will require a binary speech input.
  • Subsequently, the transcribed text received from the API will be sent to the ChatGPT API, which will provide a response.
  • Finally, the text received from ChatGPT will be passed to a Text-to-Speech service (such as Amazon Polly), which will respond back to the customer.

We attempted to implement this using Amazon Connect but encountered some challenges:

  • Amazon Lex employs Amazon Transcribe in the background, but we prefer to use an external API for transcription. Hence, we don't plan to use Amazon Lex.
  • How can we orchestrate the conversation flow? In other words, the AI bot should wait for the customer to finish speaking before transcribing, passing the input to ChatGPT, and ultimately delivering the response verbally.
  • We watched a video (https://www.youtube.com/watch?v=kD57QUn5myc) demonstrating Amazon Connect powered by AI services. However, it seems that the results of the AI services, like speech translation, are not relayed back to the customer via voice. This raises the question of whether there is a way to send the results (in our case, the text generated by ChatGPT) to Amazon Connect for vocalization.

Is such a system possible? While it seems feasible with Lex: https://www.geekfeed.co.jp/wp-content/uploads/2023/04/connect_to_gpt.mp4 (apologies, the video is in Japanese), we're curious if it's achievable without Lex.

Thank you very much.

Tony
asked 9 months ago469 views
2 Answers
0

Thank you for your answer @dmacias!

or you're able to invoke an external transcription service that gets the voice delivered to them in real time

Indeed, it seems that this use case is possible through the use of Kinesis Video Streams.

I see... I also read the documentation and tried in many ways to make it work... But I guess this part is not even possible.

How can we orchestrate the conversation flow? In other words, the AI bot should wait for the customer to finish speaking before transcribing, passing the input to ChatGPT, and ultimately delivering the response verbally.

Thank you again for your insight! I will explore other ways to implement it!

Tony
answered 9 months ago
0

I would love to be proven wrong, but I don't see this being possible with how Connect works. How I've seen this done with other vendors is that the audio is forked at the SBC or you're able to invoke an external transcription service that gets the voice delivered to them in real time. So what you're trying to do would not be real time which defeats the purpose of having a phone call.

profile picture
dmacias
answered 9 months ago

You are not logged in. Log in to post an answer.

A good answer clearly answers the question and provides constructive feedback and encourages professional growth in the question asker.

Guidelines for Answering Questions