AI Voice Chat Automation

This workflow is ideal for:

Developers looking to integrate voice chat functionalities into their applications.
Businesses that want to enhance customer support with automated voice responses.
Educators interested in creating interactive learning platforms using voice interactions.
Content Creators who aim to automate their audio content generation from text inputs.

This workflow addresses the challenge of creating an automated voice chat system that can:

Convert spoken language into text using OpenAI's Speech to Text API.
Maintain context throughout conversations to provide relevant responses.
Generate audio responses utilizing ElevenLabs, offering a variety of voices for a more engaging user experience.

Webhook Trigger: The workflow starts with a webhook that listens for incoming voice messages.
Speech to Text Conversion: The voice message is sent to OpenAI's Speech to Text node, which transcribes the audio into text.
Context Retrieval: The transcribed text is processed to retrieve the previous chat context using the Get Chat node.
Aggregation of Context: The context from previous messages is aggregated to maintain conversation history.
Language Model Processing: The Basic LLM Chain node utilizes the aggregated context and the current message to generate a response using the Google Gemini Chat Model.
Inserting Chat: The conversation is updated with the new user and AI messages using the Insert Chat node.
Generating Audio Response: The generated text response is sent to ElevenLabs to convert it into audio format.
Responding to Webhook: Finally, the audio response is sent back through the webhook to the user.