Text to Speech (TTS)
Converts text into natural-sounding speech with support for multiple languages, genders, and specific neural voices.
API Introduction
About this API
The Text to Speech (TTS) API uses speech synthesis technology to convert written text into natural, fluent, human-sounding speech in real time. It is based on the Microsoft Edge TTS engine, can be used without an API key, and offers a wide selection of languages and voices, allowing developers to generate high-quality audio content for their applications. Whether you want to add a "read aloud" function to your articles, develop a custom voice assistant, or provide accessibility for visually impaired users, this API offers a simple, efficient, and economical solution.
Key Features
- High-Quality Natural Speech: Employs leading neural network speech synthesis technology to generate speech with natural intonation and clear pronunciation, far surpassing traditional robotic voices.
- Extensive Language and Voice Support: Supports dozens of languages and regional dialects, with multiple male and female voices available for each language for developers to choose from, such as en-US-JennyNeural.
- Real-time Streaming Generation: Can generate audio in a streaming fashion, which means it can be played as it's being generated, significantly reducing user waiting time, especially suitable for processing long texts.
- Easy Integration: Provides a simple RESTful interface. Just pass in parameters like text, language, and voice to get an audio file URL or audio data stream, making the integration process very intuitive.
- No API Key Required: Based on the free "Read Aloud" feature of the Edge browser, it can be called for free and without limits in many use cases, greatly reducing development costs.
Use Cases
Scenario 1: Add an Audio Reading Function to a News or Blog Application
Situation: A content platform wants to allow users to "listen" to articles on its website while driving, exercising, or doing housework.
Implementation: A "Play" button is added to each article page. When the user clicks the button, the front end sends the plain text content of the article to the backend. The backend calls the TTS API, passing the article text, selected language (e.g., en-US), and voice (e.g., en-US-AvaNeural) as parameters. The API returns a JSON response containing the URL of the generated MP3 file. The front end then plays this URL, allowing the user to listen to the article like a podcast. This gives users more situations in which they can consume the content and also improves the application's accessibility.
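The backend step described above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the platform's actual code: handle_play_request and call_tts are hypothetical names, the shape of the dictionary returned to the front end is invented for the example, and the HTTP call is replaced by a stand-in so the sketch runs without network access.

```python
import json


def handle_play_request(article_text, call_tts, lang="en-US", voice="en-US-AvaNeural"):
    """Backend handler: turn article text into a playable audio URL.

    call_tts is a hypothetical callable that takes the request payload (dict)
    and returns the API's decoded JSON response (dict).
    """
    # Collapse whitespace so line breaks from the page layout are not sent as-is.
    text = " ".join(article_text.split())
    if not text:
        return {"ok": False, "error": "article is empty"}

    response = call_tts({"text": text, "lang": lang, "voice": voice})
    if not response.get("success"):
        return {"ok": False, "error": "speech generation failed"}

    # The front end can assign this URL to an audio player's source.
    return {"ok": True, "audio_url": response["url"]}


# Stand-in for the real HTTP call (assumption: the response carries
# success, url, and selected_voice as top-level fields).
def fake_tts(payload):
    return {
        "success": True,
        "url": "https://example.com/article.mp3",
        "selected_voice": payload["voice"],
    }


result = handle_play_request("Breaking   news:\n rain expected.", fake_tts)
print(json.dumps(result))
```

Passing the TTS call in as a parameter keeps the handler easy to test and lets the real HTTP client be swapped in later.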
Scenario 2: Create Personalized Voice Notifications and Reminders
Situation: A smart home or personal assistant application needs to announce reminders or important notifications to the user by voice.
Implementation: When a preset reminder (e.g., "Meeting with John at 3 PM") reaches its trigger time, the application's backend service dynamically generates a piece of text, such as "Reminder, you have a meeting with John at 3 PM." The backend then calls the TTS API to convert this text into speech. If a smart speaker is connected at home, the application can play the audio through the speaker. A personalized voice reminder of this kind is friendlier and more effective than a simple beep.
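As a rough illustration of the text-generation step in this scenario, the Python sketch below builds the announcement sentence and a matching request body. The helper names spoken_time and reminder_payload are hypothetical, and the choice of voice is only an example; sending the payload to the API is left out.

```python
from datetime import datetime


def spoken_time(moment):
    """Format a time the way it would be read aloud, e.g. '3 PM' or '3:30 PM'."""
    hour = moment.hour % 12 or 12
    suffix = "AM" if moment.hour < 12 else "PM"
    if moment.minute == 0:
        return f"{hour} {suffix}"
    return f"{hour}:{moment.minute:02d} {suffix}"


def reminder_payload(event, moment, lang="en-US", voice="en-US-JennyNeural"):
    """Build the announcement sentence and the TTS request body for it."""
    sentence = f"Reminder, you have a {event} at {spoken_time(moment)}."
    return {"text": sentence, "lang": lang, "voice": voice}


payload = reminder_payload("meeting with John", datetime(2025, 1, 6, 15, 0))
print(payload["text"])
```

Generating the sentence at trigger time, rather than storing pre-rendered audio, lets the same reminder be announced in whichever language or voice the user currently has selected.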
Scenario 3: Develop a Language Learning or Accessible Reading Tool
Situation: A language learning application needs to provide learners with standard pronunciation of words and sentences, or an accessibility tool needs to read text aloud for users with reading disabilities.
Implementation: In the language learning application, when a user clicks on a new word or example sentence, the application calls the TTS API and specifies the voice of the target language (e.g., using ja-JP-KeitaNeural when learning Japanese). The API generates the standard pronunciation of the word or sentence to help users follow and imitate. For an accessibility tool, it can convert any selected text from a webpage or document into speech, providing an important way for users with visual impairments or reading difficulties to obtain information.
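Choosing a voice for the learner's target language might be sketched in Python as follows. The lookup table contains only voices named in this document, pronunciation_payload is a hypothetical helper, and the fallback of sending just the language code assumes the API picks a default voice when none is given, which should be confirmed against the API itself.

```python
# Voices named in this document; any other entries would need to be checked
# against the API's actual voice list.
VOICE_BY_LANG = {
    "ja-JP": "ja-JP-KeitaNeural",
    "en-US": "en-US-JennyNeural",
}


def pronunciation_payload(phrase, target_lang):
    """Build a TTS request body for a word or sentence in the target language."""
    phrase = phrase.strip()
    if not phrase:
        raise ValueError("nothing to pronounce")
    payload = {"text": phrase, "lang": target_lang}
    voice = VOICE_BY_LANG.get(target_lang)
    if voice is not None:
        # Pin a specific voice so every word is read by the same speaker.
        payload["voice"] = voice
    return payload


print(pronunciation_payload("こんにちは", "ja-JP"))
print(pronunciation_payload("bonjour", "fr-FR"))
```

Pinning one voice per language keeps pronunciation consistent across a lesson, which matters when learners are imitating what they hear.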
How it Works: Endpoints & Response
This API works through a core generation endpoint that receives text and voice parameters and returns the generated audio information.
Endpoint Example: https://hub.juheapi.com/tts/v1/generate
The request body contains text (the text to be converted), lang (language code), gender (voice gender), and voice (specific voice name). A successful API call returns a JSON object in which the success field indicates whether the operation succeeded, the url field provides a direct link to the audio file for playback or download, and the selected_voice field confirms the voice that was ultimately used. This concise design keeps the text-to-speech conversion simple.
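A minimal Python sketch of a client for this endpoint is shown below, using only the standard library. The endpoint URL and the field names come from the description above. The rest is assumption: that the request is a POST with a JSON body, that lang, gender, and voice may be omitted, and that success, url, and selected_voice are top-level fields in the response. The function names are hypothetical.

```python
import json
import urllib.request

# Endpoint taken from the text above; POST with a JSON body is an assumption.
TTS_ENDPOINT = "https://hub.juheapi.com/tts/v1/generate"


def build_payload(text, lang=None, gender=None, voice=None):
    """Build the request body, omitting optional fields that were not set."""
    if not text or not text.strip():
        raise ValueError("text must not be empty")
    payload = {"text": text}
    for key, value in (("lang", lang), ("gender", gender), ("voice", voice)):
        if value is not None:
            payload[key] = value
    return payload


def parse_response(raw_body):
    """Return (url, selected_voice) from a response body, or raise on failure."""
    data = json.loads(raw_body)
    if not data.get("success"):
        raise RuntimeError(f"TTS request failed: {data}")
    return data["url"], data.get("selected_voice")


def generate_speech(text, lang=None, gender=None, voice=None, timeout=30):
    """Send the request and return (url, selected_voice)."""
    body = json.dumps(build_payload(text, lang, gender, voice)).encode("utf-8")
    request = urllib.request.Request(
        TTS_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_response(response.read().decode("utf-8"))
```

A call such as generate_speech("Hello, world", lang="en-US", voice="en-US-JennyNeural") would then return the audio URL together with the voice the service reports having used, provided the assumptions above hold.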