Text to Speech (TTS) API: Pricing, Documentation & Key Features

About this API

The Text to Speech (TTS) API uses neural speech synthesis to convert written text into natural, fluent speech in real time. It is based on the Microsoft Edge TTS engine, requires no API key, and offers a wide selection of languages and voices, so developers can generate high-quality audio for their applications. Whether you want to add a "read aloud" function to articles, build a custom voice assistant, or improve accessibility for visually impaired users, this API offers a simple, efficient, and economical solution.

Key Features

High-Quality Natural Speech: Uses neural speech synthesis to generate speech with natural intonation and clear pronunciation, avoiding the robotic sound of older TTS systems.
Extensive Language and Voice Support: Supports dozens of languages and regional dialects, with multiple male and female voices for each language (for example, en-US-JennyNeural).
Real-time Streaming Generation: Generates audio as a stream, so playback can begin while synthesis is still in progress. This reduces waiting time, especially for long texts.
Easy Integration: Provides a simple RESTful interface. Pass parameters such as text, language, and voice to receive an audio file URL or an audio data stream.
No API Key Required: Built on the free "Read Aloud" feature of the Edge browser, so in many use cases it can be called for free and without usage limits, which reduces development costs.
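
The streaming behavior listed above can be illustrated from the consumer side. This page does not describe how the service exposes its audio stream, so the sketch below only shows the general pattern of forwarding chunks as they arrive. It assumes the audio is available as a file-like object (for example, an open HTTP response), and the `sink` callback is a hypothetical stand-in for an audio player or output buffer.

```python
import io


def iter_audio_chunks(stream, chunk_size=4096):
    """Yield successive byte chunks from a file-like object until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def play_while_receiving(stream, sink, chunk_size=4096):
    """Hand each chunk to `sink` as soon as it is read; return the total bytes forwarded."""
    total = 0
    for chunk in iter_audio_chunks(stream, chunk_size):
        sink(chunk)  # hypothetical player: consumes audio before the stream ends
        total += len(chunk)
    return total


# In-memory stand-in for a network response, used only for illustration.
fake_audio = io.BytesIO(b"\x00" * 10000)
received = []
total = play_while_receiving(fake_audio, received.append)
```

Because each chunk is handed over as soon as it is read, playback does not have to wait for the full file, which is the benefit the feature describes for long texts.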

Use Cases

Scenario 1: Add an Audio Reading Function to a News or Blog Application

Situation: A content platform wants to let users listen to its articles while driving, exercising, or doing housework.

Implementation: A "Play" button is added to each article page. When the user clicks it, the front end sends the article's plain text to the backend. The backend calls the TTS API with the article text, the selected language (e.g., en-US), and the voice (e.g., en-US-AvaNeural) as parameters. The API returns a JSON response containing the URL of the generated MP3 file, which the front end plays so the user can listen to the article like a podcast. This gives users more ways to consume the content and improves the application's accessibility.
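
The backend step in this scenario might look like the following minimal sketch. The `synthesize` argument is a hypothetical callable that sends the payload to the TTS endpoint and returns the decoded JSON response. The handler name and the shape of the dictionary returned to the front end are also assumptions made for illustration. Only the `text`, `lang`, `voice`, `success`, and `url` field names come from this page.

```python
import re


def normalize_article_text(raw_text):
    """Collapse runs of whitespace so the service receives clean plain text."""
    return re.sub(r"\s+", " ", raw_text).strip()


def handle_play_request(article_text, synthesize, lang="en-US", voice="en-US-AvaNeural"):
    """Return a JSON-ready dict for the front end.

    `synthesize` is a hypothetical callable: it takes the request payload
    and returns the decoded JSON response from the TTS service.
    """
    text = normalize_article_text(article_text)
    if not text:
        return {"ok": False, "error": "article has no readable text"}
    result = synthesize({"text": text, "lang": lang, "voice": voice})
    if not result.get("success"):
        return {"ok": False, "error": "speech generation failed"}
    return {"ok": True, "audio_url": result["url"]}
```

Passing the network call in as a parameter keeps the handler easy to test with a stub and leaves the choice of HTTP client open.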

Scenario 2: Create Personalized Voice Notifications and Reminders

Situation: A smart home or personal assistant application needs to announce reminders or important notifications by voice.

Implementation: When a preset reminder (e.g., "Meeting with John at 3 PM") reaches its trigger time, the application's backend generates the announcement text, such as "Reminder, you have a meeting with John at 3 PM." The backend then calls the TTS API to convert this text into speech. If a smart speaker is connected, the application can play the audio through it. A spoken reminder of this kind is more personal and effective than a simple beep.
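
Generating the announcement text is the only application-specific part of this flow. The sketch below shows one way to build it, assuming reminders are stored as an event description plus a `datetime`. The helper names are hypothetical, and the payload uses only the `text` and `lang` fields described on this page.

```python
from datetime import datetime


def format_clock(moment):
    """Format a time as '3 PM' or '3:30 PM' without relying on the system locale."""
    hour = moment.hour % 12 or 12
    suffix = "AM" if moment.hour < 12 else "PM"
    if moment.minute == 0:
        return f"{hour} {suffix}"
    return f"{hour}:{moment.minute:02d} {suffix}"


def reminder_text(event, moment):
    """Build the sentence that will be converted to speech."""
    return f"Reminder, you have {event} at {format_clock(moment)}."


def reminder_payload(event, moment, lang="en-US"):
    """Build a request body for the reminder using the documented field names."""
    return {"text": reminder_text(event, moment), "lang": lang}


print(reminder_text("a meeting with John", datetime(2024, 5, 6, 15, 0)))
# prints: Reminder, you have a meeting with John at 3 PM.
```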

Scenario 3: Develop a Language Learning or Accessible Reading Tool

Situation: A language learning application needs to provide learners with standard pronunciation of words and sentences, or an accessibility tool needs to read text aloud for users with reading disabilities.

Implementation: In the language learning application, when a user clicks a new word or example sentence, the application calls the TTS API and specifies a voice for the target language (e.g., ja-JP-KeitaNeural when learning Japanese). The API generates the standard pronunciation so users can listen and imitate it. An accessibility tool can convert any selected text from a webpage or document into speech, giving users with visual impairments or reading difficulties an important way to access information.
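
Choosing a voice per target language can be sketched as a simple lookup. The voice names below are the ones mentioned on this page, but the mapping itself is illustrative. It is also an assumption that a request without a `voice` field lets the service pick one for the given `lang`.

```python
# Voice names appear elsewhere on this page; the mapping is an illustrative choice.
VOICE_BY_LANGUAGE = {
    "en-US": "en-US-JennyNeural",
    "ja-JP": "ja-JP-KeitaNeural",
}


def pronunciation_payload(phrase, target_lang):
    """Build a request body for a word or sentence in the learner's target language."""
    phrase = phrase.strip()
    if not phrase:
        raise ValueError("phrase must not be empty")
    payload = {"text": phrase, "lang": target_lang}
    voice = VOICE_BY_LANGUAGE.get(target_lang)
    if voice is not None:
        # Pin a specific voice when one is configured for this language.
        payload["voice"] = voice
    # Otherwise send only `lang` (assumption: the service then selects a voice).
    return payload
```

When the voice is left to the service, the `selected_voice` field of the response can be checked to see which voice was actually used.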

How it Works: Endpoints & Response

This API works through a core generation endpoint that receives text and voice parameters and returns the generated audio information.

Endpoint Example: https://hub.juheapi.com/tts/v1/generate

The request body contains four fields: text (the text to convert), lang (language code), gender (voice gender), and voice (a specific voice name). A successful call returns a JSON object in which the success field indicates whether the operation succeeded, the url field links to the audio file for playback or download, and the selected_voice field confirms the voice that was actually used. This compact design keeps text-to-speech conversion simple.
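
The request and response described above can be sketched with the Python standard library. The endpoint URL and the field names come from this page. Sending the body as JSON via POST and the 30-second timeout are assumptions, and the helper function names are hypothetical.

```python
import json
import urllib.request

# Endpoint taken from the text above; POST with a JSON body is an assumption.
TTS_ENDPOINT = "https://hub.juheapi.com/tts/v1/generate"


def build_payload(text, lang="en-US", gender=None, voice=None):
    """Build the request body from the four documented fields, skipping unset ones."""
    if not text or not text.strip():
        raise ValueError("text must not be empty")
    payload = {"text": text, "lang": lang, "gender": gender, "voice": voice}
    return {key: value for key, value in payload.items() if value is not None}


def parse_response(body):
    """Return (url, selected_voice) from a decoded response, or raise on failure."""
    if not body.get("success"):
        raise RuntimeError(f"TTS request failed: {body}")
    return body["url"], body.get("selected_voice")


def generate_speech(text, **options):
    """Send the request and return (url, selected_voice). Performs a network call."""
    data = json.dumps(build_payload(text, **options)).encode("utf-8")
    request = urllib.request.Request(
        TTS_ENDPOINT,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_response(json.load(response))
```

A call such as `generate_speech("Hello, world", voice="en-US-JennyNeural")` would then return the audio URL and the voice the service reports using, provided the request format assumed here matches the live API.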
