title: TTS API
emoji: π
colorFrom: green
colorTo: purple
sdk: docker
pinned: false
Text-to-Speech API
A public Text-to-Speech API built with FastAPI and Microsoft Edge TTS, optimized for Hugging Face Spaces deployment.
Features
- Convert text to natural-sounding speech using Microsoft Edge TTS
- Multiple voice options with different languages and accents
- Customizable speech parameters (pitch and rate adjustment)
- RESTful API with automatic OpenAPI documentation
- Public access with CORS enabled
- Real-time audio generation and streaming
API Documentation
Once deployed, visit the root URL to access the interactive API documentation (Swagger UI).
API Endpoints
Core Endpoints
- GET / - API information and documentation links
- GET /health - Health check endpoint
- GET /voices - List all available voices
- POST /synthesize - Convert text to speech (JSON)
- POST /synthesize-form - Convert text to speech (Form data)
Example Usage
Using cURL with JSON:
curl -X POST 'https://your-space-url/synthesize' \
-H 'Content-Type: application/json' \
-d '{
"text": "Hello from Hugging Face Spaces!",
"voice": "en-GB-SoniaNeural",
"pitch": "-10Hz",
"rate": "+15%"
}' \
--output speech.mp3
Using cURL with Form Data:
curl -X POST 'https://your-space-url/synthesize-form' \
-F 'text=Hello World!' \
-F 'voice=en-US-AriaNeural' \
-F 'pitch=+5Hz' \
-F 'rate=+10%' \
--output speech.mp3
Using Python requests:
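import requests

response = requests.post(
    'https://your-space-url/synthesize',
    json={
        'text': 'Hello from Python!',
        'voice': 'en-US-AriaNeural',
        'pitch': '+0Hz',
        'rate': '+0%'
    }
)
# Fail on an error response instead of saving its JSON body as an MP3
response.raise_for_status()

# Write the returned MP3 bytes to disk
with open('speech.mp3', 'wb') as f:
    f.write(response.content)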
Parameters
Request Parameters
| Parameter | Type | Default | Description | Example |
|---|---|---|---|---|
| text | string | required | Text to convert to speech | "Hello World!" |
| voice | string | "en-US-AriaNeural" | Voice identifier | "en-GB-SoniaNeural" |
| pitch | string | "+0Hz" | Pitch adjustment | "+10Hz", "-15Hz" |
| rate | string | "+0%" | Rate adjustment | "+20%", "-10%" |
Voice Examples
- en-US-AriaNeural - US English, Female
- en-GB-SoniaNeural - UK English, Female
- en-AU-NatashaNeural - Australian English, Female
- de-DE-KatjaNeural - German, Female
- fr-FR-DeniseNeural - French, Female
- es-ES-ElviraNeural - Spanish, Female
Use the /voices endpoint to get the complete list of available voices.
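As a rough illustration of querying that endpoint from Python, the sketch below uses only the standard library. The names `BASE_URL`, `endpoint_url`, and `fetch_voices` are hypothetical helpers, not part of this API, and the shape of the `/voices` response is not documented here, so the parsed JSON is returned unchanged.

```python
import json
import urllib.request

# Placeholder taken from the examples above; replace with your Space URL.
BASE_URL = "https://your-space-url"


def endpoint_url(base_url: str, path: str) -> str:
    """Join a base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def fetch_voices(base_url: str = BASE_URL, timeout: float = 30.0):
    """Return the parsed JSON body of GET /voices.

    The response schema is an unknown here, so no fields are assumed;
    inspect the returned value to see what the server actually sends.
    """
    url = endpoint_url(base_url, "/voices")
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_voices()` against a running Space and printing the result is a quick way to confirm a voice identifier before passing it to /synthesize.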
Parameter Ranges
- Pitch: -50Hz to +50Hz (e.g., "-25Hz", "+0Hz", "+30Hz")
- Rate: -50% to +50% (e.g., "-20%", "+0%", "+25%")
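The format and ranges above can be checked on the client before a request is sent. The following is a minimal sketch, not the server's own validation (which is not shown in this README). The helper names are hypothetical, and it assumes every value carries an explicit sign, as all the examples here do.

```python
import re

# Assumption: an explicit "+" or "-" sign is required, as in the examples above.
_PITCH_RE = re.compile(r"([+-])(\d+)Hz")
_RATE_RE = re.compile(r"([+-])(\d+)%")
MAX_OFFSET = 50  # documented bound for both pitch (Hz) and rate (%)


def _in_range(value: str, pattern) -> bool:
    """True if value matches the pattern and its magnitude is within bounds."""
    match = pattern.fullmatch(value)
    return match is not None and int(match.group(2)) <= MAX_OFFSET


def is_valid_pitch(value: str) -> bool:
    return _in_range(value, _PITCH_RE)


def is_valid_rate(value: str) -> bool:
    return _in_range(value, _RATE_RE)


def build_payload(text, voice="en-US-AriaNeural", pitch="+0Hz", rate="+0%"):
    """Build the JSON body for POST /synthesize, using the documented defaults."""
    if not text:
        raise ValueError("text is required")
    if not is_valid_pitch(pitch):
        raise ValueError(f"pitch must look like '+10Hz' within ±50Hz, got {pitch!r}")
    if not is_valid_rate(rate):
        raise ValueError(f"rate must look like '+10%' within ±50%, got {rate!r}")
    return {"text": text, "voice": voice, "pitch": pitch, "rate": rate}
```

The dictionary returned by `build_payload` can be passed as the `json=` argument in the Python requests example.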
Local Development
Installation
- Clone the repository
- Install dependencies: pip install -r requirements.txt
- Run the server: python app.py
- Open http://localhost:7860 for API documentation
Docker Deployment
# Build the image
docker build -t tts-api .
# Run the container
docker run -p 7860:7860 tts-api
Hugging Face Spaces Deployment
- Create a new Space on Hugging Face
- Choose "Docker" as the SDK
- Upload the following files:
  - app.py (main application)
  - requirements.txt (dependencies)
  - Dockerfile (container configuration)
  - README.md (this file)
- Your API will be publicly accessible once deployed!
Response Format
Successful Response
- Content-Type: audio/mpeg
- Body: MP3 audio file
Error Response
{
"detail": "Error description"
}
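A client can tell the two response shapes apart by status code and Content-Type. The sketch below is one possible way to do that. `TTSError` and `extract_audio` are hypothetical names, and the specific status codes the server returns for errors are not documented here, so any non-audio response is treated as an error.

```python
import json


class TTSError(Exception):
    """Raised when the API answers with an error payload instead of audio."""


def extract_audio(status: int, content_type: str, body: bytes) -> bytes:
    """Return the MP3 bytes, or raise TTSError carrying the "detail" message."""
    media_type = content_type.split(";")[0].strip().lower()
    if status == 200 and media_type == "audio/mpeg":
        return body
    try:
        detail = json.loads(body)["detail"]
    except (ValueError, KeyError, TypeError):
        # Body was not the documented JSON error shape; show a short excerpt.
        detail = body[:200].decode("utf-8", errors="replace")
    raise TTSError(f"HTTP {status}: {detail}")
```

With the requests example, this would be called as `extract_audio(response.status_code, response.headers.get("Content-Type", ""), response.content)`.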
Rate Limiting & Usage
This is a public API, but please use it responsibly:
- Maximum text length: 5,000 characters
- Recommended: Don't exceed 100 requests per minute
- For production use, consider implementing authentication
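Text longer than the 5,000-character limit has to be split across several requests. The sketch below shows one simple client-side approach, breaking at whitespace where possible. `split_text` is a hypothetical helper, and how the resulting audio files are joined afterwards is left to the caller.

```python
from typing import List

MAX_CHARS = 5000  # documented maximum text length per request


def split_text(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most max_chars, preferring space breaks."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    chunks = []
    remaining = text.strip()
    while len(remaining) > max_chars:
        # Last space at or before the limit; -1 means no space was found.
        cut = remaining.rfind(" ", 0, max_chars + 1)
        if cut <= 0:
            cut = max_chars  # no usable space: fall back to a hard split
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each chunk can then be sent as its own /synthesize request, with a short pause between requests to stay under the recommended 100 requests per minute.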
Troubleshooting
Common Issues
- Voice not found: Use the /voices endpoint to check available voices
- Invalid parameters: Check pitch/rate format (must include Hz/% suffix)
- Text too long: Maximum 5,000 characters per request
- Network timeout: Large texts may take longer to process
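For the timeout case, one option is to set a generous client timeout and retry a few times with increasing delays. The sketch below is a generic retry wrapper, not something this API provides. `call_with_retries` is a hypothetical helper, and the exception types worth retrying depend on the HTTP client in use (with requests, for example, `requests.exceptions.Timeout`).

```python
import time


def call_with_retries(func, attempts=3, base_delay=1.0,
                      retry_on=(TimeoutError,), sleep=time.sleep):
    """Call func() and retry on the given exceptions with exponential backoff.

    Delays are base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    The last exception is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
```

A request could be wrapped as `call_with_retries(lambda: requests.post(url, json=payload, timeout=120), retry_on=(requests.exceptions.Timeout,))`, where the 120-second timeout is an arbitrary illustrative value.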
License
This project uses the Microsoft Edge TTS service. Please review Microsoft's terms of service for usage guidelines.
Contributing
Feel free to open issues or submit pull requests to improve this API!
Made with ❤️ for the Hugging Face community