| title: TTS API | |
| emoji: π | |
| colorFrom: green | |
| colorTo: purple | |
| sdk: docker | |
| pinned: false | |
| # Text-to-Speech API | |
| A public Text-to-Speech API built with FastAPI and Microsoft Edge TTS, optimized for Hugging Face Spaces deployment. | |
| ## Features | |
| - **Convert text to natural-sounding speech** using Microsoft Edge TTS | |
| - **Multiple voice options** with different languages and accents | |
| - **Customizable speech parameters** (pitch and rate adjustment) | |
| - **RESTful API** with automatic OpenAPI documentation | |
| - **Public access** with CORS enabled | |
| - **Real-time audio generation** and streaming | |
| ## API Documentation | |
| Once deployed, visit the root URL to access the interactive API documentation (Swagger UI). | |
| ## 🔧 API Endpoints | |
| ### Core Endpoints | |
| - `GET /` - API information and documentation links | |
| - `GET /health` - Health check endpoint | |
| - `GET /voices` - List all available voices | |
| - `POST /synthesize` - Convert text to speech (JSON) | |
| - `POST /synthesize-form` - Convert text to speech (Form data) | |
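
The read-only endpoints above can be queried with the standard library alone. The following is a minimal sketch, not the project's own client: `get_json` and its `opener` parameter are hypothetical names, `https://your-space-url` is the placeholder used in the examples, and it assumes that `/health` and `/voices` return JSON bodies (their exact shape is not documented here, so the parsed value is returned unchanged).

```python
import json
from urllib.request import urlopen

BASE_URL = "https://your-space-url"  # placeholder, replace with your Space URL


def get_json(path, base_url=BASE_URL, opener=urlopen, timeout=30):
    """GET base_url + path and decode the body as JSON.

    Assumption: the endpoint answers with a JSON document.
    `opener` is injectable so the function can be exercised without a network.
    """
    url = base_url.rstrip("/") + path
    with opener(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a deployed Space, `get_json("/health")` and `get_json("/voices")` would return whatever JSON those endpoints produce.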
| ### Example Usage | |
| #### Using cURL with JSON: | |
| ```bash | |
| curl -X POST 'https://your-space-url/synthesize' \ | |
| -H 'Content-Type: application/json' \ | |
| -d '{ | |
| "text": "Hello from Hugging Face Spaces!", | |
| "voice": "en-GB-SoniaNeural", | |
| "pitch": "-10Hz", | |
| "rate": "+15%" | |
| }' \ | |
| --output speech.mp3 | |
| ``` | |
| #### Using cURL with Form Data: | |
| ```bash | |
| curl -X POST 'https://your-space-url/synthesize-form' \ | |
| -F 'text=Hello World!' \ | |
| -F 'voice=en-US-AriaNeural' \ | |
| -F 'pitch=+5Hz' \ | |
| -F 'rate=+10%' \ | |
| --output speech.mp3 | |
| ``` | |
| #### Using Python requests: | |
| ```python | |
| import requests | |
| response = requests.post( | |
|     'https://your-space-url/synthesize', | |
|     json={ | |
|         'text': 'Hello from Python!', | |
|         'voice': 'en-US-AriaNeural', | |
|         'pitch': '+0Hz', | |
|         'rate': '+0%' | |
|     }, | |
|     timeout=60, | |
| ) | |
| # Stop here on an error status so a JSON error body is not saved as audio | |
| response.raise_for_status() | |
| with open('speech.mp3', 'wb') as f: | |
|     f.write(response.content) | |
| ``` | |
| ## Parameters | |
| ### Request Parameters | |
| | Parameter | Type | Default | Description | Example | | |
| |-----------|------|---------|-------------|---------| | |
| | `text` | string | required | Text to convert to speech | "Hello World!" | | |
| | `voice` | string | "en-US-AriaNeural" | Voice identifier | "en-GB-SoniaNeural" | | |
| | `pitch` | string | "+0Hz" | Pitch adjustment | "+10Hz", "-15Hz" | | |
| | `rate` | string | "+0%" | Rate adjustment | "+20%", "-10%" | | |
| ### Voice Examples | |
| - `en-US-AriaNeural` - US English, Female | |
| - `en-GB-SoniaNeural` - UK English, Female | |
| - `en-AU-NatashaNeural` - Australian English, Female | |
| - `de-DE-KatjaNeural` - German, Female | |
| - `fr-FR-DeniseNeural` - French, Female | |
| - `es-ES-ElviraNeural` - Spanish, Female | |
| *Use the `/voices` endpoint to get the complete list of available voices.* | |
| ### Parameter Ranges | |
| - **Pitch**: -50Hz to +50Hz (e.g., "-25Hz", "+0Hz", "+30Hz") | |
| - **Rate**: -50% to +50% (e.g., "-20%", "+0%", "+25%") | |
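
The format and ranges above can be checked on the client before a request is sent. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and it assumes a leading sign is required, which matches every example in this README but is not stated explicitly. The server may validate differently.

```python
import re

# Assumption: values look like "+10Hz" / "-20%" with a mandatory sign.
PITCH_RE = re.compile(r"([+-]\d+)Hz")
RATE_RE = re.compile(r"([+-]\d+)%")


def _in_range(value, pattern, low=-50, high=50):
    """Return True if value matches the pattern and its number is within [low, high]."""
    match = pattern.fullmatch(value)
    return bool(match) and low <= int(match.group(1)) <= high


def is_valid_pitch(value):
    """Check a pitch string such as "+30Hz" against the documented -50Hz..+50Hz range."""
    return _in_range(value, PITCH_RE)


def is_valid_rate(value):
    """Check a rate string such as "-20%" against the documented -50%..+50% range."""
    return _in_range(value, RATE_RE)
```

For example, `is_valid_pitch("+30Hz")` passes, while `is_valid_pitch("+60Hz")` and `is_valid_rate("25")` are rejected.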
| ## 🛠️ Local Development | |
| ### Installation | |
| 1. Clone the repository | |
| 2. Install dependencies: | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| 3. Run the server: | |
| ```bash | |
| python app.py | |
| ``` | |
| 4. Open http://localhost:7860 for API documentation | |
| ### Docker Deployment | |
| ```bash | |
| # Build the image | |
| docker build -t tts-api . | |
| # Run the container | |
| docker run -p 7860:7860 tts-api | |
| ``` | |
| ## Hugging Face Spaces Deployment | |
| 1. Create a new Space on Hugging Face | |
| 2. Choose "Docker" as the SDK | |
| 3. Upload the following files: | |
| - `app.py` (main application) | |
| - `requirements.txt` (dependencies) | |
| - `Dockerfile` (container configuration) | |
| - `README.md` (this file) | |
| 4. Your API will be publicly accessible once deployed! | |
| ## Response Format | |
| ### Successful Response | |
| - **Content-Type**: `audio/mpeg` | |
| - **Body**: MP3 audio file | |
| ### Error Response | |
| ```json | |
| { | |
| "detail": "Error description" | |
| } | |
| ``` | |
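
Because a successful call returns MP3 bytes and a failed one returns JSON, a client has to tell the two apart before writing a file. The sketch below shows one way to do that. `TTSError` and `extract_audio` are hypothetical names, and it assumes success is signalled by status 200 together with the `audio/mpeg` content type described above.

```python
import json


class TTSError(Exception):
    """Raised when the API does not return audio (hypothetical helper type)."""


def extract_audio(status_code, content_type, body):
    """Return the MP3 bytes, or raise TTSError carrying the server's "detail" field."""
    media_type = content_type.split(";")[0].strip().lower()
    if status_code == 200 and media_type == "audio/mpeg":
        return body
    try:
        detail = json.loads(body.decode("utf-8"))["detail"]
    except (ValueError, KeyError, TypeError):
        # Body was not the documented {"detail": ...} JSON object
        detail = "unexpected response"
    raise TTSError(f"HTTP {status_code}: {detail}")
```

With `requests`, this could be called as `extract_audio(response.status_code, response.headers.get("Content-Type", ""), response.content)`.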
| ## Rate Limiting & Usage | |
| This is a public API, but please use it responsibly: | |
| - Maximum text length: 5,000 characters | |
| - Recommended: Don't exceed 100 requests per minute | |
| - For production use, consider implementing authentication | |
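
Longer inputs can be kept under the 5,000-character limit by splitting them on the client and sending each piece as its own request. The following is a rough sketch, not part of the API: `split_text` is a hypothetical helper, and the sentence-boundary rule (split after `.`, `!` or `?` followed by whitespace) is a simplifying assumption.

```python
import re

MAX_CHARS = 5000  # limit stated in this README


def split_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at fixed offsets
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be posted to `/synthesize` separately, ideally with a pause between requests to stay within the recommended request rate.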
| ## Troubleshooting | |
| ### Common Issues | |
| 1. **Voice not found**: Use the `/voices` endpoint to check available voices | |
| 2. **Invalid parameters**: Check the pitch/rate format; values need the `Hz` or `%` suffix, as in `+10Hz` or `-20%` | |
| 3. **Text too long**: Maximum 5,000 characters per request | |
| 4. **Network timeout**: Large texts may take longer to process | |
| ## License | |
| This project uses the Microsoft Edge TTS service. Please review Microsoft's terms of service for usage guidelines. | |
| ## Contributing | |
| Feel free to open issues or submit pull requests to improve this API! | |
| --- | |
| **Made with ❤️ for the Hugging Face community** |