Upload folder using huggingface_hub
README.md
CHANGED
|
@@ -19,7 +19,7 @@ base_model: facebook/hf-seamless-m4t-medium
|
|
| 19 |
|
| 20 |
This is a **SeamlessLanguagePairs** model that processes audio and text inputs with both translation awareness and language pair embeddings to predict **Time To Edit (TTE)** for subtitle segments. Given an audio segment and its corresponding subtitle text, the model predicts how much time (in seconds) would be required to edit/refine that subtitle segment, taking into account both whether the subtitle is translated and the specific language pair involved.
|
| 21 |
|
| 22 |
-
The model extends the SeamlessM4T architecture with both translation features and language pair embeddings, providing the most granular control for multilingual
|
| 23 |
|
| 24 |
### Key Features
|
| 25 |
|
|
@@ -137,6 +137,14 @@ print(f"Predicted Time To Edit (TTE): {tte_prediction:.2f} seconds")
|
|
| 137 |
- **Output**: Single regression value (TTE in seconds)
|
| 138 |
- **Task**: Subtitle editing time prediction
|
| 139 |
|
| 140 |
## Data Format
|
| 141 |
|
| 142 |
Your input data should be a list of dictionaries with:
|
|
@@ -193,12 +201,12 @@ data = [
|
|
| 193 |
|
| 194 |
The model was trained with the following specifications:
|
| 195 |
|
| 196 |
-
- **Dataset**: Multimodal audio-subtitle pairs with translation and language pair annotations
|
| 197 |
- **Train/Test Split**: 80/20 with random seed 42
|
| 198 |
- **Audio Processing**: 16kHz sampling, max 8.0 seconds, no offset
|
| 199 |
- **Text Processing**: Max 256 tokens
|
| 200 |
- **Translation Feature**: Binary flag indicating original vs translated content
|
| 201 |
-
- **Language Pairs**: 21
|
| 202 |
- **Normalization**: None (raw TTE values in seconds)
|
| 203 |
- **Caching**: Audio segments cached and compressed for efficiency
|
| 204 |
|
|
|
|
| 19 |
|
| 20 |
This is a **SeamlessLanguagePairs** model that processes audio and text inputs with both translation awareness and language pair embeddings to predict **Time To Edit (TTE)** for subtitle segments. Given an audio segment and its corresponding subtitle text, the model predicts how much time (in seconds) would be required to edit/refine that subtitle segment, taking into account both whether the subtitle is translated and the specific language pair involved.
|
| 21 |
|
| 22 |
+
The model extends the SeamlessM4T architecture with both translation features and language pair embeddings, providing the most granular control for multilingual scenarios across **5 languages: English, French, Spanish, Italian, and German** with **21 translation pairs** between them (e.g., EN→FR, ES→DE, IT→EN).
|
| 23 |
|
| 24 |
### Key Features
|
| 25 |
|
|
|
|
| 137 |
- **Output**: Single regression value (TTE in seconds)
|
| 138 |
- **Task**: Subtitle editing time prediction
|
| 139 |
|
| 140 |
+
## Supported Language Pairs
|
| 141 |
+
|
| 142 |
+
The model supports 21 specific translation pairs between 5 languages:
|
| 143 |
+
|
| 144 |
+
**Languages**: English (EN), French (FR), Spanish (ES), Italian (IT), German (DE)
|
| 145 |
+
|
| 146 |
+
**Translation Pairs**: Pairs are directional, so EN→FR and FR→EN count as distinct pairs (other examples: ES→IT, DE→ES). The model uses language pair IDs 0-20 to identify the 21 supported translation directions; ID 21 is reserved for "other" pairs.
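
The ID scheme can be illustrated with a small lookup. This is a sketch only: the README does not publish the actual pair-to-ID table, so `EXAMPLE_PAIR_IDS` and `pair_to_id` are hypothetical names and the specific ID assignments below are made up for illustration. The only detail taken from the README is that ID 21 is the fallback for "other" pairs.

```python
# ID 21 is reserved for "other" pairs, as stated in the README.
OTHER_PAIR_ID = 21

# Hypothetical, illustrative subset. The real assignment of IDs 0-20
# to translation directions is not given in the README.
EXAMPLE_PAIR_IDS = {
    ("EN", "FR"): 0,
    ("FR", "EN"): 1,
    ("ES", "IT"): 2,
}


def pair_to_id(source_lang, target_lang, table=EXAMPLE_PAIR_IDS):
    """Return the ID for a directional pair, or OTHER_PAIR_ID if unknown."""
    key = (source_lang.upper(), target_lang.upper())
    return table.get(key, OTHER_PAIR_ID)
```

With this table, `pair_to_id("en", "fr")` resolves to the known pair, while an unsupported direction such as `pair_to_id("JA", "EN")` falls back to the "other" ID.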
|
| 147 |
+
|
| 148 |
## Data Format
|
| 149 |
|
| 150 |
Your input data should be a list of dictionaries with:
|
|
|
|
| 201 |
|
| 202 |
The model was trained with the following specifications:
|
| 203 |
|
| 204 |
+
- **Dataset**: Multimodal audio-subtitle pairs with translation and language pair annotations (5 languages: EN, FR, ES, IT, DE with 21 pairs)
|
| 205 |
- **Train/Test Split**: 80/20 with random seed 42
|
| 206 |
- **Audio Processing**: 16kHz sampling, max 8.0 seconds, no offset
|
| 207 |
- **Text Processing**: Max 256 tokens
|
| 208 |
- **Translation Feature**: Binary flag indicating original vs translated content
|
| 209 |
+
- **Language Pairs**: 21 translation pairs from 5 languages (EN, FR, ES, IT, DE), plus an "other" category
|
| 210 |
- **Normalization**: None (raw TTE values in seconds)
|
| 211 |
- **Caching**: Audio segments cached and compressed for efficiency
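
The audio and split settings listed above can be sketched roughly as follows. This is not the training code, which the README does not show. It assumes that clips longer than 8.0 seconds are truncated from the start (consistent with "no offset") and that the 80/20 split is a seeded shuffle. `truncate_audio` and `split_train_test` are hypothetical helper names.

```python
import random

SAMPLE_RATE = 16_000   # 16kHz sampling, from the README
MAX_SECONDS = 8.0      # max audio length, from the README
MAX_SAMPLES = int(SAMPLE_RATE * MAX_SECONDS)


def truncate_audio(samples):
    """Keep at most MAX_SAMPLES samples, starting at offset 0 (assumed)."""
    return samples[:MAX_SAMPLES]


def split_train_test(items, test_fraction=0.2, seed=42):
    """Shuffle with a fixed seed, then split (assumed 80/20 procedure)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without touching the global random state.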
|
| 212 |
|