---
license: apache-2.0
language:
- ar
tags:
- audio
- automatic-speech-recognition
---
<style>
img {
display: inline;
}
</style>

[Acoustic model (AM)](https://github.com/linagora-labs/ASR_train_kaldi_tunisian?tab=readme-ov-file#acoustic-model-am)
[ASR_train_kaldi_tunisian](https://github.com/linagora-labs/ASR_train_kaldi_tunisian)
# LinTO ASR Arabic Tunisia v0.1
**LinTO ASR Arabic Tunisia v0.1** is an Automatic Speech Recognition (ASR) model for the Tunisian dialect,
with some support for code-switching, where French or English words are mixed into Tunisian Arabic speech.
This repository includes two versions of the model and a language model in ARPA format:
- `vosk-model`: The original, comprehensive model.
- `android-model`: A lighter version with a simplified graph, optimized for deployment on Android devices or Raspberry Pi applications.
- `lm_TN_CS.arpa.gz`: A language model trained using SRILM on a dataset containing 4.5 million lines of text collected from various sources.
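
As a quick sanity check on the language model file, the n-gram counts can be read from the header of the ARPA file without loading the whole model. The sketch below assumes the standard ARPA layout (a `\data\` section with `ngram N=COUNT` lines before the first n-gram section); the function name `read_arpa_counts` is hypothetical, and this card does not state the n-gram order of `lm_TN_CS.arpa.gz`.

```python
import gzip
import re


def read_arpa_counts(path):
    """Return {order: count} parsed from the data header of a gzipped ARPA file."""
    counts = {}
    in_header = False
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line == "\\data\\":
                # Start of the header section.
                in_header = True
                continue
            if not in_header:
                continue
            match = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
            if match:
                counts[int(match.group(1))] = int(match.group(2))
            elif line.startswith("\\"):
                # The first section marker (for example \1-grams:) ends the header.
                break
    return counts
```

Calling `read_arpa_counts("lm_TN_CS.arpa.gz")` on the downloaded file should return one entry per n-gram order, which makes it easy to confirm that the download is complete and readable.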
## Model Overview
- **Model type**: Kaldi TDNN
- **Language(s)**: Tunisian Dialect
- **Use cases**: Automatic Speech Recognition (ASR)
### Model Performance
The following table summarizes the performance of the **LinTO ASR Arabic Tunisia v0.1** model on several **test sets**, measured as Character Error Rate (CER) and Word Error Rate (WER):
| Dataset | CER | WER |
| :------- | :------- | :------- |
| [Youtube_TNScrapped_V1](https://huggingface.co/datasets/linagora/linto-dataset-audio-ar-tn#data-table) | `25.39%` | `37.51%` |
| [TunSwitchCS](https://huggingface.co/datasets/linagora/linto-dataset-audio-ar-tn#data-table) | `17.72%` | `20.51%` |
| [TunSwitchTO](https://huggingface.co/datasets/linagora/linto-dataset-audio-ar-tn#data-table) | `11.13%` | `22.54%` |
| [ApprendreLeTunisien](https://huggingface.co/datasets/linagora/linto-dataset-audio-ar-tn#data-table) | `11.81%` | `23.27%` |
| [TARIC](https://github.com/elyadata/TARIC-SLU) | `10.60%` | `16.06%` |
| [OneStory](https://huggingface.co/datasets/linagora/linto-dataset-audio-ar-tn#data-table)| `1.53%` | `4.47%` |
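
For readers who want to compute comparable metrics on their own recordings, here is a minimal sketch of WER and CER based on the Levenshtein edit distance. It is not the scoring script used to produce the table above: text normalization, and whether spaces count as characters in CER, are assumptions here (this sketch applies no normalization and counts spaces).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, with unit costs."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("Reference must contain at least one word.")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance divided by reference length."""
    if not reference:
        raise ValueError("Reference must not be empty.")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

With the toy strings `wer("a b c d", "a x c")`, one substitution and one deletion over four reference words give `0.5`.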
### Training code
The model was trained using the following GitHub repository: [ASR_train_kaldi_tunisian](https://github.com/linagora-labs/ASR_train_kaldi_tunisian)
### Training datasets
The model was trained using the following datasets:
- **[LinTO DataSet Audio for Arabic Tunisian](https://huggingface.co/datasets/linagora/linto-dataset-audio-ar-tn):** This dataset comprises a collection of Tunisian dialect audio recordings and their annotations for Speech-to-Text (STT) tasks. The data was collected from various sources, including Hugging Face, YouTube, and websites.
- **[LinTO DataSet Audio for Arabic Tunisian Augmented](https://huggingface.co/datasets/linagora/linto-dataset-audio-ar-tn-augmented):** This dataset is an augmented version of the LinTO DataSet Audio for Arabic Tunisian v0.1. The augmentation includes noise reduction and voice conversion.
- **[TARIC](https://github.com/elyadata/TARIC-SLU):** This dataset consists of Tunisian Arabic speech recordings collected from train stations in Tunisia.
## How to use
### 1. Download the model
You can download the model and its components directly from this repository using one of the following methods:
**Method 1: Direct Download via Browser**
1. **Visit the Repository**: Navigate to the [Hugging Face model page](https://huggingface.co/linagora/linto-asr-ar-tn-0.1).
2. **Download the archive**: Open the "Files and versions" tab and click the download icon next to `vosk-model.zip` (or `android-model.zip`).
**Method 2: Using `curl` command**
Run the commands below (the first one installs `curl` on Debian or Ubuntu if it is missing):
```bash
sudo apt-get install curl
curl -L https://huggingface.co/linagora/linto-asr-ar-tn-0.1/resolve/main/vosk-model.zip --output vosk-model.zip
```
(or the same command with `android-model.zip` instead of `vosk-model.zip`)
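
The same download can be scripted in Python with the standard library only. This is a sketch that mirrors the `curl` URL above; the helper names `resolve_url` and `download` are hypothetical, and `urllib` follows redirects by default, much like `curl -L`.

```python
import shutil
import urllib.request

BASE_URL = "https://huggingface.co"


def resolve_url(repo_id, filename, revision="main"):
    """Build the direct-download URL of a file stored in a Hugging Face repository."""
    return f"{BASE_URL}/{repo_id}/resolve/{revision}/{filename}"


def download(repo_id, filename, dest=None):
    """Stream a repository file to disk and return the local path."""
    dest = dest or filename
    url = resolve_url(repo_id, filename)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # Copy in chunks so the archive is never held fully in memory.
        shutil.copyfileobj(response, out)
    return dest
```

For example, `download("linagora/linto-asr-ar-tn-0.1", "vosk-model.zip")` saves the archive in the current directory; pass `"android-model.zip"` for the lighter model.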
**Method 3: Cloning the Repository**
You can clone the repository and create a zip file of the contents if needed:
```bash
sudo apt-get install git-lfs
git lfs install
git clone https://huggingface.co/linagora/linto-asr-ar-tn-0.1.git
cd linto-asr-ar-tn-0.1
```
### 2. Unzip the model
This can be done in bash:
```bash
mkdir dir_for_zip_extract
unzip /path/to/model-name.zip -d dir_for_zip_extract
```
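
On systems without `unzip`, the archive can be extracted with Python's `zipfile` module instead. The sketch below also returns the top-level entries of the archive, which helps locate the directory to pass to Vosk afterwards; the internal layout of the archives is not documented in this card, so inspect the result rather than assuming a folder name. The function name `extract_model` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_model(zip_path, target_dir):
    """Extract a model archive and return the paths of its top-level entries."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        # Collect the first path component of every archive member.
        top_level = sorted({name.split("/")[0] for name in archive.namelist() if name})
    return [target / name for name in top_level]
```

For example, `extract_model("/path/to/model-name.zip", "dir_for_zip_extract")` reproduces the two shell commands above.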
### 3. Python code
First, make sure to install the required dependencies:
```bash
pip install vosk
```
Then you can launch the inference script from this repository:
```bash
python inference.py <path/to/your/model> <path/to/your/audio/file.wav>
```
or use Python code such as the following:
```python
import json
import wave

from vosk import Model, KaldiRecognizer

model_dir = "path/to/your/model"
audio_file = "path/to/your/audio/file.wav"

model = Model(model_dir)

with wave.open(audio_file, "rb") as wf:
    # The input must be mono, 16-bit, uncompressed PCM audio.
    if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
        raise ValueError("Audio file must be WAV format mono PCM.")
    # The recognizer is created with the sample rate of the file.
    rec = KaldiRecognizer(model, wf.getframerate())
    # Feed the whole file at once, then flush the decoder.
    rec.AcceptWaveform(wf.readframes(wf.getnframes()))
    res = rec.FinalResult()

# FinalResult() returns a JSON string; the transcript is under "text".
transcript = json.loads(res)["text"]
print(f"Transcript: {transcript}")
```
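
For long recordings, reading the whole file into memory in one call may be impractical. A common alternative with Vosk is to feed the recognizer in chunks and collect each finalized segment. The sketch below is one way to do that, not the logic of this repository's `inference.py`; the function name `transcribe_in_chunks` and the chunk size of 4000 frames are assumptions.

```python
import json


def transcribe_in_chunks(recognizer, wav_reader, chunk_frames=4000):
    """Feed audio to a Vosk recognizer chunk by chunk and join the segment texts."""
    segments = []
    while True:
        data = wav_reader.readframes(chunk_frames)
        if not data:
            break
        # AcceptWaveform returns a truthy value once an utterance is finalized.
        if recognizer.AcceptWaveform(data):
            segments.append(json.loads(recognizer.Result()).get("text", ""))
    # Flush whatever audio is still buffered in the decoder.
    segments.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return " ".join(s for s in segments if s)


# Usage with Vosk (paths are placeholders):
# import wave
# from vosk import Model, KaldiRecognizer
# with wave.open("path/to/your/audio/file.wav", "rb") as wf:
#     rec = KaldiRecognizer(Model("path/to/your/model"), wf.getframerate())
#     print(transcribe_in_chunks(rec, wf))
```

The function only relies on the `AcceptWaveform`, `Result`, and `FinalResult` methods, so it can be exercised with a stub recognizer when no model is available.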
## Example
Here is an example of the transcription capabilities of the model:
<audio controls>
<source src="https://huggingface.co/linagora/linto-asr-ar-tn-0.1/resolve/main/sample.wav" type="audio/wav">
</audio>
### Result:
<p dir="rtl">
بالدعم هاذايا لي بثتهولو ال berd يعني أحنا حتى ال projet متاعو تقلب حتى sur le plan حتى فال management يا سيد نحنا في تسيير الشريكة يعني تبدل مية و ثمانين درجة ماللي يعني قبل ما تجيه ال berd و بعد ما جاتو ال berd برنامج نخصص لل les startup إسمو
</p>
## WebRTC Demonstration
Install required dependencies:
```bash
pip install vosk
pip install websockets
```
If you have not already done so, clone the repository:
```bash
git clone https://huggingface.co/linagora/linto-asr-ar-tn-0.1.git
```
Then call the `app.py` script:
```bash
cd linto-asr-ar-tn-0.1/Demo-WebRTC
python3 app.py <model-path>
```
Access the web interface at `localhost:8010`, then just start and speak.
Preview of the web app interface:

## Citation
```bibtex
@misc{linagora2024Linto-tn,
author = {Hedi Naouara and Jérôme Louradour and Jean-Pierre Lorré},
title = {LinTO Audio and Textual Datasets to Train and Evaluate Automatic Speech Recognition in Tunisian Arabic Dialect},
year = {2024},
month = {October},
note = {Good Data Workshop, AAAI 2025},
howpublished = {\url{https://huggingface.co/linagora/linto-asr-ar-tn-0.1}},
}
```