Commit 0a2b433 by anzorq · verified · 1 Parent(s): 5b263b2

Create README.md

Files changed (1)
  1. README.md +53 -0
README.md ADDED
@@ -0,0 +1,53 @@
# Circassian (Kabardian) ASR Model

This is a model for Automatic Speech Recognition (ASR) in Kabardian (`kbd`), fine-tuned from `facebook/w2v-bert-2.0`. It was trained on a combination of the `anzorq/kbd_speech` (filtered on `country=russia`) and `anzorq/sixuxar_yijiri_mak7` datasets.

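The `country=russia` filter mentioned above can be pictured as a simple per-example predicate. The sketch below is an illustration only: the column name `country` and the lowercase value `russia` are taken from the description above, while the sample rows and the helper name `is_russia` are hypothetical and the real dataset schema may differ.

```python
# Hypothetical rows standing in for examples from anzorq/kbd_speech.
# Only the "country" field matters for this sketch; the real schema may differ.
samples = [
    {"id": 0, "country": "russia"},
    {"id": 1, "country": "turkey"},
    {"id": 2, "country": "russia"},
]


def is_russia(example: dict) -> bool:
    """Keep only examples whose country field equals 'russia' (assumed spelling)."""
    return example.get("country") == "russia"


filtered = [ex for ex in samples if is_russia(ex)]
print([ex["id"] for ex in filtered])
```

With the `datasets` library, a predicate of this shape could be passed to `Dataset.filter` instead of a list comprehension.
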
## Model Details

- **Base Model**: facebook/w2v-bert-2.0
- **Language**: Kabardian
- **Task**: Automatic Speech Recognition (ASR)
- **Datasets**: anzorq/kbd_speech, anzorq/sixuxar_yijiri_mak7
- **Training Steps**: 5000

## Training

The model was fine-tuned with the following training arguments:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='output',
    group_by_length=True,  # batch samples of similar length to reduce padding
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    evaluation_strategy="steps",  # called `eval_strategy` in newer transformers releases
    num_train_epochs=10,
    gradient_checkpointing=True,  # trade extra compute for lower memory use
    fp16=True,  # mixed-precision training
    save_steps=1000,
    eval_steps=500,
    logging_steps=300,
    learning_rate=5e-5,
    warmup_steps=500,
    save_total_limit=2,  # keep only the two most recent checkpoints
    push_to_hub=True,
    report_to="wandb",
)
```
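
To make the interaction between these arguments concrete, the following sketch works out the effective batch size and the evaluation and checkpoint schedule over the 5000 training steps. It is plain arithmetic on the values above, not the trainer's own code. The single-device setting (`num_devices = 1`) is an assumption, since the hardware is not stated.

```python
# Values copied from the TrainingArguments above.
per_device_train_batch_size = 8
gradient_accumulation_steps = 2
eval_steps = 500
save_steps = 1000
save_total_limit = 2
total_steps = 5000  # "Training Steps" from Model Details

# Assumption: a single device. With N devices, multiply by N.
num_devices = 1
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Steps at which evaluation runs and checkpoints are written.
eval_at = list(range(eval_steps, total_steps + 1, eval_steps))
saved_at = list(range(save_steps, total_steps + 1, save_steps))
# save_total_limit keeps only the most recent checkpoints on disk.
kept_checkpoints = saved_at[-save_total_limit:]

print(effective_batch_size)  # 16
print(eval_at)
print(kept_checkpoints)  # [4000, 5000]
```

Under these assumptions each optimizer step sees 16 utterances, evaluation runs ten times, and only the checkpoints from steps 4000 and 5000 remain at the end of training.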

## Performance

Metrics logged at each evaluation step (every 500 steps) during training:

| Step | Training Loss | Validation Loss | WER      |
|------|---------------|-----------------|----------|
| 500  | 2.859600      | inf             | 0.870362 |
| 1000 | 0.355500      | inf             | 0.703617 |
| 1500 | 0.247100      | inf             | 0.549942 |
| 2000 | 0.196700      | inf             | 0.471762 |
| 2500 | 0.181500      | inf             | 0.361494 |
| 3000 | 0.152200      | inf             | 0.314119 |
| 3500 | 0.135700      | inf             | 0.275146 |
| 4000 | 0.113400      | inf             | 0.252625 |
| 4500 | 0.102900      | inf             | 0.277013 |
| 5000 | 0.078500      | inf             | 0.250175 |

Validation loss was logged as `inf` at every evaluation, so WER is the only informative validation metric in this table. WER falls from 0.870362 at step 500 to 0.250175 at step 5000, with a small uptick at step 4500.
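
For readers unfamiliar with the WER column: word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model's hypothesis, divided by the number of reference words. The sketch below is a minimal pure-Python version of that standard definition. How the values in the table were actually computed is not stated here, and the function name and the English example strings are placeholders rather than model output.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # ref word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Placeholder English strings, not model output: one substitution in four words.
print(word_error_rate("the cat sat down", "the cat sat up"))  # 0.25
```

On this scale, the final value of 0.250175 corresponds to roughly one word error per four reference words.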