---
base_model: facebook/w2v-bert-2.0
library_name: transformers
language:
- uk
license: "apache-2.0"
task_categories:
- automatic-speech-recognition
tags:
- audio
datasets:
- Yehor/openstt-uk
metrics:
- wer
model-index:
- name: w2v-bert-uk-v2.1
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: common_voice_10_0
      type: common_voice_10_0
      config: uk
      split: test
      args: uk
    metrics:
    - name: WER
      type: wer
      value: 17.34
    - name: CER
      type: cer
      value: 3.33
---
# w2v-bert-uk `v2.1`
## Community
- **Discord**: https://bit.ly/discord-uds
- Speech Recognition: https://t.me/speech_recognition_uk
- Speech Synthesis: https://t.me/speech_synthesis_uk
See other Ukrainian models: https://github.com/egorsmkv/speech-recognition-uk
## Overview
This model is the successor to https://huggingface.co/Yehor/w2v-bert-uk
## Metrics
- AM (F16):
  - WER: 0.1734 (17.34%)
  - CER: 0.0333 (3.33%)
  - Accuracy on words: 82.66%
  - Accuracy on chars: 96.67%
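
The accuracy figures above are consistent with being the complement of the error rates (100% - 17.34% = 82.66% for words, 100% - 3.33% = 96.67% for characters). As a rough illustration of how such numbers are typically obtained, here is a minimal sketch that computes WER and CER from a Levenshtein edit distance. The helper names `edit_distance`, `wer`, and `cer` and the sample sentences are hypothetical and not taken from the evaluation code behind this model card; text normalization used in the real evaluation is not reproduced here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical reference/hypothesis pair, for illustration only
reference = 'добрий день усім'
hypothesis = 'добрий день всім'

w = wer(reference, hypothesis)
c = cer(reference, hypothesis)
print(f'WER: {w:.4f} ({w:.2%}), accuracy on words: {1 - w:.2%}')
print(f'CER: {c:.4f} ({c:.2%}), accuracy on chars: {1 - c:.2%}')
```

On a full test set the edit counts and reference lengths are usually summed over all utterances before dividing, rather than averaging per-utterance rates.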
## Demo
Try the model on your own audio in the demo Space: https://huggingface.co/spaces/Yehor/w2v-bert-uk-v2.1-demo
## Usage
```python
# pip install -U torch soundfile transformers

import torch
import soundfile as sf
from transformers import AutoModelForCTC, Wav2Vec2BertProcessor

# Config
model_name = 'Yehor/w2v-bert-uk-v2.1'
# Use the GPU with float16 when available, otherwise fall back to CPU with float32
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
sampling_rate = 16_000

# Load the model
asr_model = AutoModelForCTC.from_pretrained(model_name, torch_dtype=torch_dtype).to(device)
processor = Wav2Vec2BertProcessor.from_pretrained(model_name)

# Audio files to transcribe (expected to be sampled at 16 kHz)
paths = [
    'sample1.wav',
]

# Extract audio
audio_inputs = []
for path in paths:
    audio_input, _ = sf.read(path)
    audio_inputs.append(audio_input)

# Transcribe the audio
inputs = processor(audio_inputs, sampling_rate=sampling_rate).input_features
# Match the model's device and dtype (float16 on GPU)
features = torch.tensor(inputs).to(device, dtype=torch_dtype)

with torch.inference_mode():
    logits = asr_model(features).logits

predicted_ids = torch.argmax(logits, dim=-1)
predictions = processor.batch_decode(predicted_ids)

# Log results
print('Predictions:')
print(predictions)
```
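
The `torch.argmax` plus `batch_decode` step in the snippet above amounts to greedy CTC decoding: pick the most likely token per frame, merge consecutive repeats, then drop the blank token. The following is a minimal, dependency-free sketch of that idea. The function `ctc_greedy_decode`, the toy vocabulary, and the choice of id `0` as the blank token are assumptions made for illustration; the model's actual tokenizer has its own vocabulary and special tokens.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0):
    """Collapse repeated frame-level ids, then remove blanks and join tokens."""
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    return ''.join(id_to_token[i] for i in collapsed if i != blank_id)


# Toy vocabulary (hypothetical): id 0 plays the role of the CTC blank
vocab = {0: '<blank>', 1: 'т', 2: 'а', 3: 'к'}

# Per-frame argmax ids, as torch.argmax(logits, dim=-1) would yield for one utterance
frame_ids = [0, 1, 1, 0, 2, 2, 0, 3, 3, 0]

print(ctc_greedy_decode(frame_ids, vocab))
```

A blank between two identical ids keeps them as separate characters, which is how CTC represents doubled letters: `[1, 0, 1]` decodes to two characters, while `[1, 1, 1]` decodes to one.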
## Cite this work
```
@misc {smoliakov_2025,
author = { {Smoliakov} },
title = { w2v-bert-uk-v2.1 (Revision 094c59d) },
year = 2025,
url = { https://huggingface.co/Yehor/w2v-bert-uk-v2.1 },
doi = { 10.57967/hf/4554 },
publisher = { Hugging Face }
}
```