---
base_model: facebook/w2v-bert-2.0
library_name: transformers
language: 
  - uk
license: "apache-2.0"
task_categories:
- automatic-speech-recognition
tags:
- audio
datasets:
  - Yehor/openstt-uk
metrics:
  - wer
model-index:
  - name: w2v-bert-uk-v2.1
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: common_voice_10_0
          type: common_voice_10_0
          config: uk
          split: test
          args: uk
        metrics:
          - name: WER
            type: wer
            value: 17.34
          - name: CER
            type: cer
            value: 3.33
---

# w2v-bert-uk `v2.1`


## Community

- **Discord**: https://bit.ly/discord-uds
- Speech Recognition: https://t.me/speech_recognition_uk
- Speech Synthesis: https://t.me/speech_synthesis_uk

See other Ukrainian models: https://github.com/egorsmkv/speech-recognition-uk

## Overview

This is the next version of https://huggingface.co/Yehor/w2v-bert-uk


## Metrics

- Acoustic model (F16):
  - WER: 0.1734 (17.34%)
  - CER: 0.0333 (3.33%)
  - Word accuracy: 82.66%
  - Character accuracy: 96.67%
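
Word and character accuracy are simply the complements of WER and CER (for example, 100% − 17.34% = 82.66%). To score your own transcripts the same way, a minimal sketch using the `jiwer` package could look like this (the package and the example sentences are assumptions, not part of this repository):

```python
# pip install jiwer  (assumed; not a dependency of this repository)
import jiwer

# Hypothetical reference transcript and model output, for illustration only
references = ['привіт як справи']
hypotheses = ['привіт як справа']

wer = jiwer.wer(references, hypotheses)
cer = jiwer.cer(references, hypotheses)

print(f'WER: {wer:.4f} ({wer * 100:.2f}%), word accuracy: {(1 - wer) * 100:.2f}%')
print(f'CER: {cer:.4f} ({cer * 100:.2f}%), char accuracy: {(1 - cer) * 100:.2f}%')
```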

## Demo

Use the https://huggingface.co/spaces/Yehor/w2v-bert-uk-v2.1-demo Space to try the model on your own audio files.

## Usage

```python
# pip install -U torch soundfile transformers

import torch
import soundfile as sf
from transformers import AutoModelForCTC, Wav2Vec2BertProcessor

# Config
model_name = 'Yehor/w2v-bert-uk-v2.1'
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
sampling_rate = 16_000

# Load the model
asr_model = AutoModelForCTC.from_pretrained(model_name, torch_dtype=torch_dtype).to(device)
processor = Wav2Vec2BertProcessor.from_pretrained(model_name)

paths = [
  'sample1.wav',
]

# Extract audio
audio_inputs = []
for path in paths:
  audio_input, _ = sf.read(path)
  audio_inputs.append(audio_input)

# Transcribe the audio
inputs = processor(audio_inputs, sampling_rate=sampling_rate, return_tensors='pt')
features = inputs.input_features.to(device, dtype=torch_dtype)  # match the model's dtype

with torch.inference_mode():
  logits = asr_model(features).logits

predicted_ids = torch.argmax(logits, dim=-1)
predictions = processor.batch_decode(predicted_ids)

# Log results
print('Predictions:')
print(predictions)
```
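
The example above feeds a single file. When transcribing several recordings of different lengths in one batch, the inputs have to be padded to a common length and the model should be told which frames are padding. Below is a minimal sketch continuing the snippet above; the second file name is a placeholder, and `padding`, `return_attention_mask` and `attention_mask` are standard `transformers` arguments used here as an assumption about this checkpoint:

```python
# Continues the snippet above: reuses asr_model, processor, device,
# torch_dtype, sampling_rate and soundfile (sf).
paths = ['sample1.wav', 'sample2.wav']  # placeholder file names
audio_inputs = [sf.read(path)[0] for path in paths]

# Pad the batch to the longest example and keep the attention mask
batch = processor(
  audio_inputs,
  sampling_rate=sampling_rate,
  padding=True,
  return_attention_mask=True,
  return_tensors='pt',
)
features = batch.input_features.to(device, dtype=torch_dtype)
attention_mask = batch.attention_mask.to(device)

with torch.inference_mode():
  logits = asr_model(features, attention_mask=attention_mask).logits

predictions = processor.batch_decode(torch.argmax(logits, dim=-1))
print(predictions)
```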

## Cite this work

```
@misc {smoliakov_2025,
	author       = { {Smoliakov} },
	title        = { w2v-bert-uk-v2.1 (Revision 094c59d) },
	year         = 2025,
	url          = { https://huggingface.co/Yehor/w2v-bert-uk-v2.1 },
	doi          = { 10.57967/hf/4554 },
	publisher    = { Hugging Face }
}
```