
	---
	license: apache-2.0
	language:
	- en
	metrics:
	- wer
	base_model:
	- facebook/wav2vec2-base-960h
	tags:
	- pytorch
	- Transformers
	- speech
	- audio
	---

	# Model Description

	This model is a fine-tuned version of `facebook/wav2vec2-base-960h` for automatic speech recognition (ASR).
	It was fine-tuned on the [LibriSpeech dataset](https://paperswithcode.com/dataset/librispeech) with the goal of improving transcription accuracy over the base model.

	The fine-tuning process involved:

	- Selecting a subset of speakers from the LibriSpeech `dev-clean` and `test-clean` splits.
	- Preprocessing audio files and their corresponding transcriptions.
	- Training with gradient accumulation, mixed precision (if available), and periodic evaluation.
	- Saving the fine-tuned model for inference.
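
The training step in the list above (gradient accumulation, optional mixed precision, periodic evaluation) can be sketched roughly as follows. This is a minimal illustration, not the authors' training script: the linear layer, the random tensors, and the values of `accumulation_steps` and `eval_every` are assumptions chosen so the loop runs without downloading the model or LibriSpeech.

```python
import contextlib

import torch
from torch import nn

# Toy stand-ins (assumptions): a linear layer and random tensors replace the
# real speech model and LibriSpeech batches so the loop runs anywhere.
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"  # mixed precision only when a GPU is available

model = nn.Linear(16, 4).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)  # no-op when disabled
loss_fn = nn.CrossEntropyLoss()

accumulation_steps = 4  # hypothetical: micro-batches per optimizer update
eval_every = 2          # hypothetical: evaluate every 2 optimizer updates
batches = [(torch.randn(8, 16), torch.randint(0, 4, (8,))) for _ in range(16)]


def autocast_context():
    """Return an fp16 autocast context on GPU, or a do-nothing context on CPU."""
    if use_amp:
        return torch.autocast("cuda", dtype=torch.float16)
    return contextlib.nullcontext()


def evaluate():
    """Compute the loss on one held-out batch (here simply the first batch)."""
    model.eval()
    with torch.no_grad():
        x, y = batches[0]
        loss = loss_fn(model(x.to(device)), y.to(device)).item()
    model.train()
    return loss


optimizer_updates = 0
eval_losses = []
model.train()
optimizer.zero_grad()
for step, (x, y) in enumerate(batches, start=1):
    with autocast_context():
        loss = loss_fn(model(x.to(device)), y.to(device))
    # Divide so the accumulated gradient is the mean over the micro-batches.
    scaler.scale(loss / accumulation_steps).backward()

    if step % accumulation_steps == 0:
        scaler.step(optimizer)  # unscales gradients, then calls optimizer.step()
        scaler.update()
        optimizer.zero_grad()
        optimizer_updates += 1
        if optimizer_updates % eval_every == 0:
            eval_losses.append(evaluate())

print(f"optimizer updates: {optimizer_updates}, evaluations: {len(eval_losses)}")
```

With 16 micro-batches and 4 accumulation steps, the optimizer is updated 4 times, so each update sees an effective batch four times larger than what fits in memory at once.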

	Code: [GitHub](https://github.com/LucasTramonte/SpeechRecognition)

	Authors: Lucas Tramonte, Kiyoshi Araki

	# Usage

	To transcribe an audio file, load the processor and model, then run inference as follows:

	```python
	from transformers import AutoProcessor, AutoModelForCTC
	import torch
	import librosa

	# Load model and processor
	processor = AutoProcessor.from_pretrained("deepl-project/conformer-finetunning")
	model = AutoModelForCTC.from_pretrained("deepl-project/conformer-finetunning")

	# Load the audio file, resampling to 16 kHz (the rate the base model expects)
	file_path = "path/to/audio/file.wav"
	speech, sr = librosa.load(file_path, sr=16000)
	inputs = processor(speech, sampling_rate=sr, return_tensors="pt", padding=True)

	# Perform inference
	with torch.no_grad():
	    logits = model(**inputs).logits

	# Decode transcription
	predicted_ids = torch.argmax(logits, dim=-1)
	transcription = processor.batch_decode(predicted_ids)

	print("Transcription:", transcription[0])
	```
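
The model card lists word error rate (WER) as its metric. As a rough illustration of how a transcription could be scored against a reference, the sketch below computes WER with a word-level edit distance in plain Python. The function name `word_error_rate` and the example strings are hypothetical, and this is not the authors' evaluation code; in practice a library such as `jiwer` is commonly used instead.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical strings; the base model emits uppercase text, so normalize
# casing and punctuation consistently on both sides before scoring.
reference = "THE QUICK BROWN FOX"
hypothesis = "THE QUICK BROWN DOG"
print(word_error_rate(reference, hypothesis))  # 0.25 (1 substitution / 4 words)
```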


	# References

	- [LibriSpeech Dataset](https://paperswithcode.com/dataset/librispeech)
	- [Conformer Model Paper](https://paperswithcode.com/paper/conformer-based-target-speaker-automatic)
	- [Whisper Model Paper](https://arxiv.org/abs/2212.04356)