nvidia/audio-codec-22khz


rlangman committed on Dec 5, 2024

Commit b0e8aa9 · verified · 1 Parent(s): 0eaf726

Update README.md

Files changed (1)

README.md +5 -1


@@ -20,8 +20,12 @@ padding: 0;
 The NeMo Audio Codec is a neural audio codec which compresses audio into a quantized representation. The model can be used as a vocoder for speech synthesis. The model works with full-bandwidth 22.05kHz speech. It might have lower performance with low-bandwidth speech (e.g. 16kHz speech upsampled to 22.05kHz) or with non-speech audio.
+| Sample Rate | Frame Rate | Bit Rate   | # Codebooks | Codebook Size | Embed Dim   | FSQ Levels   |
+|:-----------:|:----------:|:----------:|:-----------:|:-------------:|:-----------:|:------------:|
+| 22050       | 86.1       | 6.9kbps    | 8           | 1000          | 32          | [8, 5, 5, 5] |
 ## Model Architecture
-The NeMo Audio Codec model uses symmetric convolutional encoder-decoder architecture based on [HiFi-GAN](https://arxiv.org/abs/2010.05646). We use [Finite Scalar Quantization (FSQ)](https://arxiv.org/abs/2309.15505), with eight codebooks, 1000 entries per codebook, 86.1 frames per second, and a 6.9kbps bitrate.
+The NeMo Audio Codec model uses symmetric convolutional encoder-decoder architecture based on [HiFi-GAN](https://arxiv.org/abs/2010.05646). We use [Finite Scalar Quantization (FSQ)](https://arxiv.org/abs/2309.15505), with 8 codebooks and 1000 entries per codebook.
 For more details please refer to [our paper](https://arxiv.org/abs/2406.05298).
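
The figures quoted in this change can be cross-checked with a little arithmetic. The sketch below is not taken from the NeMo code base. It assumes that the codebook size is the product of the FSQ levels, that each code costs log2(codebook size) bits, and that the embedding dimension is the number of codebooks times the number of FSQ dimensions per codebook. The samples-per-frame value is derived here and is not stated in the README.

```python
import math

# Values quoted in the README diff.
sample_rate_hz = 22050
frame_rate_hz = 86.1
num_codebooks = 8
fsq_levels = [8, 5, 5, 5]

# Assumption: an FSQ codebook has one entry per combination of levels.
codebook_size = math.prod(fsq_levels)

# Assumption: each code is stored with log2(codebook_size) bits.
bits_per_code = math.log2(codebook_size)
bitrate_bps = frame_rate_hz * num_codebooks * bits_per_code

# Derived quantities (not stated in the README).
samples_per_frame = sample_rate_hz / frame_rate_hz
embed_dim = num_codebooks * len(fsq_levels)

print(f"codebook size:     {codebook_size}")
print(f"bitrate:           {bitrate_bps / 1000:.1f} kbps")
print(f"samples per frame: {samples_per_frame:.1f}")
print(f"embedding dim:     {embed_dim}")
```

Under these assumptions the numbers agree with the README: 8 × 5 × 5 × 5 gives 1000 entries, 86.1 × 8 × log2(1000) rounds to 6.9 kbps, and 8 codebooks of 4 FSQ dimensions give an embedding dimension of 32.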