Update README.md
README.md
CHANGED
@@ -20,8 +20,12 @@ padding: 0;
 
 The NeMo Audio Codec is a neural audio codec which compresses audio into a quantized representation. The model can be used as a vocoder for speech synthesis. The model works with full-bandwidth 22.05kHz speech. It might have lower performance with low-bandwidth speech (e.g. 16kHz speech upsampled to 22.05kHz) or with non-speech audio.
 
+| Sample Rate | Frame Rate | Bit Rate | # Codebooks | Codebook Size | Embed Dim | FSQ Levels |
+|:-----------:|:----------:|:----------:|:-----------:|:-------------:|:-----------:|:------------:|
+| 22050 | 86.1 | 6.9kbps | 8 | 1000 | 32 | [8, 5, 5, 5] |
+
 ## Model Architecture
 
-The NeMo Audio Codec model uses symmetric convolutional encoder-decoder architecture based on [HiFi-GAN](https://arxiv.org/abs/2010.05646). We use [Finite Scalar Quantization (FSQ)](https://arxiv.org/abs/2309.15505), with
+The NeMo Audio Codec model uses symmetric convolutional encoder-decoder architecture based on [HiFi-GAN](https://arxiv.org/abs/2010.05646). We use [Finite Scalar Quantization (FSQ)](https://arxiv.org/abs/2309.15505), with 8 codebooks and 1000 entries per codebook.
 
 For more details please refer to [our paper](https://arxiv.org/abs/2406.05298).
 
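
The figures in the table added by this change appear to be mutually consistent, and a short check makes the relationships explicit. The sketch below is not taken from the README or the paper. It assumes that each codebook is a single FSQ group whose size is the product of its levels, that the embedding dimension is the number of codebooks times the number of FSQ levels per codebook, and that each code costs `log2(codebook_size)` bits. All variable names are illustrative.

```python
import math

# Values copied from the table added in the diff.
sample_rate_hz = 22050
frame_rate_hz = 86.1
num_codebooks = 8
fsq_levels = [8, 5, 5, 5]

# Assumption: one FSQ group per codebook, so the number of entries
# is the product of the quantization levels of its dimensions.
codebook_size = math.prod(fsq_levels)

# Assumption: each codebook contributes one scalar dimension per FSQ level.
embed_dim = num_codebooks * len(fsq_levels)

# Bits needed to index one entry, and the resulting bit rate.
bits_per_code = math.log2(codebook_size)
bit_rate_bps = frame_rate_hz * num_codebooks * bits_per_code

# Audio samples covered by one codec frame (derived, not stated in the table).
samples_per_frame = sample_rate_hz / frame_rate_hz

print(f"codebook size:     {codebook_size}")
print(f"embedding dim:     {embed_dim}")
print(f"bit rate:          {bit_rate_bps / 1000:.1f} kbps")
print(f"samples per frame: {samples_per_frame:.1f}")
```

Under these assumptions the computed codebook size, embedding dimension, and bit rate match the table's 1000, 32, and 6.9 kbps, and the frame rate corresponds to roughly 256 audio samples per frame.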