NeMo
rlangman committed b0e8aa9 (verified) · 1 parent: 0eaf726

Update README.md

Files changed (1)
  1. README.md +5 -1
README.md CHANGED
@@ -20,8 +20,12 @@ padding: 0;
  The NeMo Audio Codec is a neural audio codec which compresses audio into a quantized representation. The model can be used as a vocoder for speech synthesis. The model works with full-bandwidth 22.05kHz speech. It might have lower performance with low-bandwidth speech (e.g. 16kHz speech upsampled to 22.05kHz) or with non-speech audio.

  ## Model Architecture
- The NeMo Audio Codec model uses a symmetric convolutional encoder-decoder architecture based on [HiFi-GAN](https://arxiv.org/abs/2010.05646). We use [Finite Scalar Quantization (FSQ)](https://arxiv.org/abs/2309.15505), with eight codebooks, 1000 entries per codebook, 86.1 frames per second, and a 6.9kbps bitrate.

  For more details please refer to [our paper](https://arxiv.org/abs/2406.05298).
 
  The NeMo Audio Codec is a neural audio codec which compresses audio into a quantized representation. The model can be used as a vocoder for speech synthesis. The model works with full-bandwidth 22.05kHz speech. It might have lower performance with low-bandwidth speech (e.g. 16kHz speech upsampled to 22.05kHz) or with non-speech audio.
 
+ | Sample Rate (Hz) | Frame Rate (frames/s) | Bit Rate | # Codebooks | Codebook Size | Embed Dim | FSQ Levels |
+ |:----------------:|:---------------------:|:--------:|:-----------:|:-------------:|:---------:|:------------:|
+ | 22050 | 86.1 | 6.9kbps | 8 | 1000 | 32 | [8, 5, 5, 5] |
+
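The figures in the table are consistent with one another, and a short calculation makes the relationships explicit. This is only a sanity-check sketch: the variable names are illustrative, the bit rate is computed under the idealized assumption that each codebook index costs log2(codebook size) bits per frame, and the even split of the embedding across codebooks is an assumption inferred from 32 = 8 × 4 rather than something the README states.

```python
import math

# Values copied from the table above.
sample_rate_hz = 22050
frame_rate_fps = 86.1
num_codebooks = 8
embed_dim = 32
fsq_levels = [8, 5, 5, 5]

# An FSQ codebook enumerates every combination of per-dimension levels.
codebook_size = math.prod(fsq_levels)  # 8 * 5 * 5 * 5

# Idealized bit accounting: one index per codebook per frame.
bits_per_frame = num_codebooks * math.log2(codebook_size)
bitrate_kbps = frame_rate_fps * bits_per_frame / 1000

# Assumption: the embedding is split evenly across the codebooks.
dims_per_codebook = embed_dim // num_codebooks

# Audio samples covered by one frame, implied by the two rates.
samples_per_frame = sample_rate_hz / frame_rate_fps

print(codebook_size, round(bitrate_kbps, 1), dims_per_codebook, round(samples_per_frame))
```

Under these assumptions the product of the FSQ levels reproduces the codebook size of 1000, the bit rate rounds to the listed 6.9 kbps, and each codebook covers four embedding dimensions, matching the four FSQ levels.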
  ## Model Architecture
+ The NeMo Audio Codec model uses a symmetric convolutional encoder-decoder architecture based on [HiFi-GAN](https://arxiv.org/abs/2010.05646). We use [Finite Scalar Quantization (FSQ)](https://arxiv.org/abs/2309.15505), with 8 codebooks and 1000 entries per codebook.
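To illustrate how a single FSQ codebook with levels [8, 5, 5, 5] can yield 1000 entries, here is a minimal pure-Python sketch of the general idea. It is not the NeMo implementation: the function names are hypothetical, and the bounding step (a plain `tanh` rescaled to the level range) is a simplification of the bounding described in the FSQ paper.

```python
import math

FSQ_LEVELS = [8, 5, 5, 5]  # per-dimension levels for one codebook


def fsq_quantize(z, levels=FSQ_LEVELS):
    """Map a real-valued vector to one integer code per dimension, each in [0, L-1]."""
    codes = []
    for value, num_levels in zip(z, levels):
        # Squash to [0, 1], then scale so that rounding lands on an integer level.
        unit = (math.tanh(value) + 1.0) / 2.0
        codes.append(round(unit * (num_levels - 1)))
    return codes


def codes_to_index(codes, levels=FSQ_LEVELS):
    """Pack per-dimension codes into one mixed-radix index in [0, prod(levels))."""
    index, base = 0, 1
    for code, num_levels in zip(codes, levels):
        index += code * base
        base *= num_levels
    return index


def index_to_codes(index, levels=FSQ_LEVELS):
    """Invert codes_to_index."""
    codes = []
    for num_levels in levels:
        codes.append(index % num_levels)
        index //= num_levels
    return codes


# Hypothetical 4-dimensional encoder output for one codebook and one frame.
codes = fsq_quantize([0.2, -10.0, 10.0, 0.3])
index = codes_to_index(codes)
print(codes, index, index_to_codes(index) == codes)
```

Because the codebook is implied by the fixed grid of levels, nothing has to be learned or stored for it; the index is just a mixed-radix encoding of the rounded coordinates.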
 
  For more details please refer to [our paper](https://arxiv.org/abs/2406.05298).