hertz-pj committed on
Commit
5a02594
·
1 Parent(s): edc9cba

upload snac_vocos_16khz_hop200_scale8421_1kh

README.md CHANGED
@@ -1,3 +1,42 @@
- ---
- license: mit
- ---
+ # SNAC-Vocos
+ A trainer for [SNAC](https://github.com/hubertsiuzdak/snac) (Multi-Scale Neural Audio Codec) in which the decoder is replaced with Vocos.
+
+ ## Installation
+ Python 3.9 or later is recommended.
+ Clone the repository:
+ ```
+ git clone https://github.com/hertz-pj/SNAC-Vocos
+ cd SNAC-Vocos
+ ```
+ Install packages:
+ ```
+ pip install -r requirements.txt
+ ```
+ ## Inference
+ Refer to [infer.py](./infer.py) for inference instructions and usage examples.
+
+ ## Available Models
+ | Model name | Hugging Face | Corpus | Domain |
+ |:------------|:--------|:--------|:--------|
+ | snac_vocos_16khz_hop200_scale8421_1kh | [🤗](https://huggingface.co/hertz-pj/snac-vocos) | 1k hours | Speech (Mandarin/English) |
+
+
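The checkpoint name appears to encode the codec settings. As a rough sketch, assuming `16khz` is the sample rate, `hop200` is the hop length in samples at the finest scale, and `scale8421` lists per-codebook temporal strides of 8, 4, 2, and 1 (in the spirit of SNAC's multi-scale quantization; the README does not confirm this reading), the token rates would work out as follows:

```python
# Assumed reading of "snac_vocos_16khz_hop200_scale8421" (not confirmed by the README).
SAMPLE_RATE = 16_000      # "16khz"
HOP_LENGTH = 200          # "hop200": samples per frame at the finest scale
STRIDES = (8, 4, 2, 1)    # "scale8421": one temporal stride per codebook


def token_rates(sample_rate, hop_length, strides):
    """Return tokens per second for each codebook, coarsest first."""
    base_rate = sample_rate / hop_length  # frames per second at stride 1
    return [base_rate / s for s in strides]


rates = token_rates(SAMPLE_RATE, HOP_LENGTH, STRIDES)
total = sum(rates)
print(rates)  # [10.0, 20.0, 40.0, 80.0]
print(total)  # 150.0
```

Under these assumptions the codec would emit 150 tokens per second of audio across the four codebooks.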
+ ## Training
+ 1. Prepare a filelist of audio files for the training and validation sets, e.g. [train.list](./data/train.list).
+ 2. Fill in a config file, e.g. [snac_vocos.yaml](./config/snac_vocos_nq4_scale8421_16khz.yaml). The main parameters to set are `batch_size`, `filelist_path`, `save_dir`, and `device`.
+ 3. Start training:
+ ```
+ python train.py fit --config ./configs/snac_vocos.yaml
+ ```
+
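Step 1 can be scripted. The sketch below assumes the filelist is plain text with one audio path per line; check `data/train.list` for the actual layout before relying on it. The function name `write_filelists`, the `*.wav` pattern, the validation ratio, and the `val.list` filename in the usage line are illustrative choices, not part of this repository.

```python
import random
from pathlib import Path


def write_filelists(audio_dir, train_path, val_path, val_ratio=0.01, seed=0):
    """Write train/validation filelists, one absolute path per line (assumed format)."""
    files = sorted(str(p.resolve()) for p in Path(audio_dir).rglob("*.wav"))
    random.Random(seed).shuffle(files)  # deterministic shuffle for a reproducible split
    # Hold out at least one file for validation when there is more than one file.
    n_val = max(1, int(len(files) * val_ratio)) if len(files) > 1 else 0
    val, train = files[:n_val], files[n_val:]
    Path(train_path).write_text("".join(f + "\n" for f in train))
    Path(val_path).write_text("".join(f + "\n" for f in val))
    return len(train), len(val)
```

For example, `write_filelists("/path/to/wavs", "data/train.list", "data/val.list")` returns the number of training and validation entries written.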
+ ## TODO
+ - [x] Release code.
+ - [x] Release a checkpoint trained on 1k hours of speech (Mandarin/English).
+ - [ ] Demo page.
+
+
+ ## Acknowledgements
+ This implementation uses parts of the code from the following GitHub repositories:
+ - [SNAC](https://github.com/hubertsiuzdak/snac)
+ - [WavTokenizer](https://github.com/jishengpeng/WavTokenizer/)
+
snac_vocos_16khz_hop200_scale8421_1kh.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c8fa34b981f1d1f62a25801f86aeb041d5f548096ae4ec1c92761f749ed90d40
+ size 1710208559
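
The added file is a Git LFS pointer: `oid` is the SHA-256 digest of the real checkpoint and `size` is its length in bytes (about 1.7 GB). A minimal sketch for checking a downloaded copy against the pointer follows; the function name `matches_pointer` and the local path in the usage line are placeholders.

```python
import hashlib
import os

# Values copied from the Git LFS pointer in this commit.
EXPECTED_SHA256 = "c8fa34b981f1d1f62a25801f86aeb041d5f548096ae4ec1c92761f749ed90d40"
EXPECTED_SIZE = 1710208559  # bytes


def matches_pointer(path, sha256=EXPECTED_SHA256, size=EXPECTED_SIZE):
    """Return True if the file at `path` has the expected size and SHA-256 digest."""
    if os.path.getsize(path) != size:
        return False  # cheap check first; avoids hashing a truncated download
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read 1 MiB at a time
            digest.update(chunk)
    return digest.hexdigest() == sha256
```

For example, `matches_pointer("snac_vocos_16khz_hop200_scale8421_1kh.ckpt")` should return `True` for an intact download.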