hertz-pj/snac-vocos

hertz-pj committed on Oct 28, 2024

Commit 5a02594

1 parent: edc9cba

upload snac_vocos_16khz_hop200_scale8421_1kh

Files changed (2)

README.md CHANGED

@@ -1,3 +1,42 @@
----
-license: mit
----

+# SNAC-Vocos
+A trainer for [SNAC](https://github.com/hubertsiuzdak/snac) (Multi-Scale Neural Audio Codec) in which the decoder has been replaced with Vocos.
+## Installation
+Python >= 3.9 is recommended.
+Clone the repository:
+```
+git clone https://github.com/hertz-pj/SNAC-Vocos
+cd SNAC-Vocos
+```
+Install packages:
+```
+pip install -r requirements.txt
+```
+## Inference
+Refer to [infer.py](./infer.py) for inference instructions and usage examples.
+## Available Models
+| Model name | Hugging Face | Corpus | Domain |
+|:------------|:--------|:--------|:--------|
+| snac_vocos_16khz_hop200_scale8421_1kh | [🤗](https://huggingface.co/hertz-pj/snac-vocos) | 1k hours | Speech (Mandarin/English) |
+## Training
+1. Prepare filelists of audio files for the training and validation sets, e.g. [train.list](./data/train.list).
+2. Fill in a config file, e.g. [snac_vocos.yaml](./config/snac_vocos_nq4_scale8421_16khz.yaml). The main parameters to check are batch_size, filelist_path, save_dir, and device.
+3. Start training:
+```
+python train.py fit --config ./configs/snac_vocos.yaml
+```
+## TODO
+- [x] Release code
+- [x] Release a checkpoint trained on 1k hours of speech (Mandarin/English).
+- [ ] Demo page.
+## Acknowledgements
+This implementation uses parts of the code from the following GitHub repos:
+- [SNAC](https://github.com/hubertsiuzdak/snac)
+- [WavTokenizer](https://github.com/jishengpeng/WavTokenizer/)

snac_vocos_16khz_hop200_scale8421_1kh.ckpt ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:c8fa34b981f1d1f62a25801f86aeb041d5f548096ae4ec1c92761f749ed90d40
+size 1710208559
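
The pointer above records only the SHA-256 digest and the byte size of the checkpoint, so a downloaded copy can be checked against those two values. A minimal Python sketch, assuming the checkpoint has already been downloaded to a local path; the function names are hypothetical and not part of the repository:

```python
import hashlib

# Values copied from the Git LFS pointer shown above.
EXPECTED_OID = "c8fa34b981f1d1f62a25801f86aeb041d5f548096ae4ec1c92761f749ed90d40"
EXPECTED_SIZE = 1710208559


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def file_digest(path, chunk_size=1 << 20):
    """Return (sha256 hex digest, size in bytes), reading the file in chunks."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return hasher.hexdigest(), size


def matches_pointer(path, expected_oid, expected_size):
    """True if the file at `path` has the digest and size given by the pointer."""
    digest, size = file_digest(path)
    return size == expected_size and digest == expected_oid
```

With a local copy of the file (the path is an assumption), `matches_pointer("snac_vocos_16khz_hop200_scale8421_1kh.ckpt", EXPECTED_OID, EXPECTED_SIZE)` should return `True` for an intact download. Reading in chunks keeps memory use flat for a file of roughly 1.7 GB.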