Update README.md
README.md
CHANGED
@@ -1,3 +1,19 @@
+---
+language:
+- en
+metrics:
+- PER
+- WER
+- SongEval
+- Audio Aesthetics
+- MuQ
+- FAD
+pipeline_tag: text-to-audio
+library_name: diffusers
+tags:
+- music
+- art
+---
 # JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment
 
 JAM is a rectified flow-based model for lyrics-to-song generation that addresses the lack of fine-grained word-level controllability in existing lyrics-to-song models. Built on a compact 530M-parameter architecture with 16 LLaMA-style Transformer layers as the Diffusion Transformer (DiT) backbone, JAM enables precise vocal control that musicians desire in their workflows. Unlike previous models, JAM provides word and phoneme-level timing control, allowing musicians to specify the exact placement of each vocal sound for improved rhythmic flexibility and expressive timing.
@@ -267,4 +283,4 @@ For questions, concerns, or collaboration inquiries, please contact the Project
 For issues and questions:
 - Open an issue on GitHub
 - Check the troubleshooting section above
-- Review the configuration options for parameter tuning
+- Review the configuration options for parameter tuning