JoshuaW1997/FUTGA

JoshuaW1997 committed on Jul 29, 2024

Commit e6ee21a (verified) · 1 Parent(s): 9049941

Update README.md

Files changed (1)

README.md CHANGED

@@ -14,7 +14,11 @@ license: apache-2.0
 FUTGA is an audio LLM with fine-grained music understanding, learning from generative augmentation with temporal compositions. By leveraging existing music caption datasets and large language models (LLMs), we synthesize detailed music captions with structural descriptions and time boundaries for full-length songs. This synthetic dataset enables FUTGA to identify temporal changes at key transition points, their musical functions, and generate dense captions for full-length songs.
 <div align=center><img src="fig1.png" height="512px" width="512px"/></div>
-<!-- ![image](fig1.png) -->
+## Comparing FUTGA dense captioning with MusicCaps/SongDescriber/LP-MusicCaps
+![image](demo.png)
 ## How to load the model
 We build FUTGA based on [**SALMONN**](https://huggingface.co/tsinghua-ee/SALMONN). Follow the instructions from [**SALMONN**](https://huggingface.co/tsinghua-ee/SALMONN) to load: