English
music
JoshuaW1997 committed · Commit 7d1f94f · verified · 1 Parent(s): e828e3f

Update README.md

Files changed (1)
  1. README.md +12 -1
README.md CHANGED
@@ -6,9 +6,20 @@ license: apache-2.0

  <div align=center><img src="futga.jpg" height="256px" width="256px"/></div>

+ ## News
+
+ - [07/28] We released [**model checkpoint**](https://huggingface.co/JoshuaW1997/FUTGA) and **training/inference code** based on [**SALMONN-7B**](https://huggingface.co/tsinghua-ee/SALMONN) backbone!
+
  ## Overview
  FUTGA is an audio LLM with fine-grained music understanding, learning from generative augmentation with temporal compositions. By leveraging existing music caption datasets and large language models (LLMs), we synthesize detailed music captions with structural descriptions and time boundaries for full-length songs. This synthetic dataset enables FUTGA to identify temporal changes at key transition points, their musical functions, and generate dense captions for full-length songs.

+ <!-- <div align=center><img src="fig1.png" height="512px" width="512px"/></div> -->
+ ![image](fig1.png)

-
+ ## How to load the model
+ We build FUTGA based on [**SALMONN**](https://huggingface.co/tsinghua-ee/SALMONN). Follow the instructions from [**SALMONN**](https://huggingface.co/tsinghua-ee/SALMONN) to load:
+ 1. [whisper large v2](https://huggingface.co/openai/whisper-large-v2/tree/main) to ```whisper_path```,
+ 2. [Fine-tuned BEATs_iter3+ (AS2M) (cpt2)](https://valle.blob.core.windows.net/share/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt?sv=2020-08-04&st=2023-03-01T07%3A51%3A05Z&se=2033-03-02T07%3A51%3A00Z&sr=c&sp=rl&sig=QJXmSJG9DbMKf48UDIU1MfzIro8HQOf3sqlNXiflY1I%3D) to `beats_path`
+ 3. [vicuna 7B v1.5](https://huggingface.co/lmsys/vicuna-7b-v1.5/tree/main) to ```vicuna_path```,
+ 4. [FUTGA-7b](https://huggingface.co/JoshuaW1997/FUTGA/blob/main/salomnn_7b.bin) to ```ckpt_path```.
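The added "How to load the model" section names four settings: `whisper_path`, `beats_path`, `vicuna_path`, and `ckpt_path`. Before pointing the SALMONN code at them, it may help to confirm that each one resolves to something on disk. The following is only a minimal sketch of such a check. The helper `missing_paths` and the local directory layout are hypothetical and are not part of FUTGA or SALMONN; only the four key names and the two file names come from the links above.

```python
from pathlib import Path

# The four setting names listed in the README section above.
REQUIRED_KEYS = ("whisper_path", "beats_path", "vicuna_path", "ckpt_path")


def missing_paths(paths):
    """Return the required keys that are unset or do not exist on disk.

    Hypothetical helper for illustration; it does not load any model.
    """
    missing = []
    for key in REQUIRED_KEYS:
        value = paths.get(key)
        if not value or not Path(value).exists():
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Placeholder locations (assumptions): adjust to wherever the files
    # were actually downloaded.
    example = {
        "whisper_path": "models/whisper-large-v2",
        "beats_path": "models/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt",
        "vicuna_path": "models/vicuna-7b-v1.5",
        "ckpt_path": "models/salomnn_7b.bin",
    }
    not_found = missing_paths(example)
    if not_found:
        print("Missing or unset:", ", ".join(not_found))
    else:
        print("All four paths exist.")
```

How these values are then passed to the model (for example through a config file or command-line arguments) is described in the SALMONN instructions and is not assumed here.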