---
title: Sonisphere
emoji: 🐒
colorFrom: green
colorTo: gray
sdk: gradio
sdk_version: 5.20.0
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Sonisphere Demo

This is a Hugging Face Spaces demo for [MMAudio](https://hkchengrex.com/MMAudio/), a model that generates realistic audio to match a video, optionally guided by a text prompt.

## 🎥 Features

- Upload any video and generate matching audio
- Control the generation with text prompts
- Adjust generation parameters like steps and guidance strength
- Process videos up to 30 seconds in length

## 🚀 Usage

1. Upload a video or use one of the example videos
2. Enter a text prompt describing the desired audio
3. (Optional) Add a negative prompt to specify what you don't want
4. Adjust the generation parameters if needed
5. Click "Submit" and wait for the generation to complete

## βš™οΈ Parameters

- **Prompt**: Describe the audio you want to generate
- **Negative prompt**: Specify what you don't want in the audio (default: "music")
- **Seed**: Set a fixed seed for reproducible output, or use -1 for a random seed
- **Number of steps**: More steps generally improve quality but take longer (default: 25)
- **Guidance Strength**: Controls how closely the generation follows the prompt (default: 4.5)
- **Duration**: Length of the generated audio in seconds (default: 8)
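
The defaults listed above can be gathered into a small settings object. This is a minimal sketch and not the demo's actual code: the `GenerationSettings` class, its field names, and the validation checks are assumptions made for illustration. Only the default values and the `-1` seed convention come from the list above.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the parameters listed above."""

    prompt: str = ""
    negative_prompt: str = "music"
    seed: int = -1                  # -1 means "pick a random seed"
    num_steps: int = 25
    guidance_strength: float = 4.5
    duration: float = 8.0           # seconds

    def __post_init__(self) -> None:
        # Assumed sanity checks; the demo may enforce different limits.
        if self.num_steps < 1:
            raise ValueError("num_steps must be at least 1")
        if self.duration <= 0:
            raise ValueError("duration must be positive")

    def resolved_seed(self) -> int:
        """Return the seed to use, drawing a random one when seed is -1."""
        if self.seed == -1:
            return random.randint(0, 2**31 - 1)
        return self.seed


settings = GenerationSettings(prompt="waves crashing on rocks", seed=42)
print(settings.resolved_seed())
```

Keeping the seed resolution in one place makes it easy to log the seed that was actually used, so a result generated with `-1` can be reproduced later.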

## πŸ“ Notes

- Processing high-resolution videos (more than 384px on the shorter side) takes longer and does not improve results
- The model works best with videos between 5 and 30 seconds long
- Generation time depends on video length and number of steps
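
Since resolutions above 384px on the shorter side bring no benefit, it can help to downscale a video before uploading. The sketch below only computes the target dimensions. The `downscaled_size` helper is hypothetical and not part of the demo, and rounding to even values is an assumption made because many common video encoders expect even frame dimensions.

```python
def downscaled_size(width: int, height: int, max_short_side: int = 384) -> tuple[int, int]:
    """Return (width, height) scaled so the shorter side is at most max_short_side.

    The aspect ratio is preserved approximately; both values are rounded to
    the nearest even number. Videos that are already small are left unchanged.
    """
    short_side = min(width, height)
    if short_side <= max_short_side:
        return width, height

    def scale_to_even(value: int) -> int:
        # Scale, then snap to the nearest even integer (minimum of 2).
        return max(2, round(value * max_short_side / short_side / 2) * 2)

    return scale_to_even(width), scale_to_even(height)


print(downscaled_size(1920, 1080))  # landscape 1080p input
print(downscaled_size(320, 240))    # already below the threshold
```

The resulting dimensions can then be passed to whichever video tool you use for resizing.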

## 🔗 Links

- [Project Page](https://hkchengrex.com/MMAudio/)
- [GitHub Repository](https://github.com/hkchengrex/MMAudio)
- [Paper](https://arxiv.org/abs/2412.15322)

## 📜 License

This demo uses the MMAudio model, which is released under the [MIT license](https://github.com/hkchengrex/MMAudio/blob/main/LICENSE).