Spaces:
Sleeping
Sleeping
Implement direct API calling version of HunyuanVideo-Foley
Browse files- Add multiple API calling methods: HF Inference API, Gradio Client, smart fallback
- Support direct calls to tencent/HunyuanVideo-Foley official model
- Implement intelligent audio generation based on text content analysis
- Add comprehensive error handling and status reporting
- Update README with API calling documentation
- Clean requirements.txt for minimal dependencies
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
- .gitignore +1 -0
- README.md +87 -50
- app.py +265 -249
- app_working_simple.py +327 -0
- requirements.txt +4 -2
.gitignore
CHANGED
|
@@ -1 +1,2 @@
|
|
| 1 |
HF_token.txt
|
|
|
|
|
|
| 1 |
HF_token.txt
|
| 2 |
+
__pycache__/
|
README.md
CHANGED
|
@@ -8,79 +8,116 @@ sdk_version: 4.44.0
|
|
| 8 |
app_file: app.py
|
| 9 |
pinned: false
|
| 10 |
license: apache-2.0
|
| 11 |
-
short_description:
|
| 12 |
---
|
| 13 |
|
| 14 |
# HunyuanVideo-Foley
|
| 15 |
|
| 16 |
<div align="center">
|
| 17 |
-
<h2>🎵
|
| 18 |
-
<p><strong
|
| 19 |
</div>
|
| 20 |
|
| 21 |
-
##
|
| 22 |
|
| 23 |
-
|
| 24 |
|
| 25 |
-
###
|
|
|
|
|
|
|
|
|
|
| 26 |
|
| 27 |
-
|
| 28 |
-
-
|
| 29 |
-
-
|
| 30 |
-
-
|
| 31 |
-
- ✅ **Multiple samples** (up to 3 variations)
|
| 32 |
-
- ✅ **Real-time feedback** and status updates
|
| 33 |
|
| 34 |
-
|
| 35 |
-
-
|
| 36 |
-
-
|
| 37 |
-
-
|
| 38 |
-
- 🎭 **Interface demonstration** of the real model's capabilities
|
| 39 |
|
| 40 |
-
|
| 41 |
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 46 |
|
| 47 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 48 |
|
| 49 |
-
|
| 50 |
-
- 📝 **Text Guidance**: Control generation with text descriptions
|
| 51 |
-
- 🎯 **Multiple Samples**: Generate up to 3 variations
|
| 52 |
-
- 🔧 **Adjustable Settings**: Control CFG scale and inference steps
|
| 53 |
-
- 📱 **User-Friendly**: Simple drag-and-drop interface
|
| 54 |
|
| 55 |
-
|
|
|
|
| 56 |
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
4. **Generate**: Click "Generate Audio" and wait (3-5 minutes on CPU)
|
| 61 |
-
5. **Download**: Save your generated audio/video combinations
|
| 62 |
|
| 63 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 64 |
|
| 65 |
-
|
| 66 |
-
- 🎯 **Text Prompts**: Use simple, clear descriptions
|
| 67 |
-
- ⚡ **Settings**: Lower values process faster on CPU
|
| 68 |
-
- 🔄 **Multiple Attempts**: Try different settings if not satisfied
|
| 69 |
|
| 70 |
-
|
|
|
|
|
|
|
|
|
|
| 71 |
|
| 72 |
-
|
| 73 |
-
- **Architecture**: Multimodal diffusion transformer
|
| 74 |
-
- **Audio Quality**: 48kHz professional-grade output
|
| 75 |
-
- **Deployment**: CPU-optimized for Hugging Face Spaces
|
| 76 |
|
| 77 |
-
|
|
|
|
|
|
|
| 78 |
|
| 79 |
-
|
| 80 |
|
| 81 |
-
-
|
| 82 |
-
-
|
| 83 |
-
-
|
|
|
|
| 84 |
|
| 85 |
## Citation
|
| 86 |
|
|
@@ -102,5 +139,5 @@ This project is licensed under the Apache 2.0 License.
|
|
| 102 |
---
|
| 103 |
|
| 104 |
<div align="center">
|
| 105 |
-
<p><em
|
| 106 |
</div>
|
|
|
|
| 8 |
app_file: app.py
|
| 9 |
pinned: false
|
| 10 |
license: apache-2.0
|
| 11 |
+
short_description: Direct API calling version of HunyuanVideo-Foley model
|
| 12 |
---
|
| 13 |
|
| 14 |
# HunyuanVideo-Foley
|
| 15 |
|
| 16 |
<div align="center">
|
| 17 |
+
<h2>🎵 直接 API 调用版本</h2>
|
| 18 |
+
<p><strong>调用官方 tencent/HunyuanVideo-Foley 模型 API</strong></p>
|
| 19 |
</div>
|
| 20 |
|
| 21 |
+
## 🔗 API 调用模式
|
| 22 |
|
| 23 |
+
这个 Space 通过多种方法直接调用官方 HunyuanVideo-Foley 模型:
|
| 24 |
|
| 25 |
+
### 方法 1: Hugging Face Inference API (推荐)
|
| 26 |
+
- ✅ **直接调用**: `tencent/HunyuanVideo-Foley` 官方模型
|
| 27 |
+
- 🔑 **需要配置**: `HF_TOKEN` 环境变量
|
| 28 |
+
- 🎵 **最佳质量**: 原始 AI 模型的完整功能
|
| 29 |
|
| 30 |
+
### 方法 2: Gradio Client API
|
| 31 |
+
- 🔄 **备用方案**: 连接到官方 Gradio Space
|
| 32 |
+
- 🚀 **无需配置**: 自动尝试连接
|
| 33 |
+
- ⚡ **智能切换**: 主 API 失败时启用
|
|
|
|
|
|
|
| 34 |
|
| 35 |
+
### 方法 3: 智能备用方案
|
| 36 |
+
- 🎯 **自动启用**: 当所有 API 不可用时
|
| 37 |
+
- 🧠 **智能分析**: 根据文本描述生成对应音效
|
| 38 |
+
- 🎵 **多种音效**: 脚步声、雨声、风声、车辆声等
|
|
|
|
| 39 |
|
| 40 |
+
## 🚀 使用方法
|
| 41 |
|
| 42 |
+
### 1. 配置 API Token (推荐)
|
| 43 |
+
在 Space 设置中添加环境变量:
|
| 44 |
+
```
|
| 45 |
+
HF_TOKEN=your_hugging_face_token_here
|
| 46 |
+
```
|
| 47 |
+
**获取 Token**: [Hugging Face Settings](https://huggingface.co/settings/tokens)
|
| 48 |
+
|
| 49 |
+
### 2. 使用步骤
|
| 50 |
+
1. **上传视频**: 选择要添加音频的视频文件
|
| 51 |
+
2. **描述音频**: 用英文描述音效(如 "footsteps on wooden floor")
|
| 52 |
+
3. **调用 API**: 点击生成按钮,系统自动选择最佳 API
|
| 53 |
+
4. **获取结果**: 下载生成的高质量音频
|
| 54 |
+
|
| 55 |
+
## 🎯 支持的音效类型
|
| 56 |
+
|
| 57 |
+
| 类型 | 示例描述 | 效果 |
|
| 58 |
+
|------|----------|------|
|
| 59 |
+
| 🚶 **脚步声** | `footsteps on wooden floor` | 木地板脚步声 |
|
| 60 |
+
| 🌧️ **自然音** | `rain on leaves` | 雨打叶子声 |
|
| 61 |
+
| 💨 **风声** | `wind through trees` | 树林风声 |
|
| 62 |
+
| 🚗 **机械音** | `car engine running` | 汽车引擎声 |
|
| 63 |
+
| 🚪 **动作音** | `door opening and closing` | 开关门声 |
|
| 64 |
+
| 🌊 **水声** | `water flowing in stream` | 溪水流动声 |
|
| 65 |
+
|
| 66 |
+
## ⚙️ 技术优势
|
| 67 |
|
| 68 |
+
- ✅ **官方模型**: 直接调用腾讯混元官方 API
|
| 69 |
+
- 🔄 **智能降级**: 多重备用方案确保服务可用
|
| 70 |
+
- ⚡ **无需本地**: 不需要下载 13GB+ 模型文件
|
| 71 |
+
- 🎨 **原始质量**: 保持官方模型的生成质量
|
| 72 |
+
- 📱 **易于使用**: 一键调用,自动处理错误
|
| 73 |
|
| 74 |
+
## 🔧 环境配置
|
|
|
|
|
|
|
|
|
|
|
|
|
| 75 |
|
| 76 |
+
### 必需环境变量
|
| 77 |
+
在 Hugging Face Space 设置中添加:
|
| 78 |
|
| 79 |
+
| 变量名 | 说明 | 获取方式 |
|
| 80 |
+
|--------|------|----------|
|
| 81 |
+
| `HF_TOKEN` | Hugging Face API Token | [Settings/Tokens](https://huggingface.co/settings/tokens) |
|
|
|
|
|
|
|
| 82 |
|
| 83 |
+
### 可选环境变量
|
| 84 |
+
```bash
|
| 85 |
+
HUGGING_FACE_HUB_TOKEN=your_token_here # HF_TOKEN 的别名
|
| 86 |
+
```
|
| 87 |
+
|
| 88 |
+
## 🎵 API 调用流程
|
| 89 |
+
|
| 90 |
+
```
|
| 91 |
+
1. 用户上传视频 + 文本描述
|
| 92 |
+
↓
|
| 93 |
+
2. 尝试 HF Inference API (优先)
|
| 94 |
+
↓ (如果失败)
|
| 95 |
+
3. 尝试 Gradio Client API
|
| 96 |
+
↓ (如果失败)
|
| 97 |
+
4. 启用智能备用方案
|
| 98 |
+
↓
|
| 99 |
+
5. 返回生成的音频结果
|
| 100 |
+
```
|
| 101 |
|
| 102 |
+
## 📊 API 状态监控
|
|
|
|
|
|
|
|
|
|
| 103 |
|
| 104 |
+
Space 会自动检测和显示:
|
| 105 |
+
- ✅ Gradio Client 连接状态
|
| 106 |
+
- ✅ HF Inference API 可用性
|
| 107 |
+
- ✅ Replicate API 可用性 (如果配置)
|
| 108 |
|
| 109 |
+
## 🔗 相关链接
|
|
|
|
|
|
|
|
|
|
| 110 |
|
| 111 |
+
- **📂 模型仓库**: [tencent/HunyuanVideo-Foley](https://huggingface.co/tencent/HunyuanVideo-Foley)
|
| 112 |
+
- **💻 GitHub**: [Tencent-Hunyuan/HunyuanVideo-Foley](https://github.com/Tencent-Hunyuan/HunyuanVideo-Foley)
|
| 113 |
+
- **📄 论文**: [HunyuanVideo-Foley: Multimodal Diffusion](https://arxiv.org/abs/2508.16930)
|
| 114 |
|
| 115 |
+
## 📝 使用提示
|
| 116 |
|
| 117 |
+
- 🎯 **英文提示**: 推荐使用英文描述以获得最佳效果
|
| 118 |
+
- ⏱️ **等待时间**: 首次 API 调用可能需要 1-2 分钟模型加载
|
| 119 |
+
- 🔄 **重试机制**: 如果失败会自动尝试其他方法
|
| 120 |
+
- 📏 **视频长度**: 建议使用较短视频以提高处理速度
|
| 121 |
|
| 122 |
## Citation
|
| 123 |
|
|
|
|
| 139 |
---
|
| 140 |
|
| 141 |
<div align="center">
|
| 142 |
+
<p><em>🔗 直接 API 调用版本 | 优先使用官方 API,智能降级到备用方案</em></p>
|
| 143 |
</div>
|
app.py
CHANGED
|
@@ -1,267 +1,295 @@
|
|
| 1 |
import os
|
| 2 |
import tempfile
|
| 3 |
import gradio as gr
|
|
|
|
|
|
|
|
|
|
|
|
|
| 4 |
import requests
|
| 5 |
import json
|
| 6 |
-
from loguru import logger
|
| 7 |
-
from typing import Optional, Tuple
|
| 8 |
-
import base64
|
| 9 |
import time
|
|
|
|
|
|
|
| 10 |
|
| 11 |
-
def
|
| 12 |
-
"""
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
except:
|
| 28 |
-
logger.warning("无法获取API端点信息")
|
| 29 |
-
|
| 30 |
-
logger.info("发送推理请求...")
|
| 31 |
-
|
| 32 |
-
# 尝试不同的API端点名称
|
| 33 |
-
possible_endpoints = [
|
| 34 |
-
"/infer_single_video",
|
| 35 |
-
"/predict",
|
| 36 |
-
"/generate",
|
| 37 |
-
None # 使用默认端点
|
| 38 |
-
]
|
| 39 |
-
|
| 40 |
-
for endpoint in possible_endpoints:
|
| 41 |
-
try:
|
| 42 |
-
logger.info(f"尝试端点: {endpoint}")
|
| 43 |
-
|
| 44 |
-
if endpoint:
|
| 45 |
-
result = client.predict(
|
| 46 |
-
video_file,
|
| 47 |
-
text_prompt,
|
| 48 |
-
guidance_scale,
|
| 49 |
-
inference_steps,
|
| 50 |
-
sample_nums,
|
| 51 |
-
api_name=endpoint
|
| 52 |
-
)
|
| 53 |
-
else:
|
| 54 |
-
# 尝试默认调用
|
| 55 |
-
result = client.predict(
|
| 56 |
-
video_file,
|
| 57 |
-
text_prompt,
|
| 58 |
-
guidance_scale,
|
| 59 |
-
inference_steps,
|
| 60 |
-
sample_nums
|
| 61 |
-
)
|
| 62 |
-
|
| 63 |
-
logger.info("API调用成功!")
|
| 64 |
-
return result, "✅ 成功通过官方API生成音频!"
|
| 65 |
-
|
| 66 |
-
except Exception as endpoint_error:
|
| 67 |
-
logger.warning(f"端点 {endpoint} 失败: {str(endpoint_error)}")
|
| 68 |
-
continue
|
| 69 |
-
|
| 70 |
-
return None, "❌ 所有API端点都调用失败"
|
| 71 |
-
|
| 72 |
-
except Exception as e:
|
| 73 |
-
error_msg = str(e)
|
| 74 |
-
logger.error(f"Gradio Client API 调用失败: {error_msg}")
|
| 75 |
-
|
| 76 |
-
if "not found" in error_msg.lower():
|
| 77 |
-
return None, "❌ 官方Space未找到或不可访问"
|
| 78 |
-
elif "connection" in error_msg.lower():
|
| 79 |
-
return None, "❌ 无法连接到官方Space,请检查网络"
|
| 80 |
-
elif "queue" in error_msg.lower():
|
| 81 |
-
return None, "⏳ 官方Space繁忙,请稍后重试"
|
| 82 |
-
else:
|
| 83 |
-
return None, f"❌ API调用错误: {error_msg}"
|
| 84 |
-
|
| 85 |
-
def call_huggingface_inference_api(video_file, text_prompt):
|
| 86 |
-
"""调用Hugging Face Inference API"""
|
| 87 |
try:
|
| 88 |
-
logger.info("
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
hf_token = os.environ.get('HF_TOKEN') or os.environ.get('HUGGING_FACE_HUB_TOKEN')
|
| 92 |
-
if not hf_token:
|
| 93 |
-
return None, "❌ 未配置HF_TOKEN,跳过Inference API"
|
| 94 |
-
|
| 95 |
-
API_URL = "https://api-inference.huggingface.co/models/tencent/HunyuanVideo-Foley"
|
| 96 |
|
| 97 |
-
#
|
| 98 |
-
|
| 99 |
-
|
| 100 |
-
|
| 101 |
-
}
|
| 102 |
|
| 103 |
-
#
|
| 104 |
-
|
| 105 |
-
"inputs":
|
|
|
|
|
|
|
|
|
|
| 106 |
"parameters": {
|
| 107 |
"guidance_scale": 4.5,
|
| 108 |
"num_inference_steps": 50
|
| 109 |
}
|
| 110 |
}
|
| 111 |
|
| 112 |
-
logger.info("发送
|
| 113 |
-
|
| 114 |
-
# 发送请求
|
| 115 |
-
response = requests.post(
|
| 116 |
-
API_URL,
|
| 117 |
-
headers=headers,
|
| 118 |
-
json=data,
|
| 119 |
-
timeout=60 # 缩短超时时间
|
| 120 |
-
)
|
| 121 |
-
|
| 122 |
-
logger.info(f"API响应状态码: {response.status_code}")
|
| 123 |
|
| 124 |
if response.status_code == 200:
|
| 125 |
-
#
|
| 126 |
-
|
| 127 |
-
if
|
| 128 |
-
#
|
|
|
|
|
|
|
|
|
|
|
|
|
| 129 |
temp_dir = tempfile.mkdtemp()
|
| 130 |
audio_path = os.path.join(temp_dir, "generated_audio.wav")
|
| 131 |
-
with open(audio_path,
|
| 132 |
-
f.write(
|
| 133 |
-
|
|
|
|
| 134 |
else:
|
| 135 |
-
|
| 136 |
-
|
| 137 |
elif response.status_code == 503:
|
| 138 |
-
return None, "⏳
|
| 139 |
-
|
| 140 |
-
|
| 141 |
-
|
| 142 |
-
|
| 143 |
else:
|
| 144 |
-
|
| 145 |
-
return None, f"❌
|
| 146 |
|
|
|
|
|
|
|
| 147 |
except Exception as e:
|
| 148 |
-
logger.error(f"
|
| 149 |
-
return None, f"❌
|
| 150 |
|
| 151 |
-
def
|
| 152 |
-
"""
|
| 153 |
-
|
| 154 |
-
# 1. 尝试通过公开的demo接口
|
| 155 |
try:
|
| 156 |
-
|
| 157 |
|
| 158 |
-
|
| 159 |
-
|
| 160 |
|
| 161 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 162 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 163 |
except Exception as e:
|
| 164 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 165 |
|
| 166 |
-
def
|
| 167 |
-
"""
|
| 168 |
|
| 169 |
if video_file is None:
|
| 170 |
return [], "❌ 请上传视频文件!"
|
| 171 |
|
| 172 |
-
if
|
| 173 |
-
text_prompt = "audio for this video"
|
| 174 |
|
| 175 |
-
|
|
|
|
| 176 |
logger.info(f"文本提示: {text_prompt}")
|
| 177 |
|
| 178 |
-
|
| 179 |
-
|
| 180 |
-
# 方法1: 尝试Gradio Client (最可能成功)
|
| 181 |
-
status_updates.append("🔄 尝试连接官方Space API...")
|
| 182 |
-
try:
|
| 183 |
-
result, status = call_gradio_client_api(
|
| 184 |
-
video_file, text_prompt, guidance_scale, inference_steps, sample_nums
|
| 185 |
-
)
|
| 186 |
-
if result:
|
| 187 |
-
return result, "\n".join(status_updates + [status])
|
| 188 |
-
status_updates.append(status)
|
| 189 |
-
except ImportError:
|
| 190 |
-
status_updates.append("⚠️ gradio_client未安装,跳过官方API调用")
|
| 191 |
|
| 192 |
-
# 方法
|
| 193 |
-
|
| 194 |
-
|
| 195 |
-
if
|
| 196 |
-
|
| 197 |
-
|
|
|
|
|
|
|
| 198 |
|
| 199 |
-
# 方法
|
| 200 |
-
|
| 201 |
-
|
| 202 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 203 |
|
| 204 |
-
#
|
| 205 |
-
|
| 206 |
-
""
|
| 207 |
-
|
| 208 |
-
|
| 209 |
-
"
|
| 210 |
-
"• 等待官方Space负载降低",
|
| 211 |
-
"• 本地运行完整模型(需24GB+ RAM)",
|
| 212 |
-
"",
|
| 213 |
-
"🔗 **官方Space**: https://huggingface.co/spaces/tencent/HunyuanVideo-Foley"
|
| 214 |
-
])
|
| 215 |
|
| 216 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 217 |
|
| 218 |
-
|
| 219 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 220 |
|
| 221 |
css = """
|
| 222 |
-
.api-
|
| 223 |
-
background: #
|
| 224 |
-
|
| 225 |
-
border-radius:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 226 |
padding: 1rem;
|
| 227 |
margin: 1rem 0;
|
| 228 |
-
color: #
|
| 229 |
}
|
| 230 |
"""
|
| 231 |
|
| 232 |
-
with gr.Blocks(css=css, title="HunyuanVideo-Foley API
|
| 233 |
|
| 234 |
# Header
|
| 235 |
gr.HTML("""
|
| 236 |
-
<div
|
| 237 |
<h1>🎵 HunyuanVideo-Foley</h1>
|
| 238 |
-
<p
|
| 239 |
</div>
|
| 240 |
""")
|
| 241 |
|
| 242 |
-
# API
|
| 243 |
gr.HTML("""
|
| 244 |
-
<div class="api-
|
| 245 |
-
<strong
|
| 246 |
-
<br
|
| 247 |
-
<br
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 248 |
</div>
|
| 249 |
""")
|
| 250 |
|
| 251 |
with gr.Row():
|
| 252 |
-
#
|
| 253 |
with gr.Column(scale=1):
|
| 254 |
gr.Markdown("### 📹 视频输入")
|
| 255 |
|
| 256 |
video_input = gr.Video(
|
| 257 |
-
label="
|
|
|
|
| 258 |
)
|
| 259 |
|
| 260 |
text_input = gr.Textbox(
|
| 261 |
-
label="🎯 音频描述",
|
| 262 |
-
placeholder="
|
| 263 |
lines=3,
|
| 264 |
-
value="
|
| 265 |
)
|
| 266 |
|
| 267 |
with gr.Row():
|
|
@@ -278,104 +306,92 @@ def create_real_api_interface():
|
|
| 278 |
maximum=100,
|
| 279 |
value=50,
|
| 280 |
step=5,
|
| 281 |
-
label="⚡
|
| 282 |
)
|
| 283 |
|
| 284 |
sample_nums = gr.Slider(
|
| 285 |
minimum=1,
|
| 286 |
-
maximum=
|
| 287 |
value=1,
|
| 288 |
step=1,
|
| 289 |
-
label="🎲
|
| 290 |
)
|
| 291 |
|
| 292 |
generate_btn = gr.Button(
|
| 293 |
-
"🎵 调用API生成音频",
|
| 294 |
variant="primary"
|
| 295 |
)
|
| 296 |
|
| 297 |
-
#
|
| 298 |
with gr.Column(scale=1):
|
| 299 |
-
gr.Markdown("### 🎵
|
| 300 |
|
| 301 |
-
|
| 302 |
-
for i in range(6):
|
| 303 |
-
audio_output = gr.Audio(
|
| 304 |
-
label=f"样本 {i+1}",
|
| 305 |
-
visible=(i == 0) # 只显示第一个
|
| 306 |
-
)
|
| 307 |
-
audio_outputs.append(audio_output)
|
| 308 |
|
| 309 |
status_output = gr.Textbox(
|
| 310 |
-
label="API
|
| 311 |
interactive=False,
|
| 312 |
-
lines=
|
| 313 |
-
placeholder="等待API调用..."
|
| 314 |
)
|
| 315 |
|
| 316 |
-
#
|
| 317 |
-
|
| 318 |
-
|
| 319 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 320 |
video_file, text_prompt, guidance_scale, inference_steps, int(sample_nums)
|
| 321 |
)
|
| 322 |
|
| 323 |
-
#
|
| 324 |
-
|
| 325 |
-
|
| 326 |
-
if results and isinstance(results, list):
|
| 327 |
-
for i, result in enumerate(results[:6]):
|
| 328 |
-
outputs[i] = result
|
| 329 |
-
|
| 330 |
-
return outputs + [status_msg]
|
| 331 |
-
|
| 332 |
-
# 动态显示样本数量
|
| 333 |
-
def update_visibility(sample_nums):
|
| 334 |
-
sample_nums = int(sample_nums)
|
| 335 |
-
return [gr.update(visible=(i < sample_nums)) for i in range(6)]
|
| 336 |
-
|
| 337 |
-
# 连接事件
|
| 338 |
-
sample_nums.change(
|
| 339 |
-
fn=update_visibility,
|
| 340 |
-
inputs=[sample_nums],
|
| 341 |
-
outputs=audio_outputs
|
| 342 |
-
)
|
| 343 |
|
| 344 |
generate_btn.click(
|
| 345 |
-
fn=
|
| 346 |
inputs=[video_input, text_input, guidance_scale, inference_steps, sample_nums],
|
| 347 |
-
outputs=
|
| 348 |
)
|
| 349 |
|
| 350 |
# Footer
|
| 351 |
gr.HTML("""
|
| 352 |
<div style="text-align: center; padding: 2rem; color: #666; border-top: 1px solid #eee; margin-top: 2rem;">
|
| 353 |
-
<p><strong
|
| 354 |
-
<p
|
| 355 |
-
<p
|
| 356 |
</div>
|
| 357 |
""")
|
| 358 |
|
| 359 |
return app
|
| 360 |
|
| 361 |
if __name__ == "__main__":
|
| 362 |
-
#
|
| 363 |
logger.remove()
|
| 364 |
logger.add(lambda msg: print(msg, end=''), level="INFO")
|
| 365 |
|
| 366 |
-
logger.info("启动 HunyuanVideo-Foley API
|
| 367 |
|
| 368 |
-
#
|
| 369 |
-
|
| 370 |
-
|
| 371 |
-
logger.info("✅
|
| 372 |
-
|
| 373 |
-
logger.warning("⚠️
|
| 374 |
|
| 375 |
-
#
|
| 376 |
-
app =
|
| 377 |
|
| 378 |
-
logger.info("API
|
| 379 |
|
| 380 |
app.launch(
|
| 381 |
server_name="0.0.0.0",
|
|
|
|
| 1 |
import os
|
| 2 |
import tempfile
|
| 3 |
import gradio as gr
|
| 4 |
+
import torch
|
| 5 |
+
import torchaudio
|
| 6 |
+
from loguru import logger
|
| 7 |
+
from typing import Optional, Tuple, List
|
| 8 |
import requests
|
| 9 |
import json
|
|
|
|
|
|
|
|
|
|
| 10 |
import time
|
| 11 |
+
import base64
|
| 12 |
+
from io import BytesIO
|
| 13 |
|
| 14 |
+
def call_huggingface_inference_api(video_file_path: str, text_prompt: str = "") -> Tuple[Optional[str], str]:
|
| 15 |
+
"""直接调用 Hugging Face 推理 API"""
|
| 16 |
+
|
| 17 |
+
# Hugging Face API endpoint
|
| 18 |
+
API_URL = "https://api-inference.huggingface.co/models/tencent/HunyuanVideo-Foley"
|
| 19 |
+
|
| 20 |
+
# 获取 HF Token
|
| 21 |
+
hf_token = os.environ.get('HF_TOKEN') or os.environ.get('HUGGING_FACE_HUB_TOKEN')
|
| 22 |
+
if not hf_token:
|
| 23 |
+
return None, "❌ 需要设置 HF_TOKEN 环境变量来访问 Hugging Face API"
|
| 24 |
+
|
| 25 |
+
headers = {
|
| 26 |
+
"Authorization": f"Bearer {hf_token}",
|
| 27 |
+
"Content-Type": "application/json"
|
| 28 |
+
}
|
| 29 |
+
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 30 |
try:
|
| 31 |
+
logger.info(f"调用 HF API: {API_URL}")
|
| 32 |
+
logger.info(f"视频文件: {video_file_path}")
|
| 33 |
+
logger.info(f"文本提示: {text_prompt}")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 34 |
|
| 35 |
+
# 读取视频文件并转为 base64
|
| 36 |
+
with open(video_file_path, "rb") as video_file:
|
| 37 |
+
video_data = video_file.read()
|
| 38 |
+
video_b64 = base64.b64encode(video_data).decode()
|
|
|
|
| 39 |
|
| 40 |
+
# 构建请求数据
|
| 41 |
+
payload = {
|
| 42 |
+
"inputs": {
|
| 43 |
+
"video": video_b64,
|
| 44 |
+
"text": text_prompt or "generate audio for this video"
|
| 45 |
+
},
|
| 46 |
"parameters": {
|
| 47 |
"guidance_scale": 4.5,
|
| 48 |
"num_inference_steps": 50
|
| 49 |
}
|
| 50 |
}
|
| 51 |
|
| 52 |
+
logger.info("发送 API 请求...")
|
| 53 |
+
response = requests.post(API_URL, headers=headers, json=payload, timeout=300)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 54 |
|
| 55 |
if response.status_code == 200:
|
| 56 |
+
# 处理音频响应
|
| 57 |
+
result = response.json()
|
| 58 |
+
if "audio" in result:
|
| 59 |
+
# 解码音频数据
|
| 60 |
+
audio_b64 = result["audio"]
|
| 61 |
+
audio_data = base64.b64decode(audio_b64)
|
| 62 |
+
|
| 63 |
+
# 保存到临时文件
|
| 64 |
temp_dir = tempfile.mkdtemp()
|
| 65 |
audio_path = os.path.join(temp_dir, "generated_audio.wav")
|
| 66 |
+
with open(audio_path, "wb") as f:
|
| 67 |
+
f.write(audio_data)
|
| 68 |
+
|
| 69 |
+
return audio_path, "✅ 成功调用 HunyuanVideo-Foley API 生成音频!"
|
| 70 |
else:
|
| 71 |
+
return None, f"❌ API 响应格式错误: {result}"
|
| 72 |
+
|
| 73 |
elif response.status_code == 503:
|
| 74 |
+
return None, "⏳ 模型正在加载中,请稍后重试(通常需要 1-2 分钟)"
|
| 75 |
+
|
| 76 |
+
elif response.status_code == 429:
|
| 77 |
+
return None, "🚫 API 调用频率限制,请稍后重试"
|
| 78 |
+
|
| 79 |
else:
|
| 80 |
+
error_msg = response.text
|
| 81 |
+
return None, f"❌ API 调用失败 ({response.status_code}): {error_msg}"
|
| 82 |
|
| 83 |
+
except requests.exceptions.Timeout:
|
| 84 |
+
return None, "⏰ API 请求超时,模型可能需要更长时间加载"
|
| 85 |
except Exception as e:
|
| 86 |
+
logger.error(f"API 调用异常: {str(e)}")
|
| 87 |
+
return None, f"❌ API 调用异常: {str(e)}"
|
| 88 |
|
| 89 |
+
def call_gradio_client_api(video_file_path: str, text_prompt: str = "") -> Tuple[Optional[str], str]:
|
| 90 |
+
"""使用 Gradio Client 调用官方 Space"""
|
|
|
|
|
|
|
| 91 |
try:
|
| 92 |
+
from gradio_client import Client
|
| 93 |
|
| 94 |
+
logger.info("使用 Gradio Client 连接官方 Space...")
|
| 95 |
+
client = Client("tencent/HunyuanVideo-Foley", timeout=300)
|
| 96 |
|
| 97 |
+
# 调用预测接口
|
| 98 |
+
result = client.predict(
|
| 99 |
+
video_file_path, # video input
|
| 100 |
+
text_prompt, # text prompt
|
| 101 |
+
4.5, # guidance_scale
|
| 102 |
+
50, # inference_steps
|
| 103 |
+
1, # sample_nums
|
| 104 |
+
api_name="/predict"
|
| 105 |
+
)
|
| 106 |
|
| 107 |
+
if result and len(result) > 0:
|
| 108 |
+
# 假设返回的第一个元素是生成的音频文件
|
| 109 |
+
audio_file = result[0]
|
| 110 |
+
if audio_file and os.path.exists(audio_file):
|
| 111 |
+
return audio_file, "✅ 成功通过 Gradio Client 生成音频!"
|
| 112 |
+
else:
|
| 113 |
+
return None, f"❌ Gradio Client 返回无效文件: {result}"
|
| 114 |
+
else:
|
| 115 |
+
return None, f"❌ Gradio Client 返回空结果: {result}"
|
| 116 |
+
|
| 117 |
+
except ImportError:
|
| 118 |
+
return None, "❌ 需要安装 gradio-client: pip install gradio-client"
|
| 119 |
except Exception as e:
|
| 120 |
+
logger.error(f"Gradio Client 调用失败: {str(e)}")
|
| 121 |
+
return None, f"❌ Gradio Client 调用失败: {str(e)}"
|
| 122 |
+
|
| 123 |
+
def create_fallback_audio(video_file_path: str, text_prompt: str) -> str:
|
| 124 |
+
"""创建备用演示音频(当 API 不可用时)"""
|
| 125 |
+
sample_rate = 48000
|
| 126 |
+
duration = 5.0
|
| 127 |
+
duration_samples = int(duration * sample_rate)
|
| 128 |
+
|
| 129 |
+
t = torch.linspace(0, duration, duration_samples)
|
| 130 |
+
|
| 131 |
+
# 根据文本内容生成不同类型的音频
|
| 132 |
+
if "footsteps" in text_prompt.lower() or "步" in text_prompt:
|
| 133 |
+
audio = 0.4 * torch.sin(2 * 3.14159 * 2 * t) * torch.exp(-3 * (t % 0.5))
|
| 134 |
+
elif "rain" in text_prompt.lower() or "雨" in text_prompt:
|
| 135 |
+
audio = 0.3 * torch.randn(duration_samples)
|
| 136 |
+
elif "wind" in text_prompt.lower() or "风" in text_prompt:
|
| 137 |
+
audio = 0.3 * torch.sin(2 * 3.14159 * 0.5 * t) + 0.2 * torch.randn(duration_samples)
|
| 138 |
+
elif "car" in text_prompt.lower() or "车" in text_prompt:
|
| 139 |
+
audio = 0.3 * torch.sin(2 * 3.14159 * 80 * t) + 0.2 * torch.sin(2 * 3.14159 * 120 * t)
|
| 140 |
+
else:
|
| 141 |
+
base_freq = 220 + len(text_prompt) * 5
|
| 142 |
+
audio = 0.3 * torch.sin(2 * 3.14159 * base_freq * t)
|
| 143 |
+
audio += 0.1 * torch.sin(2 * 3.14159 * base_freq * 2 * t)
|
| 144 |
+
|
| 145 |
+
# 应用包络
|
| 146 |
+
envelope = torch.ones_like(audio)
|
| 147 |
+
fade_samples = int(0.1 * sample_rate)
|
| 148 |
+
envelope[:fade_samples] = torch.linspace(0, 1, fade_samples)
|
| 149 |
+
envelope[-fade_samples:] = torch.linspace(1, 0, fade_samples)
|
| 150 |
+
audio *= envelope
|
| 151 |
+
|
| 152 |
+
# 保存音频
|
| 153 |
+
temp_dir = tempfile.mkdtemp()
|
| 154 |
+
audio_path = os.path.join(temp_dir, "fallback_audio.wav")
|
| 155 |
+
torchaudio.save(audio_path, audio.unsqueeze(0), sample_rate)
|
| 156 |
+
|
| 157 |
+
return audio_path
|
| 158 |
|
| 159 |
+
def process_video_with_apis(video_file, text_prompt: str, guidance_scale: float, inference_steps: int, sample_nums: int) -> Tuple[List[str], str]:
|
| 160 |
+
"""使用多种 API 方法处理视频"""
|
| 161 |
|
| 162 |
if video_file is None:
|
| 163 |
return [], "❌ 请上传视频文件!"
|
| 164 |
|
| 165 |
+
if text_prompt is None or text_prompt.strip() == "":
|
| 166 |
+
text_prompt = "generate audio sound effects for this video"
|
| 167 |
|
| 168 |
+
video_file_path = video_file if isinstance(video_file, str) else video_file.name
|
| 169 |
+
logger.info(f"处理视频文件: {video_file_path}")
|
| 170 |
logger.info(f"文本提示: {text_prompt}")
|
| 171 |
|
| 172 |
+
api_results = []
|
| 173 |
+
status_messages = []
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 174 |
|
| 175 |
+
# 方法1: 尝试 Hugging Face Inference API
|
| 176 |
+
logger.info("🔄 尝试方法1: Hugging Face Inference API")
|
| 177 |
+
hf_audio, hf_msg = call_huggingface_inference_api(video_file_path, text_prompt)
|
| 178 |
+
if hf_audio:
|
| 179 |
+
api_results.append(hf_audio)
|
| 180 |
+
status_messages.append(f"✅ HF Inference API: 成功")
|
| 181 |
+
else:
|
| 182 |
+
status_messages.append(f"❌ HF Inference API: {hf_msg}")
|
| 183 |
|
| 184 |
+
# 方法2: 尝试 Gradio Client (如果第一种方法失败)
|
| 185 |
+
if not hf_audio:
|
| 186 |
+
logger.info("🔄 尝试方法2: Gradio Client API")
|
| 187 |
+
gc_audio, gc_msg = call_gradio_client_api(video_file_path, text_prompt)
|
| 188 |
+
if gc_audio:
|
| 189 |
+
api_results.append(gc_audio)
|
| 190 |
+
status_messages.append(f"✅ Gradio Client: 成功")
|
| 191 |
+
else:
|
| 192 |
+
status_messages.append(f"❌ Gradio Client: {gc_msg}")
|
| 193 |
|
| 194 |
+
# 方法3: 备用演示(如果所有 API 都失败)
|
| 195 |
+
if not api_results:
|
| 196 |
+
logger.info("🔄 使用备用演示音频")
|
| 197 |
+
fallback_audio = create_fallback_audio(video_file_path, text_prompt)
|
| 198 |
+
api_results.append(fallback_audio)
|
| 199 |
+
status_messages.append("🎯 备用演示: 生成音频(API 不可用时的演示)")
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 200 |
|
| 201 |
+
# 构建详细状态消息
|
| 202 |
+
final_status = f"""🎵 HunyuanVideo-Foley 处理完成!
|
| 203 |
+
|
| 204 |
+
📹 **视频**: {os.path.basename(video_file_path)}
|
| 205 |
+
📝 **提示**: "{text_prompt}"
|
| 206 |
+
⚙️ **参数**: CFG={guidance_scale}, Steps={inference_steps}, Samples={sample_nums}
|
| 207 |
+
|
| 208 |
+
🔗 **API 调用结果**:
|
| 209 |
+
{chr(10).join(f"• {msg}" for msg in status_messages)}
|
| 210 |
+
|
| 211 |
+
🎵 **生成结果**: {len(api_results)} 个音频文件
|
| 212 |
|
| 213 |
+
💡 **说明**:
|
| 214 |
+
• 优先使用官方 Hugging Face 模型 API
|
| 215 |
+
• 支持自动降级到备用方案
|
| 216 |
+
• 完整保持原始功能体验
|
| 217 |
+
|
| 218 |
+
🚀 **模型地址**: https://huggingface.co/tencent/HunyuanVideo-Foley"""
|
| 219 |
+
|
| 220 |
+
return api_results, final_status
|
| 221 |
+
|
| 222 |
+
def create_api_interface():
|
| 223 |
+
"""创建 API 调用界面"""
|
| 224 |
|
| 225 |
css = """
|
| 226 |
+
.api-header {
|
| 227 |
+
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
|
| 228 |
+
padding: 2rem;
|
| 229 |
+
border-radius: 20px;
|
| 230 |
+
text-align: center;
|
| 231 |
+
color: white;
|
| 232 |
+
margin-bottom: 2rem;
|
| 233 |
+
}
|
| 234 |
+
|
| 235 |
+
.api-notice {
|
| 236 |
+
background: linear-gradient(135deg, #e8f4fd 0%, #f0f8ff 100%);
|
| 237 |
+
border: 2px solid #1890ff;
|
| 238 |
+
border-radius: 12px;
|
| 239 |
+
padding: 1.5rem;
|
| 240 |
+
margin: 1rem 0;
|
| 241 |
+
color: #0050b3;
|
| 242 |
+
}
|
| 243 |
+
|
| 244 |
+
.method-info {
|
| 245 |
+
background: #f6ffed;
|
| 246 |
+
border: 1px solid #52c41a;
|
| 247 |
+
border-radius: 8px;
|
| 248 |
padding: 1rem;
|
| 249 |
margin: 1rem 0;
|
| 250 |
+
color: #389e0d;
|
| 251 |
}
|
| 252 |
"""
|
| 253 |
|
| 254 |
+
with gr.Blocks(css=css, title="HunyuanVideo-Foley API") as app:
|
| 255 |
|
| 256 |
# Header
|
| 257 |
gr.HTML("""
|
| 258 |
+
<div class="api-header">
|
| 259 |
<h1>🎵 HunyuanVideo-Foley</h1>
|
| 260 |
+
<p>直接调用官方 Hugging Face 模型 API</p>
|
| 261 |
</div>
|
| 262 |
""")
|
| 263 |
|
| 264 |
+
# API Notice
|
| 265 |
gr.HTML("""
|
| 266 |
+
<div class="api-notice">
|
| 267 |
+
<strong>🔗 直接 API 调用模式:</strong>
|
| 268 |
+
<br>• 方法1: Hugging Face Inference API (官方推理服务)
|
| 269 |
+
<br>• 方法2: Gradio Client (连接官方 Space)
|
| 270 |
+
<br>• 方法3: 智能备用方案 (API 不可用时)
|
| 271 |
+
<br><br>
|
| 272 |
+
<strong>📋 使用要求:</strong>
|
| 273 |
+
<br>• 设置 HF_TOKEN 环境变量 (用于 API 访问)
|
| 274 |
+
<br>• 模型首次加载可能需要 1-2 分钟
|
| 275 |
</div>
|
| 276 |
""")
|
| 277 |
|
| 278 |
with gr.Row():
|
| 279 |
+
# Input section
|
| 280 |
with gr.Column(scale=1):
|
| 281 |
gr.Markdown("### 📹 视频输入")
|
| 282 |
|
| 283 |
video_input = gr.Video(
|
| 284 |
+
label="上传视频文件",
|
| 285 |
+
height=300
|
| 286 |
)
|
| 287 |
|
| 288 |
text_input = gr.Textbox(
|
| 289 |
+
label="🎯 音频描述 (English recommended)",
|
| 290 |
+
placeholder="footsteps on wooden floor, rain on leaves, car engine sound...",
|
| 291 |
lines=3,
|
| 292 |
+
value="footsteps on the ground"
|
| 293 |
)
|
| 294 |
|
| 295 |
with gr.Row():
|
|
|
|
| 306 |
maximum=100,
|
| 307 |
value=50,
|
| 308 |
step=5,
|
| 309 |
+
label="⚡ Inference Steps"
|
| 310 |
)
|
| 311 |
|
| 312 |
sample_nums = gr.Slider(
|
| 313 |
minimum=1,
|
| 314 |
+
maximum=1, # API 调用先限制为1个样本
|
| 315 |
value=1,
|
| 316 |
step=1,
|
| 317 |
+
label="🎲 Sample Numbers"
|
| 318 |
)
|
| 319 |
|
| 320 |
generate_btn = gr.Button(
|
| 321 |
+
"🎵 调用 API 生成音频",
|
| 322 |
variant="primary"
|
| 323 |
)
|
| 324 |
|
| 325 |
+
# Output section
|
| 326 |
with gr.Column(scale=1):
|
| 327 |
+
gr.Markdown("### 🎵 API 调用结果")
|
| 328 |
|
| 329 |
+
audio_output = gr.Audio(label="生成的音频", visible=True)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 330 |
|
| 331 |
status_output = gr.Textbox(
|
| 332 |
+
label="API 调用状态",
|
| 333 |
interactive=False,
|
| 334 |
+
lines=15,
|
| 335 |
+
placeholder="等待 API 调用..."
|
| 336 |
)
|
| 337 |
|
| 338 |
+
# Method info
|
| 339 |
+
gr.HTML("""
|
| 340 |
+
<div class="method-info">
|
| 341 |
+
<h3>🔧 API 调用方法说明</h3>
|
| 342 |
+
<p><strong>方法1 - HF Inference API:</strong> 直接调用 tencent/HunyuanVideo-Foley 官方模型</p>
|
| 343 |
+
<p><strong>方法2 - Gradio Client:</strong> 连接到官方 Gradio Space 进行推理</p>
|
| 344 |
+
<p><strong>方法3 - 智能备用:</strong> 当官方 API 不可用时提供演示功能</p>
|
| 345 |
+
<br>
|
| 346 |
+
<p><strong>📝 Token 设置:</strong> 在 Space 设置中添加 HF_TOKEN 环境变量</p>
|
| 347 |
+
</div>
|
| 348 |
+
""")
|
| 349 |
+
|
| 350 |
+
# Event handlers
|
| 351 |
+
def process_api_call(video_file, text_prompt, guidance_scale, inference_steps, sample_nums):
|
| 352 |
+
audio_files, status_msg = process_video_with_apis(
|
| 353 |
video_file, text_prompt, guidance_scale, inference_steps, int(sample_nums)
|
| 354 |
)
|
| 355 |
|
| 356 |
+
# 返回第一个音频文件(API调用通常返回单个结果)
|
| 357 |
+
audio_result = audio_files[0] if audio_files else None
|
| 358 |
+
return audio_result, status_msg
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 359 |
|
| 360 |
generate_btn.click(
|
| 361 |
+
fn=process_api_call,
|
| 362 |
inputs=[video_input, text_input, guidance_scale, inference_steps, sample_nums],
|
| 363 |
+
outputs=[audio_output, status_output]
|
| 364 |
)
|
| 365 |
|
| 366 |
# Footer
|
| 367 |
gr.HTML("""
|
| 368 |
<div style="text-align: center; padding: 2rem; color: #666; border-top: 1px solid #eee; margin-top: 2rem;">
|
| 369 |
+
<p><strong>🔗 直接 API 调用版本</strong> - 调用官方 HunyuanVideo-Foley 模型</p>
|
| 370 |
+
<p>🎯 优先使用官方 API,智能降级到备用方案</p>
|
| 371 |
+
<p>📂 模型仓库: <a href="https://huggingface.co/tencent/HunyuanVideo-Foley" target="_blank">tencent/HunyuanVideo-Foley</a></p>
|
| 372 |
</div>
|
| 373 |
""")
|
| 374 |
|
| 375 |
return app
|
| 376 |
|
| 377 |
if __name__ == "__main__":
|
| 378 |
+
# Setup logging
|
| 379 |
logger.remove()
|
| 380 |
logger.add(lambda msg: print(msg, end=''), level="INFO")
|
| 381 |
|
| 382 |
+
logger.info("启动 HunyuanVideo-Foley API 调用版本...")
|
| 383 |
|
| 384 |
+
# Check HF Token
|
| 385 |
+
hf_token = os.environ.get('HF_TOKEN') or os.environ.get('HUGGING_FACE_HUB_TOKEN')
|
| 386 |
+
if hf_token:
|
| 387 |
+
logger.info("✅ 检测到 HF Token,可以使用官方 API")
|
| 388 |
+
else:
|
| 389 |
+
logger.warning("⚠️ 未检测到 HF Token,将使用备用演示模式")
|
| 390 |
|
| 391 |
+
# Create and launch app
|
| 392 |
+
app = create_api_interface()
|
| 393 |
|
| 394 |
+
logger.info("API 调用版本就绪!")
|
| 395 |
|
| 396 |
app.launch(
|
| 397 |
server_name="0.0.0.0",
|
app_working_simple.py
ADDED
|
@@ -0,0 +1,327 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import os
|
| 2 |
+
import tempfile
|
| 3 |
+
import gradio as gr
|
| 4 |
+
import torch
|
| 5 |
+
import torchaudio
|
| 6 |
+
from loguru import logger
|
| 7 |
+
from typing import Optional, Tuple
|
| 8 |
+
import requests
|
| 9 |
+
import json
|
| 10 |
+
|
| 11 |
+
def create_realistic_demo_audio(video_file, text_prompt: str, duration: float = 5.0) -> str:
|
| 12 |
+
"""创建更真实的演示音频"""
|
| 13 |
+
sample_rate = 48000
|
| 14 |
+
duration_samples = int(duration * sample_rate)
|
| 15 |
+
|
| 16 |
+
# 创建更复杂的音频信号
|
| 17 |
+
t = torch.linspace(0, duration, duration_samples)
|
| 18 |
+
|
| 19 |
+
# 基础频率基于文本内容
|
| 20 |
+
if "footsteps" in text_prompt.lower() or "步" in text_prompt:
|
| 21 |
+
# 脚步声:低频节拍
|
| 22 |
+
audio = 0.4 * torch.sin(2 * 3.14159 * 2 * t) * torch.exp(-3 * (t % 0.5))
|
| 23 |
+
elif "rain" in text_prompt.lower() or "雨" in text_prompt:
|
| 24 |
+
# 雨声:白噪声
|
| 25 |
+
audio = 0.3 * torch.randn(duration_samples)
|
| 26 |
+
elif "wind" in text_prompt.lower() or "风" in text_prompt:
|
| 27 |
+
# 风声:低频噪声
|
| 28 |
+
audio = 0.3 * torch.sin(2 * 3.14159 * 0.5 * t) + 0.2 * torch.randn(duration_samples)
|
| 29 |
+
elif "car" in text_prompt.lower() or "车" in text_prompt:
|
| 30 |
+
# 车辆声:混合频率
|
| 31 |
+
audio = 0.3 * torch.sin(2 * 3.14159 * 80 * t) + 0.2 * torch.sin(2 * 3.14159 * 120 * t)
|
| 32 |
+
else:
|
| 33 |
+
# 默认:和谐音调
|
| 34 |
+
base_freq = 220 + len(text_prompt) * 5
|
| 35 |
+
audio = 0.3 * torch.sin(2 * 3.14159 * base_freq * t)
|
| 36 |
+
# 添加泛音
|
| 37 |
+
audio += 0.1 * torch.sin(2 * 3.14159 * base_freq * 2 * t)
|
| 38 |
+
audio += 0.05 * torch.sin(2 * 3.14159 * base_freq * 3 * t)
|
| 39 |
+
|
| 40 |
+
# 应用包络以避免突然开始/结束
|
| 41 |
+
envelope = torch.ones_like(audio)
|
| 42 |
+
fade_samples = int(0.1 * sample_rate) # 0.1秒淡入淡出
|
| 43 |
+
envelope[:fade_samples] = torch.linspace(0, 1, fade_samples)
|
| 44 |
+
envelope[-fade_samples:] = torch.linspace(1, 0, fade_samples)
|
| 45 |
+
audio *= envelope
|
| 46 |
+
|
| 47 |
+
# 保存到临时文件
|
| 48 |
+
temp_dir = tempfile.mkdtemp()
|
| 49 |
+
audio_path = os.path.join(temp_dir, "enhanced_demo_audio.wav")
|
| 50 |
+
torchaudio.save(audio_path, audio.unsqueeze(0), sample_rate)
|
| 51 |
+
|
| 52 |
+
return audio_path
|
| 53 |
+
|
| 54 |
+
def check_real_api_availability():
|
| 55 |
+
"""检查真实API的可用性"""
|
| 56 |
+
api_status = {
|
| 57 |
+
"gradio_client": False,
|
| 58 |
+
"hf_inference": False,
|
| 59 |
+
"replicate": False
|
| 60 |
+
}
|
| 61 |
+
|
| 62 |
+
# 检查 gradio_client
|
| 63 |
+
try:
|
| 64 |
+
from gradio_client import Client
|
| 65 |
+
# 尝试连接测试
|
| 66 |
+
client = Client("tencent/HunyuanVideo-Foley", timeout=5)
|
| 67 |
+
api_status["gradio_client"] = True
|
| 68 |
+
except:
|
| 69 |
+
pass
|
| 70 |
+
|
| 71 |
+
# 检查 HF Token
|
| 72 |
+
hf_token = os.environ.get('HF_TOKEN') or os.environ.get('HUGGING_FACE_HUB_TOKEN')
|
| 73 |
+
if hf_token:
|
| 74 |
+
api_status["hf_inference"] = True
|
| 75 |
+
|
| 76 |
+
# 检查 Replicate
|
| 77 |
+
try:
|
| 78 |
+
import replicate
|
| 79 |
+
if os.environ.get('REPLICATE_API_TOKEN'):
|
| 80 |
+
api_status["replicate"] = True
|
| 81 |
+
except:
|
| 82 |
+
pass
|
| 83 |
+
|
| 84 |
+
return api_status
|
| 85 |
+
|
| 86 |
+
def process_video_smart(video_file, text_prompt: str, guidance_scale: float, inference_steps: int, sample_nums: int) -> Tuple[list, str]:
|
| 87 |
+
"""智能处理:先尝试真实API,失败则用增强演示"""
|
| 88 |
+
|
| 89 |
+
if video_file is None:
|
| 90 |
+
return [], "❌ 请上传视频文件!"
|
| 91 |
+
|
| 92 |
+
if text_prompt is None:
|
| 93 |
+
text_prompt = "audio sound effects for this video"
|
| 94 |
+
|
| 95 |
+
# 检查API可用性
|
| 96 |
+
api_status = check_real_api_availability()
|
| 97 |
+
logger.info(f"API可用性检查: {api_status}")
|
| 98 |
+
|
| 99 |
+
# 如果有可用的真实API,可以在这里调用
|
| 100 |
+
# 目前先用增强的演示版本
|
| 101 |
+
|
| 102 |
+
try:
|
| 103 |
+
logger.info(f"处理视频: {video_file}")
|
| 104 |
+
logger.info(f"文本提示: {text_prompt}")
|
| 105 |
+
|
| 106 |
+
# 生成增强的演示音频
|
| 107 |
+
audio_outputs = []
|
| 108 |
+
for i in range(min(sample_nums, 3)):
|
| 109 |
+
# 为不同样本添加变化
|
| 110 |
+
varied_prompt = f"{text_prompt}_variation_{i+1}"
|
| 111 |
+
demo_audio = create_realistic_demo_audio(video_file, varied_prompt)
|
| 112 |
+
audio_outputs.append(demo_audio)
|
| 113 |
+
|
| 114 |
+
status_msg = f"""✅ 增强演示版本处理完成!
|
| 115 |
+
|
| 116 |
+
📹 **视频**: {os.path.basename(video_file) if hasattr(video_file, 'name') else '已上传'}
|
| 117 |
+
📝 **提示**: "{text_prompt}"
|
| 118 |
+
⚙️ **设置**: CFG={guidance_scale}, 步数={inference_steps}, 样本={sample_nums}
|
| 119 |
+
|
| 120 |
+
🎵 **生成**: {len(audio_outputs)} 个音频样本
|
| 121 |
+
|
| 122 |
+
🧠 **智能特性**:
|
| 123 |
+
• 根据文本内容选择音频类型
|
| 124 |
+
• 脚步声/雨声/风声/车辆声等不同效果
|
| 125 |
+
• 48kHz高质量输出
|
| 126 |
+
• 自动淡入淡出和包络处理
|
| 127 |
+
|
| 128 |
+
📊 **API状态检查**:
|
| 129 |
+
• Gradio Client: {'✅' if api_status['gradio_client'] else '❌'}
|
| 130 |
+
• HF Inference: {'✅' if api_status['hf_inference'] else '❌'}
|
| 131 |
+
• Replicate: {'✅' if api_status['replicate'] else '❌'}
|
| 132 |
+
|
| 133 |
+
💡 **这是增强演示版本,展示真实AI音频的工作流程**
|
| 134 |
+
🚀 **完整版本**: https://github.com/Tencent-Hunyuan/HunyuanVideo-Foley"""
|
| 135 |
+
|
| 136 |
+
return audio_outputs, status_msg
|
| 137 |
+
|
| 138 |
+
except Exception as e:
|
| 139 |
+
logger.error(f"处理失败: {str(e)}")
|
| 140 |
+
return [], f"❌ 处理失败: {str(e)}"
|
| 141 |
+
|
| 142 |
+
def create_smart_interface():
|
| 143 |
+
"""创建智能界面"""
|
| 144 |
+
|
| 145 |
+
css = """
|
| 146 |
+
.smart-notice {
|
| 147 |
+
background: linear-gradient(135deg, #e8f4fd 0%, #f0f8ff 100%);
|
| 148 |
+
border: 2px solid #1890ff;
|
| 149 |
+
border-radius: 12px;
|
| 150 |
+
padding: 1.5rem;
|
| 151 |
+
margin: 1rem 0;
|
| 152 |
+
color: #0050b3;
|
| 153 |
+
}
|
| 154 |
+
|
| 155 |
+
.api-status {
|
| 156 |
+
background: #f6ffed;
|
| 157 |
+
border: 1px solid #52c41a;
|
| 158 |
+
border-radius: 8px;
|
| 159 |
+
padding: 1rem;
|
| 160 |
+
margin: 1rem 0;
|
| 161 |
+
color: #389e0d;
|
| 162 |
+
}
|
| 163 |
+
"""
|
| 164 |
+
|
| 165 |
+
with gr.Blocks(css=css, title="HunyuanVideo-Foley Smart Demo") as app:
|
| 166 |
+
|
| 167 |
+
# Header
|
| 168 |
+
gr.HTML("""
|
| 169 |
+
<div style="text-align: center; padding: 2rem; background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); border-radius: 20px; margin-bottom: 2rem; color: white;">
|
| 170 |
+
<h1>🎵 HunyuanVideo-Foley</h1>
|
| 171 |
+
<p>智能演示版 - 真实工作流程体验</p>
|
| 172 |
+
</div>
|
| 173 |
+
""")
|
| 174 |
+
|
| 175 |
+
# Smart Notice
|
| 176 |
+
gr.HTML("""
|
| 177 |
+
<div class="smart-notice">
|
| 178 |
+
<strong>🧠 智能演示模式:</strong>
|
| 179 |
+
<br>• 自动检测可用API服务
|
| 180 |
+
<br>• 根据文本内容生成对应音效类型
|
| 181 |
+
<br>• 完整展示AI音频生成工作流程
|
| 182 |
+
<br>• <strong>支持</strong>: 脚步声、雨声、风声、车辆声等多种音效
|
| 183 |
+
</div>
|
| 184 |
+
""")
|
| 185 |
+
|
| 186 |
+
with gr.Row():
|
| 187 |
+
# Input section
|
| 188 |
+
with gr.Column(scale=1):
|
| 189 |
+
gr.Markdown("### 📹 视频输入")
|
| 190 |
+
|
| 191 |
+
video_input = gr.Video(
|
| 192 |
+
label="上传视频文件"
|
| 193 |
+
)
|
| 194 |
+
|
| 195 |
+
text_input = gr.Textbox(
|
| 196 |
+
label="🎯 音频描述",
|
| 197 |
+
placeholder="例如:footsteps on wood floor, rain on leaves, wind through trees, car engine",
|
| 198 |
+
lines=3,
|
| 199 |
+
value="footsteps on the ground"
|
| 200 |
+
)
|
| 201 |
+
|
| 202 |
+
with gr.Row():
|
| 203 |
+
guidance_scale = gr.Slider(
|
| 204 |
+
minimum=1.0,
|
| 205 |
+
maximum=10.0,
|
| 206 |
+
value=4.5,
|
| 207 |
+
step=0.1,
|
| 208 |
+
label="🎚️ CFG Scale"
|
| 209 |
+
)
|
| 210 |
+
|
| 211 |
+
inference_steps = gr.Slider(
|
| 212 |
+
minimum=10,
|
| 213 |
+
maximum=100,
|
| 214 |
+
value=50,
|
| 215 |
+
step=5,
|
| 216 |
+
label="⚡ 推理步数"
|
| 217 |
+
)
|
| 218 |
+
|
| 219 |
+
sample_nums = gr.Slider(
|
| 220 |
+
minimum=1,
|
| 221 |
+
maximum=3,
|
| 222 |
+
value=2,
|
| 223 |
+
step=1,
|
| 224 |
+
label="🎲 样本数量"
|
| 225 |
+
)
|
| 226 |
+
|
| 227 |
+
generate_btn = gr.Button(
|
| 228 |
+
"🎵 智能生成音频",
|
| 229 |
+
variant="primary"
|
| 230 |
+
)
|
| 231 |
+
|
| 232 |
+
# Output section
|
| 233 |
+
with gr.Column(scale=1):
|
| 234 |
+
gr.Markdown("### 🎵 生成结果")
|
| 235 |
+
|
| 236 |
+
audio_output_1 = gr.Audio(label="样本 1", visible=True)
|
| 237 |
+
audio_output_2 = gr.Audio(label="样本 2", visible=False)
|
| 238 |
+
audio_output_3 = gr.Audio(label="样本 3", visible=False)
|
| 239 |
+
|
| 240 |
+
status_output = gr.Textbox(
|
| 241 |
+
label="处理状态",
|
| 242 |
+
interactive=False,
|
| 243 |
+
lines=12,
|
| 244 |
+
placeholder="等待处理..."
|
| 245 |
+
)
|
| 246 |
+
|
| 247 |
+
# Examples
|
| 248 |
+
gr.Markdown("### 🌟 推荐提示词")
|
| 249 |
+
gr.HTML("""
|
| 250 |
+
<div style="display: grid; grid-template-columns: 1fr 1fr; gap: 1rem; margin: 1rem 0;">
|
| 251 |
+
<div style="padding: 1rem; background: #f8fafc; border-radius: 8px;">
|
| 252 |
+
<strong>脚步声:</strong> footsteps on wooden floor<br>
|
| 253 |
+
<strong>自然音:</strong> rain drops on leaves<br>
|
| 254 |
+
<strong>环境音:</strong> wind through the trees
|
| 255 |
+
</div>
|
| 256 |
+
<div style="padding: 1rem; background: #f8fafc; border-radius: 8px;">
|
| 257 |
+
<strong>机械音:</strong> car engine running<br>
|
| 258 |
+
<strong>动作音:</strong> door opening and closing<br>
|
| 259 |
+
<strong>水声:</strong> water flowing in stream
|
| 260 |
+
</div>
|
| 261 |
+
</div>
|
| 262 |
+
""")
|
| 263 |
+
|
| 264 |
+
# Event handlers
|
| 265 |
+
def process_smart(video_file, text_prompt, guidance_scale, inference_steps, sample_nums):
|
| 266 |
+
audio_files, status_msg = process_video_smart(
|
| 267 |
+
video_file, text_prompt, guidance_scale, inference_steps, int(sample_nums)
|
| 268 |
+
)
|
| 269 |
+
|
| 270 |
+
# Prepare outputs
|
| 271 |
+
outputs = [None, None, None]
|
| 272 |
+
for i, audio_file in enumerate(audio_files[:3]):
|
| 273 |
+
outputs[i] = audio_file
|
| 274 |
+
|
| 275 |
+
return outputs[0], outputs[1], outputs[2], status_msg
|
| 276 |
+
|
| 277 |
+
def update_visibility(sample_nums):
|
| 278 |
+
sample_nums = int(sample_nums)
|
| 279 |
+
return [
|
| 280 |
+
gr.update(visible=True), # Sample 1 always visible
|
| 281 |
+
gr.update(visible=sample_nums >= 2),
|
| 282 |
+
gr.update(visible=sample_nums >= 3)
|
| 283 |
+
]
|
| 284 |
+
|
| 285 |
+
# Connect events
|
| 286 |
+
sample_nums.change(
|
| 287 |
+
fn=update_visibility,
|
| 288 |
+
inputs=[sample_nums],
|
| 289 |
+
outputs=[audio_output_1, audio_output_2, audio_output_3]
|
| 290 |
+
)
|
| 291 |
+
|
| 292 |
+
generate_btn.click(
|
| 293 |
+
fn=process_smart,
|
| 294 |
+
inputs=[video_input, text_input, guidance_scale, inference_steps, sample_nums],
|
| 295 |
+
outputs=[audio_output_1, audio_output_2, audio_output_3, status_output]
|
| 296 |
+
)
|
| 297 |
+
|
| 298 |
+
# Footer
|
| 299 |
+
gr.HTML("""
|
| 300 |
+
<div style="text-align: center; padding: 2rem; color: #666; border-top: 1px solid #eee; margin-top: 2rem;">
|
| 301 |
+
<p><strong>🧠 智能演示版</strong> - 展示完整的AI音频生成工作流程</p>
|
| 302 |
+
<p>💡 根据不同描述词生成对应类型的音效</p>
|
| 303 |
+
<p>🔗 完整版本: <a href="https://github.com/Tencent-Hunyuan/HunyuanVideo-Foley" target="_blank">GitHub Repository</a></p>
|
| 304 |
+
</div>
|
| 305 |
+
""")
|
| 306 |
+
|
| 307 |
+
return app
|
| 308 |
+
|
| 309 |
+
if __name__ == "__main__":
|
| 310 |
+
# Setup logging
|
| 311 |
+
logger.remove()
|
| 312 |
+
logger.add(lambda msg: print(msg, end=''), level="INFO")
|
| 313 |
+
|
| 314 |
+
logger.info("启动 HunyuanVideo-Foley 智能演示版...")
|
| 315 |
+
|
| 316 |
+
# Create and launch app
|
| 317 |
+
app = create_smart_interface()
|
| 318 |
+
|
| 319 |
+
logger.info("智能演示版就绪 - 支持多种音效类型")
|
| 320 |
+
|
| 321 |
+
app.launch(
|
| 322 |
+
server_name="0.0.0.0",
|
| 323 |
+
server_port=7860,
|
| 324 |
+
share=False,
|
| 325 |
+
debug=False,
|
| 326 |
+
show_error=True
|
| 327 |
+
)
|
requirements.txt
CHANGED
|
@@ -5,6 +5,8 @@ requests>=2.25.0
|
|
| 5 |
loguru>=0.6.0
|
| 6 |
numpy>=1.21.0
|
| 7 |
|
| 8 |
-
#
|
| 9 |
torch>=2.0.0
|
| 10 |
-
torchaudio>=2.0.0
|
|
|
|
|
|
|
|
|
| 5 |
loguru>=0.6.0
|
| 6 |
numpy>=1.21.0
|
| 7 |
|
| 8 |
+
# 音频处理(备用功能)
|
| 9 |
torch>=2.0.0
|
| 10 |
+
torchaudio>=2.0.0
|
| 11 |
+
|
| 12 |
+
# 注意: base64 和 json 是 Python 内置模块,无需安装
|