update README.md

README.md
<img src="https://github.com/liu-zhy/graph-of-thought/assets/26198430/4e8b2511-ce69-4c1a-95a1-5aed4d432a82" width="10%" align="left" />

# ControlLLM

ControlLLM: Augmenting Large Language Models with Tools by Searching on Graphs [[Paper](https://arxiv.org/abs/2310.17796)]

We present ControlLLM, a novel framework that enables large language models (LLMs) to utilize multi-modal tools for solving complex real-world tasks. Despite the remarkable performance of LLMs, they still struggle with tool invocation due to ambiguous user prompts, inaccurate tool selection and parameterization, and inefficient tool scheduling. To overcome these challenges, our framework comprises three key components: (1) a $\textit{task decomposer}$ that breaks down a complex task into clear subtasks with well-defined inputs and outputs; (2) a $\textit{Thoughts-on-Graph (ToG) paradigm}$ that searches the optimal solution path on a pre-built tool graph, which specifies the parameter and dependency relations among different tools; and (3) an $\textit{execution engine with a rich toolbox}$ that interprets the solution path and runs the tools efficiently on different computational devices. We evaluate our framework on diverse tasks involving image, audio, and video processing, demonstrating its superior accuracy, efficiency, and versatility compared to existing methods.

## Video Demo
https://github.com/OpenGVLab/ControlLLM/assets/13723743/cf72861e-0e7b-4c15-89ee-7fa1d838d00f
## System Overview

![arch](https://github.com/liu-zhy/graph-of-thought/assets/95175307/ad3db5c1-f1c7-4e1f-be48-81ed5228f2b0#center)

## Major Features
- Image Perception
- Image Editing
- Image Generation
- Video Perception
- Video Editing
- Video Generation
- Audio Perception
- Audio Generation
- Multi-Solution
- Pointing Inputs
- Resource Type Awareness
## Schedule

- [ ] Launch online demo
## Installation

### Basic requirements

* Linux
* Python 3.10+
* PyTorch 2.0+
* CUDA 11.8+
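Before installing, a short script can report whether the local toolchain meets these requirements (a minimal sketch; it only inspects what is importable locally, and the printed messages are illustrative):

```python
import sys

# The requirements above target Python 3.10+
python_ok = sys.version_info >= (3, 10)
print(f"Python {sys.version.split()[0]}: {'OK' if python_ok else 'needs 3.10+'}")

# PyTorch (and its CUDA build) may not be installed yet at this stage
try:
    import torch
    print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
except ImportError:
    print("PyTorch not installed yet")
```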
### Clone project

Execute the following command in the root directory:

```bash
git clone https://github.com/OpenGVLab/ControlLLM.git
```
### Install dependencies

Set up the environment:

```bash
conda create -n cllm python=3.10
conda activate cllm
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
```

Install [LLaVA](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file):

```bash
pip install git+https://github.com/haotian-liu/LLaVA.git
```

Then install the remaining dependencies:

```bash
cd controlllm
pip install -r requirements.txt
```
## Get Started

### Launch tool services

Put your personal OpenAI API key and [Weather API key](https://www.visualcrossing.com/weather-api) into the corresponding environment variables:

```bash
cd ./controlllm

# OpenAI API key
export OPENAI_API_KEY="..."
# OpenAI API base URL
export OPENAI_BASE_URL="..."
# weather API key
export WEATHER_API_KEY="..."

python -m cllm.services.launch --port 10011 --host 0.0.0.0
```
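To fail fast when a key is missing rather than at request time, a quick check before launching can help. A minimal sketch (the helper name `check_env` is ours, not part of `cllm`):

```python
import os

def check_env(required=("OPENAI_API_KEY", "OPENAI_BASE_URL", "WEATHER_API_KEY")):
    """Return the names of required environment variables that are unset or empty."""
    return [name for name in required if not os.environ.get(name)]
```

Usage: `missing = check_env()`, then refuse to launch (or warn) if the returned list is non-empty.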
### Launch ToG service

```bash
cd ./controlllm

export TOG_SERVICES_PORT=10011
export OPENAI_BASE_URL="..."
export OPENAI_API_KEY="..."

python -m cllm.services.tog.launch --port 10012 --host 0.0.0.0
```
### Launch gradio demo

Use `openssl` to generate the certificate:

```shell
mkdir certificate

openssl req -x509 -newkey rsa:4096 -keyout certificate/key.pem -out certificate/cert.pem -sha256 -days 365 -nodes
```
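`openssl req` prompts for certificate fields interactively; for unattended setups the prompts can be skipped with `-subj`, and the generated certificate can then be sanity-checked (the subject value below is illustrative):

```shell
mkdir -p certificate

# Same self-signed certificate, but non-interactive (illustrative subject)
openssl req -x509 -newkey rsa:4096 -keyout certificate/key.pem -out certificate/cert.pem \
  -sha256 -days 365 -nodes -subj "/CN=localhost"

# Print the validity window as a quick sanity check
openssl x509 -in certificate/cert.pem -noout -dates
```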
Launch the gradio demo:

```bash
cd ./controlllm

export TOG_PORT=10012
export TOG_SERVICES_PORT=10011
export RESOURCE_ROOT="./client_resources"
export GRADIO_TEMP_DIR="$HOME/.tmp"
export OPENAI_BASE_URL="..."
export OPENAI_API_KEY="..."

python -m cllm.app.gradio --controller "cllm.agents.tog.Controller" --server_port 10024
```
### Tools as Services

Taking image generation as an example, we first launch the service:

```bash
python -m cllm.services.image_generation.launch --port 10011 --host 0.0.0.0
```

Then we call the service via the Python API:

```python
from cllm.services.image_generation.api import *

setup(port=10011)
text2image('A horse')
```
Alternatively, launch all services in one endpoint:

```bash
python -m cllm.services.launch --port 10011 --host 0.0.0.0
```
## Supported Tools

See [Tools](TOOL.md).
## License

This project is released under the [Apache 2.0 license](LICENSE).
## Citation

If you find this project useful in your research, please cite our paper:

```BibTeX
@article{2023controlllm,
  title={ControlLLM: Augment Language Models with Tools by Searching on Graphs},
  author={Liu, Zhaoyang and Lai, Zeqiang and Gao, Zhangwei and Cui, Erfei and Li, Zhiheng and Zhu, Xizhou and Lu, Lewei and Chen, Qifeng and Qiao, Yu and Dai, Jifeng and Wang, Wenhai},
  journal={arXiv preprint arXiv:2310.17796},
  year={2023}
}
```
## Acknowledgement

Thanks to the open-source work of the following projects:

[Hugging Face](https://github.com/huggingface)&nbsp;
[LangChain](https://github.com/hwchase17/langchain)&nbsp;
[SAM](https://github.com/facebookresearch/segment-anything)&nbsp;
[Stable Diffusion](https://github.com/CompVis/stable-diffusion)&nbsp;
[ControlNet](https://github.com/lllyasviel/ControlNet)&nbsp;
[InstructPix2Pix](https://github.com/timothybrooks/instruct-pix2pix)&nbsp;
[EasyOCR](https://github.com/JaidedAI/EasyOCR)&nbsp;
[ImageBind](https://github.com/facebookresearch/ImageBind)&nbsp;
[PixArt-alpha](https://github.com/PixArt-alpha/PixArt-alpha)&nbsp;
[LLaVA](https://github.com/haotian-liu/LLaVA?tab=readme-ov-file)&nbsp;
[Modelscope](https://modelscope.cn/my/overview)&nbsp;
[AudioCraft](https://github.com/facebookresearch/audiocraft)&nbsp;
[Whisper](https://github.com/openai/whisper)&nbsp;
[Llama 2](https://github.com/facebookresearch/llama)&nbsp;
[LLaMA](https://github.com/facebookresearch/llama/tree/llama_v1)

---

If you want to join our WeChat group, please scan the following QR code to add our assistant as a WeChat friend:

<p align="center"><img width="300" alt="image" src="https://github.com/OpenGVLab/DragGAN/assets/26198430/e3f0807f-956a-474e-8fd2-1f7c22d73997"></p>
app.py

```python
if __name__ == "__main__":
    os.makedirs(RESOURCE_ROOT, exist_ok=True)
    app(controller="cllm.agents.tog.Controller")
```