Update README.md
Browse files
README.md
CHANGED
@@ -1,3 +1,91 @@
|
|
1 |
-
---
|
2 |
-
license: apache-2.0
|
3 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
---
|
2 |
+
license: apache-2.0
|
3 |
+
datasets:
|
4 |
+
- krasserm/gba-trajectories
|
5 |
+
---
|
6 |
+
A planner LLM [fine-tuned on synthetic trajectories](https://krasserm.github.io/2024/05/31/planner-fine-tuning/) from an agent simulation. It can be used in [ReAct](https://arxiv.org/abs/2210.03629)-style LLM agents where [planning is separated from function calling](https://krasserm.github.io/2024/03/06/modular-agent/). Trajectory generation and planner fine-tuning are described in the [bot-with-plan](https://github.com/krasserm/bot-with-plan) project.
|
7 |
+
|
8 |
+
The planner has been fine-tuned on the [krasserm/gba-trajectories](https://huggingface.co/datasets/krasserm/gba-trajectories) dataset with a [loss over the full sequence](https://github.com/krasserm/bot-with-plan/tree/master/train#gba-planner-7b-v02) (i.e. over prompt and completion). The original QLoRA model is available at [krasserm/gba-planner-7B-v0.2](https://huggingface.co/krasserm/gba-planner-7B-v0.2).
|
9 |
+
|
10 |
+
## Server setup
|
11 |
+
|
12 |
+
Download model:
|
13 |
+
|
14 |
+
```shell
|
15 |
+
mkdir -p models
|
16 |
+
|
17 |
+
wget https://huggingface.co/krasserm/gba-planner-7B-v0.2-GGUF/resolve/main/gba-planner-7B-v0.2-Q8_0.gguf?download=true \
|
18 |
+
-O models/gba-planner-7B-v0.2-Q8_0.gguf
|
19 |
+
```
|
20 |
+
|
21 |
+
Start llama.cpp server:
|
22 |
+
|
23 |
+
```shell
|
24 |
+
docker run --gpus all --rm -p 8082:8080 -v $(realpath models):/models ghcr.io/ggerganov/llama.cpp:server-cuda--b1-17b291a \
|
25 |
+
-m /models/gba-planner-7B-v0.2-Q8_0.gguf -c 1024 --n-gpu-layers 33 --host 0.0.0.0 --port 8080
|
26 |
+
```
|
27 |
+
|
28 |
+
## Usage example
|
29 |
+
|
30 |
+
Create a `planner` instance on the client side.
|
31 |
+
|
32 |
+
```python
|
33 |
+
import json
|
34 |
+
from gba.client import ChatClient, LlamaCppClient, MistralInstruct
|
35 |
+
from gba.planner import FineTunedPlanner
|
36 |
+
from gba.utils import Scratchpad
|
37 |
+
|
38 |
+
llm = LlamaCppClient(url="http://localhost:8082/completion")
|
39 |
+
model = MistralInstruct(llm=llm)
|
40 |
+
client = ChatClient(model=model)
|
41 |
+
planner = FineTunedPlanner(client=client)
|
42 |
+
```
|
43 |
+
|
44 |
+
Define a user `request` and past task-observation pairs (`scratchpad`) of the current trajectory.
|
45 |
+
|
46 |
+
```python
|
47 |
+
request = "Get the average Rotten Tomatoes scores for DreamWorks' last 5 movies."
|
48 |
+
scratchpad = Scratchpad()
|
49 |
+
scratchpad.add(
|
50 |
+
task="Find the last 5 movies released by DreamWorks.",
|
51 |
+
result="The last five movies released by DreamWorks are \"The Bad Guys\" (2022), \"Boss Baby: Family Business\" (2021), \"Trolls World Tour\" (2020), \"Abominable\" (2019), and \"How to Train Your Dragon: The Hidden World\" (2019).")
|
52 |
+
scratchpad.add(
|
53 |
+
task="Search the internet for the Rotten Tomatoes score of \"The Bad Guys\" (2022)",
|
54 |
+
result="The Rotten Tomatoes score of \"The Bad Guys\" (2022) is 88%.",
|
55 |
+
)
|
56 |
+
```
|
57 |
+
|
58 |
+
Then generate a plan for the next step in the trajectory.
|
59 |
+
|
60 |
+
```python
|
61 |
+
result = planner.plan(request=request, scratchpad=scratchpad)
|
62 |
+
print(json.dumps(result.to_dict(), indent=2))
|
63 |
+
```
|
64 |
+
|
65 |
+
```json
|
66 |
+
{
|
67 |
+
"context_information_summary": "The last five movies released by DreamWorks are \"The Bad Guys\" (2022), \"Boss Baby: Family Business\" (2021), \"Trolls World Tour\" (2020), \"Abominable\" (2019), and \"How to Train Your Dragon: The Hidden World\" (2019). The Rotten Tomatoes score of \"The Bad Guys\" (2022) is 88%.",
|
68 |
+
"thoughts": "Since we already have the Rotten Tomatoes score for \"The Bad Guys\", the next logical step is to find the score for the second movie, \"Boss Baby: Family Business\". This will help us gradually build up the average score from the last five movies.",
|
69 |
+
"task": "Search the internet for the Rotten Tomatoes score of \"Boss Baby: Family Business\" (2021).",
|
70 |
+
"selected_tool": "search_internet"
|
71 |
+
}
|
72 |
+
```
|
73 |
+
|
74 |
+
The planner selects a tool and generates a task for the next step. The task is tool-specific and executed by the tool, in this case the [search_internet](https://github.com/krasserm/bot-with-plan/tree/master/gba/tools/search#search-internet-tool) tool, which results in the next observation on the trajectory. If the `final_answer` tool is selected, a final answer is available or can be generated from the trajectory. The output JSON schema is enforced by the `planner` via [constrained decoding](https://krasserm.github.io/2023/12/18/llm-json-mode/) on the llama.cpp server.
|
75 |
+
|
76 |
+
## Tools
|
77 |
+
|
78 |
+
The planner learned a (static) set of available tools during fine-tuning. These are:
|
79 |
+
|
80 |
+
| Tool name | Tool description |
|
81 |
+
|--------------------|-------------------------------------------------------------------------------------------|
|
82 |
+
| `ask_user` | Useful for asking user about information missing in the request. |
|
83 |
+
| `calculate_number` | Useful for numerical tasks that result in a single number. |
|
84 |
+
| `create_event` | Useful for adding a single entry to my calendar at given date and time. |
|
85 |
+
| `search_wikipedia` | Useful for searching factual information in Wikipedia. |
|
86 |
+
| `search_internet` | Useful for up-to-date information on the internet. |
|
87 |
+
| `send_email` | Useful for sending an email to a single recipient. |
|
88 |
+
| `use_bash` | Useful for executing commands in a Linux bash. |
|
89 |
+
| `final_answer` | Useful for providing the final answer to a request. Must always be used in the last step. |
|
90 |
+
|
91 |
+
The framework provided by the [bot-with-plan](https://github.com/krasserm/bot-with-plan) project can easily be adjusted to a different set of tools for specialization to other application domains.
|