---
license: apache-2.0
datasets:
- krasserm/gba-trajectories
---
A planner LLM [fine-tuned on synthetic trajectories](https://krasserm.github.io/2024/05/31/planner-fine-tuning/) from an agent simulation. It can be used in [ReAct](https://arxiv.org/abs/2210.03629)-style LLM agents where [planning is separated from function calling](https://krasserm.github.io/2024/03/06/modular-agent/). Trajectory generation and planner fine-tuning are described in the [bot-with-plan](https://github.com/krasserm/bot-with-plan) project.

The planner has been fine-tuned on the [krasserm/gba-trajectories](https://huggingface.co/datasets/krasserm/gba-trajectories) dataset with a [loss over the full sequence](https://github.com/krasserm/bot-with-plan/tree/master/train#gba-planner-7b-v02) (i.e. over prompt and completion). The original QLoRA model is available at [krasserm/gba-planner-7B-v0.2](https://huggingface.co/krasserm/gba-planner-7B-v0.2).

## Server setup

Download model:

```shell
mkdir -p models

wget "https://huggingface.co/krasserm/gba-planner-7B-v0.2-GGUF/resolve/main/gba-planner-7B-v0.2-Q8_0.gguf?download=true" \
  -O models/gba-planner-7B-v0.2-Q8_0.gguf
```

Start llama.cpp server:

```shell
docker run --gpus all --rm -p 8082:8080 -v $(realpath models):/models ghcr.io/ggerganov/llama.cpp:server-cuda--b1-17b291a \
  -m /models/gba-planner-7B-v0.2-Q8_0.gguf -c 1024 --n-gpu-layers 33 --host 0.0.0.0 --port 8080
```
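Before sending requests you may want to verify the server is reachable. This is a minimal sketch that polls the llama.cpp server's `/health` endpoint; the helper name `is_server_ready` is an assumption for illustration, not part of any package used here.

```python
import json
import urllib.error
import urllib.request


def is_server_ready(url: str = "http://localhost:8082/health", timeout: float = 2.0) -> bool:
    """Return True if the llama.cpp server reports an 'ok' health status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.loads(response.read()).get("status") == "ok"
    except (urllib.error.URLError, OSError, ValueError):
        # Server not up yet, connection refused, or unexpected response body.
        return False
```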

## Usage example

Create a `planner` instance on the client side.

```python
import json
from gba.client import ChatClient, LlamaCppClient, MistralInstruct
from gba.planner import FineTunedPlanner
from gba.utils import Scratchpad

llm = LlamaCppClient(url="http://localhost:8082/completion")
model = MistralInstruct(llm=llm)
client = ChatClient(model=model)
planner = FineTunedPlanner(client=client)
```

Define a user `request` and past task-observation pairs (`scratchpad`) of the current trajectory.

```python
request = "Get the average Rotten Tomatoes scores for DreamWorks' last 5 movies."
scratchpad = Scratchpad()
scratchpad.add(
    task="Find the last 5 movies released by DreamWorks.", 
    result="The last five movies released by DreamWorks are \"The Bad Guys\" (2022), \"Boss Baby: Family Business\" (2021), \"Trolls World Tour\" (2020), \"Abominable\" (2019), and \"How to Train Your Dragon: The Hidden World\" (2019).")
scratchpad.add(
    task="Search the internet for the Rotten Tomatoes score of \"The Bad Guys\" (2022)", 
    result="The Rotten Tomatoes score of \"The Bad Guys\" (2022) is 88%.",
)
```

Then generate a plan for the next step in the trajectory. 

```python
result = planner.plan(request=request, scratchpad=scratchpad)
print(json.dumps(result.to_dict(), indent=2))
```

```json
{
  "context_information_summary": "The last five movies released by DreamWorks are \"The Bad Guys\" (2022), \"Boss Baby: Family Business\" (2021), \"Trolls World Tour\" (2020), \"Abominable\" (2019), and \"How to Train Your Dragon: The Hidden World\" (2019). The Rotten Tomatoes score of \"The Bad Guys\" (2022) is 88%.",
  "thoughts": "Since we already have the Rotten Tomatoes score for \"The Bad Guys\", the next logical step is to find the score for the second movie, \"Boss Baby: Family Business\". This will help us gradually build up the average score from the last five movies.",
  "task": "Search the internet for the Rotten Tomatoes score of \"Boss Baby: Family Business\" (2021).",
  "selected_tool": "search_internet"
}
```

The planner selects a tool and generates a task for the next step. The task is tool-specific and is executed by the selected tool, in this case the [search_internet](https://github.com/krasserm/bot-with-plan/tree/master/gba/tools/search#search-internet-tool) tool, which produces the next observation on the trajectory. If the `final_answer` tool is selected, a final answer is either already available or can be generated from the trajectory. The output JSON schema is enforced by the `planner` via [constrained decoding](https://krasserm.github.io/2023/12/18/llm-json-mode/) on the llama.cpp server.
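The plan-act loop sketched above can be expressed in plain Python. The `run_agent` function, its `plan_fn`/`tools` parameters, and the list-based history below are illustrative assumptions, not part of the `gba` package; only the plan fields `task` and `selected_tool` come from the planner output shown above.

```python
# Illustrative plan-act loop (not part of the gba package). Each step asks the
# planner for a plan, dispatches the generated task to the selected tool, and
# records the task-observation pair before planning the next step.

def run_agent(request, plan_fn, tools, max_steps=10):
    """plan_fn(request, history) returns a dict with "task" and "selected_tool";
    tools maps tool names to callables that take the task string."""
    history = []  # task-observation pairs, playing the role of the scratchpad
    for _ in range(max_steps):
        plan = plan_fn(request, history)
        task, tool_name = plan["task"], plan["selected_tool"]
        observation = tools[tool_name](task)
        if tool_name == "final_answer":
            return observation  # final_answer terminates the trajectory
        history.append((task, observation))
    raise RuntimeError("no final answer within step budget")
```

With the real components, `plan_fn` would wrap `planner.plan(...)` and `history` would be a `Scratchpad`; the stopping condition on `final_answer` is the same either way.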

## Tools

The planner learned a (static) set of available tools during fine-tuning. These are:

| Tool name          | Tool description                                                                          |
|--------------------|-------------------------------------------------------------------------------------------|
| `ask_user`         | Useful for asking user about information missing in the request.                          |
| `calculate_number` | Useful for numerical tasks that result in a single number.                                |
| `create_event`     | Useful for adding a single entry to my calendar at given date and time.                   |
| `search_wikipedia` | Useful for searching factual information in Wikipedia.                                    |
| `search_internet`  | Useful for up-to-date information on the internet.                                        |
| `send_email`       | Useful for sending an email to a single recipient.                                        |
| `use_bash`         | Useful for executing commands in a Linux bash.                                            |
| `final_answer`     | Useful for providing the final answer to a request. Must always be used in the last step. |

The framework provided by the [bot-with-plan](https://github.com/krasserm/bot-with-plan) project can easily be adjusted to a different set of tools for specialization to other application domains.
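Since the tool set is static, it can be represented as a plain name-to-description mapping mirroring the table above, which is also a convenient place to validate planner output. The `validate_plan` helper below is a hypothetical sketch, not part of the `gba` package.

```python
# The fine-tuned tool set as a plain mapping (descriptions from the table above).
TOOLS = {
    "ask_user": "Useful for asking user about information missing in the request.",
    "calculate_number": "Useful for numerical tasks that result in a single number.",
    "create_event": "Useful for adding a single entry to my calendar at given date and time.",
    "search_wikipedia": "Useful for searching factual information in Wikipedia.",
    "search_internet": "Useful for up-to-date information on the internet.",
    "send_email": "Useful for sending an email to a single recipient.",
    "use_bash": "Useful for executing commands in a Linux bash.",
    "final_answer": "Useful for providing the final answer to a request. "
                    "Must always be used in the last step.",
}


def validate_plan(plan: dict) -> None:
    """Reject a plan whose selected tool the planner was not trained on (hypothetical helper)."""
    tool = plan.get("selected_tool")
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool!r}")
```

Swapping in a different tool set for another application domain amounts to replacing this mapping and regenerating training trajectories with the bot-with-plan framework.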