---
license: apache-2.0
datasets:
- krasserm/gba-trajectories
---

A planner LLM [fine-tuned on synthetic trajectories](https://krasserm.github.io/2024/05/31/planner-fine-tuning/) from an agent simulation. It can be used in [ReAct](https://arxiv.org/abs/2210.03629)-style LLM agents where [planning is separated from function calling](https://krasserm.github.io/2024/03/06/modular-agent/). Trajectory generation and planner fine-tuning are described in the [bot-with-plan](https://github.com/krasserm/bot-with-plan) project.

The planner has been fine-tuned on the [krasserm/gba-trajectories](https://huggingface.co/datasets/krasserm/gba-trajectories) dataset with a [loss over the full sequence](https://github.com/krasserm/bot-with-plan/tree/master/train#gba-planner-7b-v02) (i.e. over prompt and completion). The original QLoRA model is available at [krasserm/gba-planner-7B-v0.2](https://huggingface.co/krasserm/gba-planner-7B-v0.2).

## Server setup

Download model:

```shell
mkdir -p models

wget https://huggingface.co/krasserm/gba-planner-7B-v0.2-GGUF/resolve/main/gba-planner-7B-v0.2-Q8_0.gguf?download=true \
  -O models/gba-planner-7B-v0.2-Q8_0.gguf
```

Start llama.cpp server:

```shell
docker run --gpus all --rm -p 8082:8080 -v $(realpath models):/models ghcr.io/ggerganov/llama.cpp:server-cuda--b1-17b291a \
  -m /models/gba-planner-7B-v0.2-Q8_0.gguf -c 1024 --n-gpu-layers 33 --host 0.0.0.0 --port 8080
```
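
To confirm the server came up correctly before wiring it into the client below, a quick request from Python works. This is only a smoke test and assumes your llama.cpp build exposes the `/health` endpoint (newer server builds do, older ones may not).

```python
# Optional smoke test for the llama.cpp server started above (assumes this
# build exposes the /health endpoint; older builds may not have it).
import json
import urllib.request

with urllib.request.urlopen("http://localhost:8082/health") as response:
    print(json.loads(response.read()))  # e.g. {"status": "ok"}
```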

## Usage example

Create a `planner` instance on the client side.

```python
import json
from gba.client import ChatClient, LlamaCppClient, MistralInstruct
from gba.planner import FineTunedPlanner
from gba.utils import Scratchpad

llm = LlamaCppClient(url="http://localhost:8082/completion")
model = MistralInstruct(llm=llm)
client = ChatClient(model=model)
planner = FineTunedPlanner(client=client)
```

Define a user `request` and past task-observation pairs (`scratchpad`) of the current trajectory.

```python
request = "Get the average Rotten Tomatoes scores for DreamWorks' last 5 movies."
scratchpad = Scratchpad()
scratchpad.add(
    task="Find the last 5 movies released by DreamWorks.",
    result="The last five movies released by DreamWorks are \"The Bad Guys\" (2022), \"Boss Baby: Family Business\" (2021), \"Trolls World Tour\" (2020), \"Abominable\" (2019), and \"How to Train Your Dragon: The Hidden World\" (2019).",
)
scratchpad.add(
    task="Search the internet for the Rotten Tomatoes score of \"The Bad Guys\" (2022)",
    result="The Rotten Tomatoes score of \"The Bad Guys\" (2022) is 88%.",
)
```

Then generate a plan for the next step in the trajectory.

```python
result = planner.plan(request=request, scratchpad=scratchpad)
print(json.dumps(result.to_dict(), indent=2))
```

```json
{
  "context_information_summary": "The last five movies released by DreamWorks are \"The Bad Guys\" (2022), \"Boss Baby: Family Business\" (2021), \"Trolls World Tour\" (2020), \"Abominable\" (2019), and \"How to Train Your Dragon: The Hidden World\" (2019). The Rotten Tomatoes score of \"The Bad Guys\" (2022) is 88%.",
  "thoughts": "Since we already have the Rotten Tomatoes score for \"The Bad Guys\", the next logical step is to find the score for the second movie, \"Boss Baby: Family Business\". This will help us gradually build up the average score from the last five movies.",
  "task": "Search the internet for the Rotten Tomatoes score of \"Boss Baby: Family Business\" (2021).",
  "selected_tool": "search_internet"
}
```

The planner selects a tool and generates a task for the next step. The task is tool-specific and is executed by the selected tool, in this case the [search_internet](https://github.com/krasserm/bot-with-plan/tree/master/gba/tools/search#search-internet-tool) tool; executing it produces the next observation on the trajectory. If the `final_answer` tool is selected, a final answer is available or can be generated from the trajectory. The output JSON schema is enforced by the `planner` via [constrained decoding](https://krasserm.github.io/2023/12/18/llm-json-mode/) on the llama.cpp server.
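
To make the constrained-decoding part more concrete, the sketch below posts directly to the llama.cpp `/completion` endpoint with a GBNF `grammar` that restricts sampling to a small JSON shape. The prompt and grammar are simplified placeholders invented for this illustration, not the ones the planner uses internally; with `FineTunedPlanner` none of this is needed.

```python
# Illustration only: grammar-constrained decoding against the llama.cpp
# /completion endpoint. The prompt and grammar below are simplified
# placeholders, not the planner's actual prompt or schema.
import json
import urllib.request

# Tiny GBNF grammar that only admits {"selected_tool": "<lowercase_name>"}.
grammar = r'''root ::= "{" ws "\"selected_tool\":" ws string ws "}"
string ::= "\"" [a-z_]+ "\""
ws ::= [ \t\n]*'''

payload = {
    "prompt": "Select a tool for the request: What is 2 + 2?\n",
    "n_predict": 32,
    "grammar": grammar,
}
req = urllib.request.Request(
    "http://localhost:8082/completion",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as response:
    print(json.loads(response.read())["content"])
```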

## Tools

The planner learned a (static) set of available tools during fine-tuning. These are:

| Tool name          | Tool description                                                                            |
|--------------------|---------------------------------------------------------------------------------------------|
| `ask_user`         | Useful for asking user about information missing in the request.                            |
| `calculate_number` | Useful for numerical tasks that result in a single number.                                   |
| `create_event`     | Useful for adding a single entry to my calendar at given date and time.                      |
| `search_wikipedia` | Useful for searching factual information in Wikipedia.                                       |
| `search_internet`  | Useful for up-to-date information on the internet.                                           |
| `send_email`       | Useful for sending an email to a single recipient.                                           |
| `use_bash`         | Useful for executing commands in a Linux bash.                                               |
| `final_answer`     | Useful for providing the final answer to a request. Must always be used in the last step.    |

The framework provided by the [bot-with-plan](https://github.com/krasserm/bot-with-plan) project can easily be adjusted to a different set of tools in order to specialize the planner for other application domains.
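
As a rough sketch of how planner output could drive a complete plan-act loop, the snippet below dispatches each step to a tool and feeds the observation back into the scratchpad. The `execute_tool` helper is a made-up placeholder for this example; bot-with-plan ships its own tool classes and agent loop with their own interfaces.

```python
# Hypothetical plan-act loop driven by planner output. execute_tool is a
# placeholder; bot-with-plan provides its own tool implementations and loop.
from gba.utils import Scratchpad


def execute_tool(tool_name: str, task: str) -> str:
    """Run the tool selected by the planner and return its observation."""
    raise NotImplementedError  # dispatch to search_internet, calculate_number, ...


def run(planner, request: str, max_steps: int = 10) -> str:
    scratchpad = Scratchpad()
    for _ in range(max_steps):
        plan = planner.plan(request=request, scratchpad=scratchpad).to_dict()
        task, tool = plan["task"], plan["selected_tool"]
        observation = execute_tool(tool, task)
        if tool == "final_answer":
            return observation  # final answer generated from the trajectory
        scratchpad.add(task=task, result=observation)
    return "No final answer produced within the step budget."
```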