A planner LLM fine-tuned on synthetic trajectories from an agent simulation. It can be used in ReAct-style LLM agents where planning is separated from function calling. Trajectory generation and planner fine-tuning are described in the bot-with-plan project.

The planner has been fine-tuned on the krasserm/gba-trajectories dataset. 8-bit and 4-bit quantized GGUF versions of this model are available at krasserm/gba-planner-7B-v0.1-GGUF.
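
The quantized variants can be loaded with llama-cpp-python, for example. This is a minimal sketch; the filename glob below is an assumption and should be checked against the files actually published in the GGUF repository.

# Minimal sketch: load a 4-bit GGUF variant with llama-cpp-python.
# The filename glob is an assumption; verify it against the files in
# the krasserm/gba-planner-7B-v0.1-GGUF repository.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="krasserm/gba-planner-7B-v0.1-GGUF",
    filename="*Q4_K_M.gguf",  # glob pattern resolved against the repo files
    n_ctx=2048,
)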

Usage example

Load the model and the tokenizer.

import json
import torch
from transformers import (
    AutoModelForCausalLM, 
    AutoTokenizer, 
    BitsAndBytesConfig, 
    GenerationConfig,
)

device = "cuda:0"
repo_id = "krasserm/gba-planner-7B-v0.1"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    quantization_config=bnb_config,
    device_map=device,
)

Define a prompt that contains the user request and past task-observation pairs of the current trajectory (context information).

prompt = """User request:

```
Get the average Rotten Tomatoes scores for DreamWorks' last 5 movies.
```

Context information:

```
Task: Find the last 5 movies released by DreamWorks.
Result: The last five movies released by DreamWorks are "The Bad Guys" (2022), "Boss Baby: Family Business" (2021), "Trolls World Tour" (2020), "Abominable" (2019), and "How to Train Your Dragon: The Hidden World" (2019).

Task: Search the internet for the Rotten Tomatoes score of "The Bad Guys" (2022)
Result: The Rotten Tomatoes score of "The Bad Guys" (2022) is 88%.
```

Plan the next step."""

Then generate a plan for the next step in the trajectory.

instruct_template = "[INST] {prompt} [/INST]{{"  # the escaped "{{" renders as a literal "{" that primes the model to emit JSON
instruct_prompt = instruct_template.format(prompt=prompt)

input_ids = tokenizer(instruct_prompt, return_tensors="pt", max_length=1024, truncation=True)["input_ids"]
input_ids = input_ids.to(device)

generation_config = GenerationConfig(
    max_new_tokens=512,
    do_sample=False,
    eos_token_id=model.config.eos_token_id,
    pad_token_id=model.config.pad_token_id,
)

with torch.no_grad():
    result = model.generate(input_ids, generation_config=generation_config)
    result = result[:, input_ids.shape[1]:]  # keep only the newly generated tokens

decoded = tokenizer.batch_decode(result, skip_special_tokens=True)
decoded_dict = json.loads("{" + decoded[0])  # re-add the "{" that was already part of the prompt
print(json.dumps(decoded_dict, indent=2))
{
  "context_information_summary": "The last five movies released by DreamWorks are \"The Bad Guys\" (2022), \"Boss Baby: Family Business\" (2021), \"Trolls World Tour\" (2020), \"Abominable\" (2019), and \"How to Train Your Dragon: The Hidden World\" (2019). The Rotten Tomatoes score of \"The Bad Guys\" (2022) is 88%.",
  "thoughts": "Since we have the Rotten Tomatoes score for \"The Bad Guys\", the next logical step is to find the score for the next movie in the list, \"Boss Baby: Family Business\". This will allow us to calculate the average score for the first two movies.",
  "task": "Search the internet for the Rotten Tomatoes score of \"Boss Baby: Family Business\" (2021).",
  "selected_tool": "search_internet"
}

The planner selects a tool and generates a task for the next step. The task is tool-specific and is executed by the selected tool, in this case the search_internet tool, which produces the next observation on the trajectory. If the final_answer tool is selected, a final answer is either already available or can be generated from the trajectory.
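
A minimal sketch of this loop, assuming a hypothetical plan_next_step helper that wraps the prompt construction, generation and decoding code shown above, and tool callables that take a task string and return an observation string. None of these names are defined by this model card.

# Minimal sketch of the planning loop. plan_next_step and the tool callables
# are assumed interfaces, not part of this model card.
def run(request: str, plan_next_step, tools: dict) -> list:
    context = []  # (task, result) pairs, rendered into the prompt as context information
    while True:
        plan = plan_next_step(request, context)   # returns the decoded JSON dict
        task, tool_name = plan["task"], plan["selected_tool"]
        result = tools[tool_name](task)           # execute the tool-specific task
        context.append((task, result))            # becomes the next observation on the trajectory
        if tool_name == "final_answer":
            return context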

Tools

The planner learned a (static) set of available tools during fine-tuning. These are:

| Tool name | Tool description |
|---|---|
| ask_user | Useful for asking user about information missing in the request. |
| calculate_number | Useful for numerical tasks that result in a single number. |
| create_event | Useful for adding a single entry to my calendar at given date and time. |
| search_wikipedia | Useful for searching factual information in Wikipedia. |
| search_internet | Useful for up-to-date information on the internet. |
| send_email | Useful for sending an email to a single recipient. |
| use_bash | Useful for executing commands in a Linux bash. |
| final_answer | Useful for providing the final answer to a request. Must always be used in the last step. |

The framework provided by the bot-with-plan project can easily be adjusted to a different set of tools for specialization to other application domains.
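
For illustration only, a different domain could define its own tool registry along these lines; the tool names and descriptions below are hypothetical and not part of the fine-tuning data (see the bot-with-plan project for the actual tool interfaces).

# Hypothetical tool registry for a support-desk domain. Trajectories would be
# regenerated and the planner re-fine-tuned against these descriptions.
custom_tools = {
    "query_database": "Useful for running read-only SQL queries against the ticket database.",
    "create_ticket": "Useful for creating a single support ticket with a title and a description.",
    "send_email": "Useful for sending an email to a single recipient.",
    "final_answer": "Useful for providing the final answer to a request. Must always be used in the last step.",
}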
