	---
	license: apache-2.0
	datasets:
	- krasserm/gba-trajectories
	library_name: peft
	---
	A planner LLM [fine-tuned on synthetic trajectories](https://krasserm.github.io/2024/05/31/planner-fine-tuning/) from an agent simulation. It can be used in [ReAct](https://arxiv.org/abs/2210.03629)-style LLM agents where [planning is separated from function calling](https://krasserm.github.io/2024/03/06/modular-agent/). Trajectory generation and planner fine-tuning are described in the [bot-with-plan](https://github.com/krasserm/bot-with-plan) project.

	The planner has been fine-tuned on the [krasserm/gba-trajectories](https://huggingface.co/datasets/krasserm/gba-trajectories) dataset. 8-bit and 4-bit quantized GGUF versions of this model are available at [krasserm/gba-planner-7B-v0.1-GGUF](https://huggingface.co/krasserm/gba-planner-7B-v0.1-GGUF).

	## Usage example

	Load the model and the tokenizer.

	```python
	import json
	import torch
	from transformers import (
	    AutoModelForCausalLM,
	    AutoTokenizer,
	    BitsAndBytesConfig,
	    GenerationConfig,
	)

	device = "cuda:0"
	repo_id = "krasserm/gba-planner-7B-v0.1"

	bnb_config = BitsAndBytesConfig(
	    load_in_4bit=True,
	    bnb_4bit_use_double_quant=False,
	    bnb_4bit_quant_type="nf4",
	    bnb_4bit_compute_dtype=torch.float16,
	)

	tokenizer = AutoTokenizer.from_pretrained(repo_id)
	model = AutoModelForCausalLM.from_pretrained(
	    repo_id,
	    quantization_config=bnb_config,
	    device_map=device,
	)
	```

	Define a prompt that contains the user request and past task-observation pairs of the current trajectory (context information).

	````python
	prompt = """User request:

	```
	Get the average Rotten Tomatoes scores for DreamWorks' last 5 movies.
	```

	Context information:

	```
	Task: Find the last 5 movies released by DreamWorks.
	Result: The last five movies released by DreamWorks are "The Bad Guys" (2022), "Boss Baby: Family Business" (2021), "Trolls World Tour" (2020), "Abominable" (2019), and "How to Train Your Dragon: The Hidden World" (2019).

	Task: Search the internet for the Rotten Tomatoes score of "The Bad Guys" (2022)
	Result: The Rotten Tomatoes score of "The Bad Guys" (2022) is 88%.
	```

	Plan the next step."""
	````
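For longer trajectories, the same prompt can be assembled programmatically. The following is a minimal sketch that reproduces the layout of the prompt above. The helper name `build_prompt` and the `FENCE` constant are hypothetical and not part of the bot-with-plan project. How a request with no prior steps should be formatted is not covered here, so the sketch assumes at least one task-result pair.

```python
from typing import List, Tuple

# Triple backtick, built programmatically to avoid nesting literal fences.
FENCE = "`" * 3


def build_prompt(request: str, steps: List[Tuple[str, str]]) -> str:
    """Assemble a planner prompt from a user request and past (task, result) pairs.

    Hypothetical helper: it mirrors the prompt layout shown in this README.
    """
    # One "Task: ... / Result: ..." block per step, separated by a blank line.
    context = "\n\n".join(f"Task: {task}\nResult: {result}" for task, result in steps)
    return (
        f"User request:\n\n{FENCE}\n{request}\n{FENCE}\n\n"
        f"Context information:\n\n{FENCE}\n{context}\n{FENCE}\n\n"
        "Plan the next step."
    )
```

Calling `build_prompt` with the request and the two task-result pairs from the example yields the same text as the hand-written prompt.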

	Then generate a plan for the next step in the trajectory.

	```python
	instruct_template = "[INST] {prompt} [/INST]{{"
	instruct_prompt = instruct_template.format(prompt=prompt)

	input_ids = tokenizer(instruct_prompt, return_tensors="pt", max_length=1024, truncation=True)["input_ids"]
	input_ids = input_ids.to(device)

	generation_config = GenerationConfig(
	    max_new_tokens=512,
	    do_sample=False,
	    eos_token_id=model.config.eos_token_id,
	    pad_token_id=model.config.pad_token_id,
	)

	with torch.no_grad():
	    result = model.generate(input_ids, generation_config=generation_config)
	    result = result[:, input_ids.shape[1] :]

	decoded = tokenizer.batch_decode(result, skip_special_tokens=True)
	decoded_dict = json.loads("{" + decoded[0])
	print(json.dumps(decoded_dict, indent=2))
	```

	```json
	{
	  "context_information_summary": "The last five movies released by DreamWorks are \"The Bad Guys\" (2022), \"Boss Baby: Family Business\" (2021), \"Trolls World Tour\" (2020), \"Abominable\" (2019), and \"How to Train Your Dragon: The Hidden World\" (2019). The Rotten Tomatoes score of \"The Bad Guys\" (2022) is 88%.",
	  "thoughts": "Since we have the Rotten Tomatoes score for \"The Bad Guys\", the next logical step is to find the score for the next movie in the list, \"Boss Baby: Family Business\". This will allow us to calculate the average score for the first two movies.",
	  "task": "Search the internet for the Rotten Tomatoes score of \"Boss Baby: Family Business\" (2021).",
	  "selected_tool": "search_internet"
	}
	```

	The planner selects a tool and generates a task for the next step. The task is tool-specific and executed by the tool, in this case the [search_internet](https://github.com/krasserm/bot-with-plan/tree/master/gba/tools/search#search-internet-tool) tool, which results in the next observation on the trajectory. If the `final_answer` tool is selected, a final answer is available or can be generated from the trajectory.
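The step described above could be wired up roughly as follows. This is a sketch under stated assumptions, not the dispatch logic of the bot-with-plan project: the `execute_plan` function, the `TOOLS` registry, and the stub tool are hypothetical, and each tool is assumed to be a callable that takes the task string and returns the observation as a string.

```python
from typing import Callable, Dict, Optional, Tuple


def search_internet_stub(task: str) -> str:
    """Placeholder tool (hypothetical); a real tool would run the search."""
    return f"(stub) search result for: {task}"


# Hypothetical registry mapping planner tool names to callables.
TOOLS: Dict[str, Callable[[str], str]] = {"search_internet": search_internet_stub}


def execute_plan(plan: dict, tools: Dict[str, Callable[[str], str]]) -> Optional[Tuple[str, str]]:
    """Run the selected tool on the planned task.

    Returns the (task, result) pair to append to the context information,
    or None when the planner selected `final_answer`.
    """
    tool_name = plan["selected_tool"]
    if tool_name == "final_answer":
        # The trajectory is complete; producing the answer is left to the caller.
        return None
    if tool_name not in tools:
        raise ValueError(f"No implementation registered for tool: {tool_name}")
    task = plan["task"]
    return task, tools[tool_name](task)
```

The returned pair can be added to the context information of the next prompt, so that planning and tool execution alternate until `final_answer` is selected.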

	## Tools

	The planner learned a (static) set of available tools during fine-tuning. These are:

	| Tool name          | Tool description                                                                          |
	|--------------------|-------------------------------------------------------------------------------------------|
	| `ask_user`         | Useful for asking user about information missing in the request.                          |
	| `calculate_number` | Useful for numerical tasks that result in a single number.                                |
	| `create_event`     | Useful for adding a single entry to my calendar at given date and time.                   |
	| `search_wikipedia` | Useful for searching factual information in Wikipedia.                                    |
	| `search_internet`  | Useful for up-to-date information on the internet.                                        |
	| `send_email`       | Useful for sending an email to a single recipient.                                        |
	| `use_bash`         | Useful for executing commands in a Linux bash.                                            |
	| `final_answer`     | Useful for providing the final answer to a request. Must always be used in the last step. |
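Because the tool set is static, the planner output can be checked against it before a step is executed. The sketch below is illustrative only: `parse_plan`, `KNOWN_TOOLS`, and `EXPECTED_KEYS` are hypothetical names, and the assumption that every plan carries the four keys seen in the example output is inferred from that single example.

```python
import json

# Tool names from the table above.
KNOWN_TOOLS = {
    "ask_user", "calculate_number", "create_event", "search_wikipedia",
    "search_internet", "send_email", "use_bash", "final_answer",
}

# Assumption: keys taken from the example output in the usage section.
EXPECTED_KEYS = {"context_information_summary", "thoughts", "task", "selected_tool"}


def parse_plan(completion: str) -> dict:
    """Parse a planner completion and validate it against the static tool set.

    The opening brace is part of the instruct prompt, so it is prepended
    here before parsing, as in the usage example.
    """
    plan = json.loads("{" + completion)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    missing = EXPECTED_KEYS - plan.keys()
    if missing:
        raise ValueError(f"Plan is missing keys: {sorted(missing)}")
    if plan["selected_tool"] not in KNOWN_TOOLS:
        raise ValueError(f"Unknown tool: {plan['selected_tool']}")
    return plan
```

A failed check could be handled by retrying generation or by falling back to `ask_user`; which policy fits is application-specific.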

	The framework provided by the [bot-with-plan](https://github.com/krasserm/bot-with-plan) project can easily be adjusted to a different set of tools for specialization to other application domains.