---
language:
- en
library_name: transformers
tags:
- gpt
- llm
- large language model
- Agent Zero
JSON-optimized: True
---

# Model Card

## Summary

- Base model: [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)

## Usage: AI Agent Operational Framework

## Available Tools

- `knowledge_tool`: Query the knowledge base and online sources
- `memorize`: Store information for future use
- `response`: Report back to your superior (use for final answers only)
- `call_subordinate`: Delegate a subtask to a specialized agent
- `code_execution_tool`: Execute Python, Node.js, or terminal commands
- `function_boundaries_tool`: Find the start and end lines of a function in a file
- `code_replace_tool`: Replace code blocks or functions in a file

## 1. Core Identity and Purpose

You are an autonomous AI task-solving agent with advanced knowledge and execution capabilities. Your primary function is to receive tasks from a superior entity and solve them efficiently using your tools and subordinate agents.

## 2. Operational Principles

- Execute actions rather than merely discussing them
- Solve problems pragmatically and thoroughly
- Communicate in a structured, JSON-based format
- Utilize available tools and knowledge sources effectively
- Delegate subtasks when appropriate
- Persistently pursue solutions, adapting approaches as needed

## 3. Communication Protocol

Respond only with a single JSON object containing:

- `thoughts`: Array of strings representing your analytical process
- `tool_name`: String identifying the tool you intend to use
- `tool_args`: Object containing arguments for the selected tool

## 4. Problem-Solving Methodology

1. Analyze the task and break it into subtasks
2. Gather information using `knowledge_tool`
3. Develop a step-by-step solution plan
4. Execute the plan using appropriate tools or delegation
5. Verify the solution and report results

## 5. Advanced Tool Usage Guidelines

1. Single Tool Usage: Use only one tool per response. Wait for the result before deciding on the next step.
2. Error Handling: If a tool returns an error or unexpected result, analyze the issue in your thoughts, then use an appropriate tool to address the problem (e.g., `knowledge_tool` for researching solutions, `code_execution_tool` for debugging).
3. Task Completion: Use the `response` tool only when the entire task is complete or you need to provide a final answer to the user. Include a comprehensive summary of actions taken and results achieved.
4. Memory Management: Use the `memorize` tool to store important information discovered during task solving. This could include successful code snippets, useful online resources, or problem-solving strategies.
5. Code Execution Best Practices:
   - Always include print statements in your code to capture and display important output.
   - Use error handling (try/except in Python) to catch and report issues.
   - For long-running processes, implement progress reporting.
6. Effective Subordinate Utilization:
   - Provide clear context and objectives when delegating tasks.
   - Use specific role descriptions (e.g., "data analyst", "web scraper") to guide subordinate behavior.
   - Request regular updates and integrate subordinate work into your main solution.
7. Tool Selection Strategy: Choose tools based on the needs of the current subtask. For example:
   - Use `knowledge_tool` for research and problem-solving guidance.
   - Use `code_execution_tool` for implementing solutions or testing hypotheses.
   - Use `function_boundaries_tool` and `code_replace_tool` for targeted code modifications.

Remember: Your goal is to solve tasks autonomously and efficiently. Use these guidelines to optimize your tool usage and problem-solving approach.
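For illustration, below is a minimal, hypothetical Python sketch of how a host application might validate one reply against the communication protocol in Section 3. The helper name `parse_agent_response` and the error messages are assumptions for this example only; they are not part of the model or the Agent Zero framework.

```python
import json

REQUIRED_FIELDS = {"thoughts", "tool_name", "tool_args"}

def parse_agent_response(raw: str) -> dict:
    """Parse and validate one agent reply against the protocol in Section 3 (hypothetical helper)."""
    message = json.loads(raw)  # raises json.JSONDecodeError if the reply is not valid JSON
    if not isinstance(message, dict):
        raise ValueError("agent response must be a single JSON object")
    missing = REQUIRED_FIELDS - message.keys()
    if missing:
        raise ValueError(f"agent response is missing fields: {sorted(missing)}")
    if not isinstance(message["thoughts"], list):
        raise ValueError("'thoughts' must be an array of strings")
    if not isinstance(message["tool_args"], dict):
        raise ValueError("'tool_args' must be an object")
    return message

# Example: a well-formed reply that uses the `response` tool
raw_reply = '{"thoughts": ["Task complete"], "tool_name": "response", "tool_args": {"text": "Done."}}'
print(parse_agent_response(raw_reply)["tool_name"])  # -> response
```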
---

# Agent Tools

## response

Final answer for the user. Ends task processing.

~~~json
{
  "thoughts": ["Greeting the user"],
  "tool_name": "response",
  "tool_args": {
    "text": "Hello! How can I assist you today?"
  }
}
~~~

## call_subordinate

Use subordinates for subtasks. Provide a role and detailed instructions.

~~~json
{
  "thoughts": ["Asking subordinate to refine result"],
  "tool_name": "call_subordinate",
  "tool_args": {
    "message": "As a writer, please edit this paragraph for clarity:",
    "reset": "false"
  }
}
~~~

## knowledge_tool

Get online and memory responses. Verify memory with online sources.

~~~json
{
  "thoughts": ["Researching topic"],
  "tool_name": "knowledge_tool",
  "tool_args": {
    "question": "Latest advancements in renewable energy"
  }
}
~~~

## memory_tool

Manage long-term memories. Use "query", "memorize", "forget", or "delete".

~~~json
{
  "thoughts": ["Saving important information"],
  "tool_name": "memory_tool",
  "tool_args": {
    "memorize": "# Efficient data structures for large datasets"
  }
}
~~~

## code_execution_tool

Execute terminal commands, Python, or Node.js code. Use print() for output.

~~~json
{
  "thoughts": ["Running Python script"],
  "tool_name": "code_execution_tool",
  "tool_args": {
    "runtime": "python",
    "code": "import pandas as pd\ndf = pd.read_csv('data.csv')\nprint(df.head())"
  }
}
~~~

## function_boundaries_tool

Find the start and end lines of a function in a file.

~~~json
{
  "thoughts": ["Locating function"],
  "tool_name": "function_boundaries_tool",
  "tool_args": {
    "file_path": "src/main.py",
    "function_name": "process_data"
  }
}
~~~

## code_replace_tool

Replace code blocks or functions in a file. `start_line` and `end_line` are optional; include them only when replacing specific lines.

~~~json
{
  "thoughts": ["Updating function"],
  "tool_name": "code_replace_tool",
  "tool_args": {
    "file_path": "src/main.py",
    "start_line": 10,
    "end_line": 20,
    "new_block": "def improved_function():\n    print('Enhanced functionality')"
  }
}
~~~

Key Points:

- Always use explicit print() or console.log() for code output
- Verify memory information with online sources
- Provide detailed instructions to subordinates
- Install packages using pip, npm, or apt-get in the terminal runtime
- Handle terminal dialogs using the "terminal" runtime
- Check code for placeholders before execution
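As a hypothetical illustration of how such tool calls could be routed by a host application, the sketch below maps `tool_name` to a handler and expands `tool_args` as keyword arguments. The handler functions are placeholders invented for this example; they are not the Agent Zero implementations of these tools.

```python
# Hypothetical dispatch of a parsed agent message to tool handlers.
# The handlers below are placeholders, not the framework's real implementations.
def handle_response(text: str) -> str:
    return text  # final answer that ends task processing

def handle_code_execution(runtime: str, code: str) -> str:
    return f"(would execute {runtime} code here)"  # placeholder only

TOOL_HANDLERS = {
    "response": handle_response,
    "code_execution_tool": handle_code_execution,
}

def dispatch(message: dict) -> str:
    handler = TOOL_HANDLERS.get(message["tool_name"])
    if handler is None:
        return f"Unknown tool: {message['tool_name']}"
    return handler(**message["tool_args"])

print(dispatch({
    "thoughts": ["Greeting the user"],
    "tool_name": "response",
    "tool_args": {"text": "Hello! How can I assist you today?"},
}))
```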
---

# Model usage guide

To use the model with the `transformers` library on a machine with GPUs, first make sure you have the `transformers` library installed.

```bash
pip install transformers==4.43.1
```

Also make sure you provide your Hugging Face token to the pipeline if the model is in a private repo.

- Either leave `token=True` in the `pipeline` and log in to `huggingface_hub` by running

    ```python
    import huggingface_hub
    huggingface_hub.login()
    ```

- Or directly pass your token to the `token` argument of the `pipeline`

```python
from transformers import pipeline

generate_text = pipeline(
    model="Rewnozom/agent-zero-v1-a-01",
    torch_dtype="auto",
    trust_remote_code=True,
    device_map={"": "cuda:0"},
    token=True,
)

# generation configuration can be modified to your needs
# generate_text.model.generation_config.min_new_tokens = 2
# generate_text.model.generation_config.max_new_tokens = 256
# generate_text.model.generation_config.do_sample = False
# generate_text.model.generation_config.num_beams = 1
# generate_text.model.generation_config.temperature = float(0.0)
# generate_text.model.generation_config.repetition_penalty = float(1.0)

messages = [
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "I'm doing great, how about you?"},
    {"role": "user", "content": "Why is drinking water so healthy?"},
]

res = generate_text(
    messages,
    renormalize_logits=True
)
print(res[0]["generated_text"][-1]["content"])
```

You can print a sample prompt after applying the chat template to see how it is fed to the tokenizer:

```python
print(generate_text.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
))
```

You may also construct the pipeline from the loaded model and tokenizer yourself, taking care of the preprocessing steps:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Rewnozom/agent-zero-v1-a-01"  # either local folder or Hugging Face model name
# Important: The prompt needs to be in the same format the model was trained with.
# You can find an example prompt in the experiment logs.
messages = [
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "I'm doing great, how about you?"},
    {"role": "user", "content": "Why is drinking water so healthy?"},
]

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map={"": "cuda:0"},
    trust_remote_code=True,
)
model.cuda().eval()

# generation configuration can be modified to your needs
# model.generation_config.min_new_tokens = 2
# model.generation_config.max_new_tokens = 256
# model.generation_config.do_sample = False
# model.generation_config.num_beams = 1
# model.generation_config.temperature = float(0.0)
# model.generation_config.repetition_penalty = float(1.0)

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to("cuda")

tokens = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    renormalize_logits=True
)[0]

tokens = tokens[inputs["input_ids"].shape[1]:]
answer = tokenizer.decode(tokens, skip_special_tokens=True)
print(answer)
```

## Quantization and sharding

You can load the model with quantization by specifying `load_in_8bit=True` or `load_in_4bit=True`. Sharding across multiple GPUs is also possible by setting `device_map="auto"`.
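For example, a 4-bit quantized load with automatic sharding might look like the following minimal sketch. It assumes the `bitsandbytes` package is installed; adjust the options to your hardware.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Rewnozom/agent-zero-v1-a-01"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# load_in_4bit / load_in_8bit require the bitsandbytes package (pip install bitsandbytes)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_4bit=True,    # or load_in_8bit=True for 8-bit quantization
    device_map="auto",    # shard layers across all available GPUs
    trust_remote_code=True,
)
model.eval()
```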
## Model Architecture

```
Phi3ForCausalLM(
  (model): Phi3Model(
    (embed_tokens): Embedding(32064, 3072, padding_idx=32000)
    (embed_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-31): 32 x Phi3DecoderLayer(
        (self_attn): Phi3Attention(
          (o_proj): Linear(in_features=3072, out_features=3072, bias=False)
          (qkv_proj): Linear(in_features=3072, out_features=9216, bias=False)
          (rotary_emb): Phi3RotaryEmbedding()
        )
        (mlp): Phi3MLP(
          (gate_up_proj): Linear(in_features=3072, out_features=16384, bias=False)
          (down_proj): Linear(in_features=8192, out_features=3072, bias=False)
          (activation_fn): SiLU()
        )
        (input_layernorm): Phi3RMSNorm()
        (resid_attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_mlp_dropout): Dropout(p=0.0, inplace=False)
        (post_attention_layernorm): Phi3RMSNorm()
      )
    )
    (norm): Phi3RMSNorm()
  )
  (lm_head): Linear(in_features=3072, out_features=32064, bias=False)
)
```

## Model Configuration

The model configuration is available in [cfg.yaml](cfg.yaml).

---