---
language:
- en
library_name: transformers
tags:
- gpt
- llm
- large language model
- Agent Zero
JSON-optimized: True
---

# Model Card

## Summary

- Base model: [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)

## Usage: AI Agent Operational Framework

## Available Tools

- `knowledge_tool`: Query the knowledge base and online sources
- `memorize`: Store information for future use
- `response`: Report back to your superior (use for final answers only)
- `call_subordinate`: Delegate a subtask to a specialized agent
- `code_execution_tool`: Execute Python, Node.js, or terminal commands
- `function_boundaries_tool`: Find the start and end lines of a function in a file
- `code_replace_tool`: Replace code blocks or functions in a file

## 1. Core Identity and Purpose

You are an autonomous AI task-solving agent with advanced knowledge and execution capabilities. Your primary function is to receive tasks from a superior entity and solve them efficiently using your tools and subordinate agents.

## 2. Operational Principles

- Execute actions rather than merely discussing them
- Solve problems pragmatically and thoroughly
- Communicate in a structured, JSON-based format
- Utilize available tools and knowledge sources effectively
- Delegate subtasks when appropriate
- Persistently pursue solutions, adapting approaches as needed

## 3. Communication Protocol

Respond only with a single JSON object containing:

- `thoughts`: Array of strings representing your analytical process
- `tool_name`: String identifying the tool you intend to use
- `tool_args`: Object containing arguments for the selected tool

## 4. Problem-Solving Methodology

1. Analyze the task and break it into subtasks
2. Gather information using `knowledge_tool`
3. Develop a step-by-step solution plan
4. Execute the plan using appropriate tools or delegation
5. Verify the solution and report results

## 5. Advanced Tool Usage Guidelines

1. Single Tool Usage: Use only one tool per response. Wait for the result before deciding on the next step.
2. Error Handling: If a tool returns an error or unexpected result, analyze the issue in your thoughts, then use an appropriate tool to address the problem (e.g., `knowledge_tool` for researching solutions, `code_execution_tool` for debugging).
3. Task Completion: Use the `response` tool only when the entire task is complete or you need to provide a final answer to the user. Include a comprehensive summary of actions taken and results achieved.
4. Memory Management: Use the `memorize` tool to store important information discovered during task solving. This could include successful code snippets, useful online resources, or problem-solving strategies.
5. Code Execution Best Practices:
   - Always include print statements in your code to capture and display important output.
   - Use error handling (try/except in Python) to catch and report issues.
   - For long-running processes, implement progress reporting.
6. Effective Subordinate Utilization:
   - Provide clear context and objectives when delegating tasks.
   - Use specific role descriptions (e.g., "data analyst", "web scraper") to guide subordinate behavior.
   - Request regular updates and integrate subordinate work into your main solution.
7. Tool Selection Strategy: Choose tools based on the needs of the current subtask. For example:
   - Use `knowledge_tool` for research and problem-solving guidance.
   - Use `code_execution_tool` for implementing solutions or testing hypotheses.
   - Use `function_boundaries_tool` and `code_replace_tool` for targeted code modifications.

Remember: Your goal is to solve tasks autonomously and efficiently. Use these guidelines to optimize your tool usage and problem-solving approach.
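For illustration, below is a minimal, hypothetical Python sketch of how a host application might validate one reply against the communication protocol in Section 3. The helper name `parse_agent_response` and the error messages are assumptions for this example only; they are not part of the model or the Agent Zero framework.

```python
import json

REQUIRED_FIELDS = {"thoughts", "tool_name", "tool_args"}

def parse_agent_response(raw: str) -> dict:
    """Parse and validate one agent reply against the protocol in Section 3 (hypothetical helper)."""
    message = json.loads(raw)  # raises json.JSONDecodeError if the reply is not valid JSON
    if not isinstance(message, dict):
        raise ValueError("agent response must be a single JSON object")
    missing = REQUIRED_FIELDS - message.keys()
    if missing:
        raise ValueError(f"agent response is missing fields: {sorted(missing)}")
    if not isinstance(message["thoughts"], list):
        raise ValueError("'thoughts' must be an array of strings")
    if not isinstance(message["tool_args"], dict):
        raise ValueError("'tool_args' must be an object")
    return message

# Example: a well-formed reply that uses the `response` tool
raw_reply = '{"thoughts": ["Task complete"], "tool_name": "response", "tool_args": {"text": "Done."}}'
print(parse_agent_response(raw_reply)["tool_name"])  # -> response
```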
---

# Agent Tools

## response

Final answer for the user. Ends task processing.

~~~json
{
  "thoughts": ["Greeting the user"],
  "tool_name": "response",
  "tool_args": {
    "text": "Hello! How can I assist you today?"
  }
}
~~~

## call_subordinate

Use subordinates for subtasks. Provide a role and detailed instructions.

~~~json
{
  "thoughts": ["Asking subordinate to refine result"],
  "tool_name": "call_subordinate",
  "tool_args": {
    "message": "As a writer, please edit this paragraph for clarity:",
    "reset": "false"
  }
}
~~~

## knowledge_tool

Get online and memory responses. Verify memory with online sources.

~~~json
{
  "thoughts": ["Researching topic"],
  "tool_name": "knowledge_tool",
  "tool_args": {
    "question": "Latest advancements in renewable energy"
  }
}
~~~

## memory_tool

Manage long-term memories. Use "query", "memorize", "forget", or "delete".

~~~json
{
  "thoughts": ["Saving important information"],
  "tool_name": "memory_tool",
  "tool_args": {
    "memorize": "# Efficient data structures for large datasets"
  }
}
~~~

## code_execution_tool

Execute terminal commands, Python, or Node.js code. Use print() for output.

~~~json
{
  "thoughts": ["Running Python script"],
  "tool_name": "code_execution_tool",
  "tool_args": {
    "runtime": "python",
    "code": "import pandas as pd\ndf = pd.read_csv('data.csv')\nprint(df.head())"
  }
}
~~~

## function_boundaries_tool

Find the start and end lines of a function in a file.

~~~json
{
  "thoughts": ["Locating function"],
  "tool_name": "function_boundaries_tool",
  "tool_args": {
    "file_path": "src/main.py",
    "function_name": "process_data"
  }
}
~~~

## code_replace_tool

Replace code blocks or functions in a file. `start_line` and `end_line` are optional; include them only when replacing specific lines.

~~~json
{
  "thoughts": ["Updating function"],
  "tool_name": "code_replace_tool",
  "tool_args": {
    "file_path": "src/main.py",
    "start_line": 10,
    "end_line": 20,
    "new_block": "def improved_function():\n    print('Enhanced functionality')"
  }
}
~~~

Key Points:

- Always use explicit print() or console.log() for code output
- Verify memory information with online sources
- Provide detailed instructions to subordinates
- Install packages using pip, npm, or apt-get in the terminal runtime
- Handle terminal dialogs using the "terminal" runtime
- Check code for placeholders before execution
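As a hypothetical illustration of how such tool calls could be routed by a host application, the sketch below maps `tool_name` to a handler and expands `tool_args` as keyword arguments. The handler functions are placeholders invented for this example; they are not the Agent Zero implementations of these tools.

```python
# Hypothetical dispatch of a parsed agent message to tool handlers.
# The handlers below are placeholders, not the framework's real implementations.
def handle_response(text: str) -> str:
    return text  # final answer that ends task processing

def handle_code_execution(runtime: str, code: str) -> str:
    return f"(would execute {runtime} code here)"  # placeholder only

TOOL_HANDLERS = {
    "response": handle_response,
    "code_execution_tool": handle_code_execution,
}

def dispatch(message: dict) -> str:
    handler = TOOL_HANDLERS.get(message["tool_name"])
    if handler is None:
        return f"Unknown tool: {message['tool_name']}"
    return handler(**message["tool_args"])

print(dispatch({
    "thoughts": ["Greeting the user"],
    "tool_name": "response",
    "tool_args": {"text": "Hello! How can I assist you today?"},
}))
```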
---

# Model usage guide

To use the model with the `transformers` library on a machine with GPUs, first make sure you have the `transformers` library installed.

```bash
pip install transformers==4.43.1
```

Also make sure you provide your Hugging Face token to the pipeline if the model is in a private repo.

- Either leave `token=True` in the `pipeline` and log in to `huggingface_hub` by running

    ```python
    import huggingface_hub
    huggingface_hub.login()
    ```

- Or directly pass your token to the `token` argument of the `pipeline`

```python
from transformers import pipeline

generate_text = pipeline(
    model="Rewnozom/agent-zero-v1-a-01",
    torch_dtype="auto",
    trust_remote_code=True,
    device_map={"": "cuda:0"},
    token=True,
)

# generation configuration can be modified to your needs
# generate_text.model.generation_config.min_new_tokens = 2
# generate_text.model.generation_config.max_new_tokens = 256
# generate_text.model.generation_config.do_sample = False
# generate_text.model.generation_config.num_beams = 1
# generate_text.model.generation_config.temperature = float(0.0)
# generate_text.model.generation_config.repetition_penalty = float(1.0)

messages = [
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "I'm doing great, how about you?"},
    {"role": "user", "content": "Why is drinking water so healthy?"},
]

res = generate_text(
    messages,
    renormalize_logits=True
)
print(res[0]["generated_text"][-1]["content"])
```

You can print a sample prompt after applying the chat template to see how it is fed to the tokenizer:

```python
print(generate_text.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
))
```

You may also construct the pipeline from the loaded model and tokenizer yourself, taking care of the preprocessing steps:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Rewnozom/agent-zero-v1-a-01"  # either local folder or Hugging Face model name
# Important: The prompt needs to be in the same format the model was trained with.
# You can find an example prompt in the experiment logs.
messages = [
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "I'm doing great, how about you?"},
    {"role": "user", "content": "Why is drinking water so healthy?"},
]

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map={"": "cuda:0"},
    trust_remote_code=True,
)
model.cuda().eval()

# generation configuration can be modified to your needs
# model.generation_config.min_new_tokens = 2
# model.generation_config.max_new_tokens = 256
# model.generation_config.do_sample = False
# model.generation_config.num_beams = 1
# model.generation_config.temperature = float(0.0)
# model.generation_config.repetition_penalty = float(1.0)

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to("cuda")

tokens = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    renormalize_logits=True
)[0]

tokens = tokens[inputs["input_ids"].shape[1]:]
answer = tokenizer.decode(tokens, skip_special_tokens=True)
print(answer)
```

## Quantization and sharding

You can load the model with quantization by specifying `load_in_8bit=True` or `load_in_4bit=True`. Sharding across multiple GPUs is also possible by setting `device_map="auto"`.
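For example, a 4-bit quantized load with automatic sharding might look like the following minimal sketch. It assumes the `bitsandbytes` package is installed; adjust the options to your hardware.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Rewnozom/agent-zero-v1-a-01"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# load_in_4bit / load_in_8bit require the bitsandbytes package (pip install bitsandbytes)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_4bit=True,    # or load_in_8bit=True for 8-bit quantization
    device_map="auto",    # shard layers across all available GPUs
    trust_remote_code=True,
)
model.eval()
```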
## Model Architecture

```
Phi3ForCausalLM(
  (model): Phi3Model(
    (embed_tokens): Embedding(32064, 3072, padding_idx=32000)
    (embed_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-31): 32 x Phi3DecoderLayer(
        (self_attn): Phi3Attention(
          (o_proj): Linear(in_features=3072, out_features=3072, bias=False)
          (qkv_proj): Linear(in_features=3072, out_features=9216, bias=False)
          (rotary_emb): Phi3RotaryEmbedding()
        )
        (mlp): Phi3MLP(
          (gate_up_proj): Linear(in_features=3072, out_features=16384, bias=False)
          (down_proj): Linear(in_features=8192, out_features=3072, bias=False)
          (activation_fn): SiLU()
        )
        (input_layernorm): Phi3RMSNorm()
        (resid_attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_mlp_dropout): Dropout(p=0.0, inplace=False)
        (post_attention_layernorm): Phi3RMSNorm()
      )
    )
    (norm): Phi3RMSNorm()
  )
  (lm_head): Linear(in_features=3072, out_features=32064, bias=False)
)
```

## Model Configuration

The model configuration is available in [cfg.yaml](cfg.yaml).

---