---
license: apache-2.0
library_name: transformers
inference: false
tags:
- mlx
base_model: AIDC-AI/Marco-o1
---

# fastx-ai/Marco-o1-int-4

The model [fastx-ai/Marco-o1-int-4](https://huggingface.co/fastx-ai/Marco-o1-int-4) was converted to MLX format from [AIDC-AI/Marco-o1](https://huggingface.co/AIDC-AI/Marco-o1) using mlx-lm version **0.20.1**.

Example prompt:

```python
prompt = "hello, can you teach me why 2 + 4 = 6 ?"
```

The default system prompt (shown below in Chinese) instructs the model to reason inside `<Thought>` tags and to emit its result inside `<Output>` tags; the reasoning should be in English where possible, with two exceptions (quotations from the original text, and math, which should use markdown), while the `<Output>` section should follow the language of the user's input.

```shell
==========
Prompt: <|im_start|>system

你是一个经过良好训练的AI助手,你的名字是Marco-o1.

## 重要!!!!!
当你回答问题时,你的思考应该在<Thought>内完成,<Output>内输出你的结果。
<Thought>应该尽可能是英文,但是有2个特例,一个是对原文中的引用,另一个是是数学应该使用markdown格式,<Output>内的输出需要遵循用户输入的语言。
<|im_end|>
<|im_start|>user
hello, can you teach me why 2 + 4 = 6 ?<|im_end|>
<|im_start|>assistant
Alright, I need to explain why 2 plus 4 equals 6. Let's start by recalling the basic principles of addition. Addition is the process of combining two or more numbers to find their total. So, when we add 2 and 4, we're essentially combining two quantities.

First, let's visualize this. Imagine you have 2 apples and someone gives you 4 more apples. Now, how many apples do you have in total? Counting them out
==========
Prompt: 118 tokens, 698.640 tokens-per-sec
Generation: 100 tokens, 103.937 tokens-per-sec
Peak memory: 4.386 GB
```

## Use with mlx

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("fastx-ai/Marco-o1-int-4")

prompt = "hello"

# Apply the chat template when the tokenizer defines one.
if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```

## Change the system prompt

1. Clone this repo to a local folder.

2. Edit the `chat_template` entry in `tokenizer_config.json`:

   ```json
   "chat_template": "{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system\n\n你是一个经过良好训练的AI助手,你的名字是Marco-o1.\n \n## 重要!!!!!\n当你回答问题时,你的思考应该在<Thought>内完成,<Output>内输出你的结果。\n<Thought>应该尽可能是英文,但是有2个特例,一个是对原文中的引用,另一个是是数学应该使用markdown格式,<Output>内的输出需要遵循用户输入的语言。\n <|im_end|>\n' }}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
   ```

3. Load the model from the local folder:

   ```python
   from mlx_lm import load, generate

   # "./mlx_model" is the folder where you put this repo's files.
   model, tokenizer = load("./mlx_model")

   prompt = "hello, can you teach me why 2 + 4 = 6 ?"

   if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
       messages = [{"role": "user", "content": prompt}]
       prompt = tokenizer.apply_chat_template(
           messages, tokenize=False, add_generation_prompt=True
       )

   response = generate(model, tokenizer, prompt=prompt, verbose=True)
   ```
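
Alternatively, because the template above only injects the built-in system prompt when the first message's role is not `system`, you can override the system prompt per request without editing any files. A minimal sketch (the system text here is just an example, not the official prompt):

```python
from mlx_lm import load, generate

model, tokenizer = load("fastx-ai/Marco-o1-int-4")

# A system message as the first entry replaces the built-in prompt,
# since the chat template only injects its default when none is given.
messages = [
    {"role": "system", "content": "You are Marco-o1, a helpful assistant. Answer in English."},
    {"role": "user", "content": "hello, can you teach me why 2 + 4 = 6 ?"},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
response = generate(model, tokenizer, prompt=prompt, verbose=True)
```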
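
Since the default system prompt asks the model to wrap its reasoning in `<Thought>` and its final answer in `<Output>`, you may want to strip the reasoning before showing the response. A hypothetical helper, assuming the model actually follows that tag convention (it falls back to the raw text otherwise):

```python
import re

def extract_output(text: str) -> str:
    """Return the <Output> section of a response, or the raw text if absent."""
    match = re.search(r"<Output>(.*?)(?:</Output>|$)", text, re.DOTALL)
    return match.group(1).strip() if match else text.strip()

print(extract_output(response))
```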