Model Card for beyoru/ThinkAgain1.2

Model Details

Reasoning should feel more natural and faster. A DeepSeek-style system prompt is recommended, and adding extra instructions to the system prompt can improve results (see the example below).

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

MAX_REASONING_TOKENS = 4096
MAX_RESPONSE_TOKENS = 1024

model_name = "beyoru/ThinkAgain1.2"

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {'role': 'system', 'content': """You are a helpful and harmless AI assistant.
A conversation between User and Assistant. The user asks a question, and the Assistant solves it.  
The assistant first thinks about the reasoning process in the mind and then provides the user with the answer.  
The reasoning process and answer are enclosed within `<think> </think>` and `<answer> </answer>` \
tags, respectively, i.e., `<think> reasoning process here </think>` `<answer> answer here </answer>`.  
User: **prompt**.  
Assistant:"""}
]
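# The system prompt above follows a DeepSeek-R1-style convention: the model is
# asked to put its reasoning inside <think> ... </think> and its final answer
# inside <answer> ... </answer>.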

def stream_output(output_text):
    for char in output_text:
        print(char, end="", flush=True)

while True:
    prompt = input("USER: ")
    
    messages.append({"role": "user", "content": prompt})
    
    # Generate reasoning
    reasoning_template = tokenizer.apply_chat_template(messages, tokenize=False, add_reasoning_prompt=True)
    reasoning_inputs = tokenizer(reasoning_template, return_tensors="pt").to(model.device)
    reasoning_ids = model.generate(**reasoning_inputs, max_new_tokens=MAX_REASONING_TOKENS)
    reasoning_output = tokenizer.decode(reasoning_ids[0, reasoning_inputs.input_ids.shape[1]:], skip_special_tokens=True)
    
    messages.append({"role": "reasoning", "content": reasoning_output})
    
    print("REASONING: ", end="")
    stream_output(reasoning_output)
    print()
    
    # Generate answer
    response_template = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    response_inputs = tokenizer(response_template, return_tensors="pt").to(model.device)
    response_ids = model.generate(**response_inputs, max_new_tokens=MAX_RESPONSE_TOKENS)
    response_output = tokenizer.decode(response_ids[0, response_inputs.input_ids.shape[1]:], skip_special_tokens=True)
    
    messages.append({"role": "assistant", "content": response_output})
    
    print("ASSISTANT: ", end="")
    stream_output(response_output)
    print()
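
Once a turn is complete, the tags defined in the system prompt make it straightforward to separate the reasoning from the final answer. The following is a minimal sketch; the helper name `split_think_answer` and the regex are illustrative assumptions, not part of the model's API.

import re

def split_think_answer(text: str):
    # Hypothetical helper: extract the contents of the <think> and <answer>
    # tags requested by the system prompt, falling back to the raw text when
    # a tag is missing.
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    reasoning = think.group(1).strip() if think else ""
    final = answer.group(1).strip() if answer else text.strip()
    return reasoning, final

# Example usage after a turn:
# reasoning, final = split_think_answer(response_output)
# print(final)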

Improvements:

  • Better performance in other languages
  • Reduced hallucination in some cases

Config:

To be updated.
