llm-jp-13b-OpenWebMathInstruct_2_v1

Development discontinued, as no improvement in accuracy on mathematical tasks could be expected.

Overview

This model is an instruction-tuned variant of llm-jp/llm-jp-3-13b-instruct, further fine-tuned on a 172,800-sample subset of nvidia/OpenMathInstruct-2. The fine-tuning was parameter-efficient: only the language-model head was updated, while all other model parameters were kept frozen.

Key Features

  • Base Model: llm-jp/llm-jp-3-13b-instruct
  • Fine-Tuning Data: 172,800 samples from nvidia/OpenMathInstruct-2
  • Updated Parameters: only the language-model head (lm_head) was trained; all other parameters were frozen:
      # Freeze every parameter in the model...
      for param in model.parameters():
          param.requires_grad = False
      
      # ...then unfreeze only the lm_head.
      for param in model.lm_head.parameters():
          param.requires_grad = True
      
  • Marco-o1 Tokens Added: To align with Marco-o1, we introduced the following special tokens (a registration sketch follows this list):
    • <Thought>, </Thought>
    • <Output>, </Output>
  • Reasoning Model Integration: Uses the implementation from Hajime-Y/reasoning-model
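
The card does not include the token-registration code itself. Below is a minimal sketch using the standard transformers API; the base-model checkpoint and the embedding resize are assumptions, not shown in the original:

from transformers import AutoModelForCausalLM, AutoTokenizer

base = "llm-jp/llm-jp-3-13b-instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Register the Marco-o1-style reasoning tags as special tokens.
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<Thought>", "</Thought>", "<Output>", "</Output>"]}
)

# Resize the embedding matrix so the new token ids get embedding rows.
model.resize_token_embeddings(len(tokenizer))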

Usage

Below is an example of using the model with Monte Carlo Tree Search (MCTS) for reasoning:

import sys
import torch
sys.path.append('./reasoning-model')  # local clone of Hajime-Y/reasoning-model
from reasoning_model import ReasoningModelForCausalLM
from tree_utils import print_tree_with_best_path  # optional: inspect the search tree
from transformers import AutoTokenizer

# Prepare the tokenizer and model
model_name = "doshisha-mil/llm-jp-13b-OpenMathInstruct-2-v1"

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)

# Set the padding token explicitly
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the model
model = ReasoningModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Input text: the Japanese system prompt tells the model to write its reasoning
# inside <Thought></Thought> and the final answer inside <Output></Output>,
# using the instruction/response template from fine-tuning.
prompt = "Find the number of positive integers $x$ that satisfy $x^{-1}>x$."
text = f"あなたは優秀で論理的なアシスタントです。まずは<Thought></Thought>タグの中であなたの思考の過程を記載し、<Output></Output>タグの中に最終的にユーザーに提供する出力を記載します。\n\n### 指示: {prompt}\n\n### 応答: <Thought>\n"

# Tokenize with an explicit attention_mask (recomputed here from the pad token)
model_inputs = tokenizer([text], return_tensors="pt", padding=True, truncation=True)
model_inputs["attention_mask"] = (model_inputs["input_ids"] != tokenizer.pad_token_id).long()

# Move all inputs to the model's device
model_inputs = {key: val.to(model.device) for key, val in model_inputs.items()}

# Generate with MCTS (search hyperparameters follow Hajime-Y/reasoning-model)
final_tokens, final_node = model.generate(
    input_ids=model_inputs["input_ids"],
    attention_mask=model_inputs["attention_mask"],  # pass the attention_mask explicitly
    iterations_per_step=3,
    max_iterations=30,
    mini_step_size=32,
    expand_threshold=0,
    step_separator_ids=None,
)

# Decode the result
final_text = tokenizer.decode(final_tokens, skip_special_tokens=True)
print("=== Final generated text ===")
print(final_text)
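
Because the <Thought>/<Output> tags are registered as special tokens, decoding with skip_special_tokens=True strips the tags themselves. To separate the reasoning trace from the final answer, one option is to decode with the tags preserved and split on them; a minimal sketch, assuming the tags survive decoding when skip_special_tokens=False:

import re

# Keep the special tokens so the <Output> section can be located.
raw_text = tokenizer.decode(final_tokens, skip_special_tokens=False)

# Extract the final answer between <Output> and </Output>, if present.
match = re.search(r"<Output>(.*?)(?:</Output>|$)", raw_text, re.DOTALL)
answer = match.group(1).strip() if match else raw_text.strip()
print(answer)

The imported print_tree_with_best_path helper can presumably also be applied to final_node to inspect the best path through the search tree; see Hajime-Y/reasoning-model for its exact interface.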

Model Applications

  • Mathematical problem-solving with structured reasoning
  • Chain-of-Thought (CoT) enhanced reasoning
  • Integration with Monte Carlo Tree Search (MCTS)
  • Instruction-based question answering
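
For plain instruction-based question answering without the MCTS search, the checkpoint can presumably also be loaded through the standard transformers API. A minimal sketch, assuming compatibility with AutoModelForCausalLM and reusing the prompt template from the Usage section:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "doshisha-mil/llm-jp-13b-OpenMathInstruct-2-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# Same instruction/response template as in the Usage section.
question = "Find the number of positive integers $x$ that satisfy $x^{-1}>x$."
text = f"あなたは優秀で論理的なアシスタントです。まずは<Thought></Thought>タグの中であなたの思考の過程を記載し、<Output></Output>タグの中に最終的にユーザーに提供する出力を記載します。\n\n### 指示: {question}\n\n### 応答: <Thought>\n"

inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))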

Citation

If you use this model, please cite the original base model and relevant datasets.

@article{llm-jp3-13b-instruct,
  title={LLM-JP 3-13B Instruct},
  author={LLM-JP Team},
  year={2024},
  journal={Hugging Face Repository},
  url={https://huggingface.co/llm-jp/llm-jp-3-13b-instruct}
}

@article{marco-o1,
  title={Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions},
  author={Yu Zhao and Huifeng Yin and Bo Zeng and Hao Wang and Tianqi Shi and Chenyang Lyu and Longyue Wang and Weihua Luo and Kaifu Zhang},
  year={2024},
  journal={arXiv},
  eprint={2411.14405v1},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

License

Refer to the base model's license at llm-jp/llm-jp-3-13b-instruct for details.

