metadata

base_model: aolans/gemma-2-9b_q4
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - gemma2
  - trl
license: gemma
language:
  - ja

Uploaded model

Developed by: aolans
License: gemma
Finetuned from model: aolans/gemma-2-9b_q4

Model

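This is a "CoT reasoning model" created by fine-tuning on chain-of-thought (CoT) data, following the article 「日本語reasoningモデルを作る」 ("Building a Japanese reasoning model").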

Purpose

The goal is to improve accuracy on (a variant of) ELYZA-tasks-100.
(The model may not be practical for general use.)

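Parent model

Google/gemma-2-9b, quantized with Unsloth (⇒ aolans/gemma-2-9b_q4).

When Gemma-2-9b is loaded through Unsloth, Unsloth/Gemma-2-9b (the 4-bit quantized version) gets applied instead,
so Gemma-2-9B was downloaded locally and that copy was used as the base.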

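Datasets

First, SFT was run on the following datasets to teach the model Japanese.

CohereForAI/aya_dataset (English and Japanese data only)
Kendamarron/jimba-instuction-1k-beta (for long-form output)

Next, SFT was run on the following dataset to add CoT support.
Only examples with difficulty "very easy" or "easy", plus a subset of "medium", were used.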

Kendamarron/Magpie-Tanuki-8B-CoT
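
The difficulty-based selection described above could be sketched roughly as follows. This is only an illustration on toy records: the function name `select_by_difficulty`, the `medium_fraction` value, and the random seed are assumptions, since the card does not state how the "medium" subset was chosen or how large it was.

```python
import random


def select_by_difficulty(records, medium_fraction=0.5, seed=0):
    """Keep every 'very easy' and 'easy' record plus a random share of 'medium' ones.

    medium_fraction and seed are illustrative assumptions, not values from the card.
    """
    rng = random.Random(seed)
    keep = [r for r in records if r["difficulty"] in ("very easy", "easy")]
    medium = [r for r in records if r["difficulty"] == "medium"]
    n_medium = int(len(medium) * medium_fraction)
    keep.extend(rng.sample(medium, n_medium))
    return keep


# Toy records standing in for dataset rows (hypothetical, not real data)
records = (
    [{"id": i, "difficulty": "very easy"} for i in range(3)]
    + [{"id": i, "difficulty": "easy"} for i in range(3, 6)]
    + [{"id": i, "difficulty": "medium"} for i in range(6, 16)]
    + [{"id": i, "difficulty": "hard"} for i in range(16, 20)]
)

subset = select_by_difficulty(records, medium_fraction=0.5)
print(len(subset))
```

With a real dataset the same predicate could be applied row by row. The `difficulty` field name follows the wording of this card and should be checked against the dataset's actual columns.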

References

Usage

!pip install unsloth
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

from unsloth import FastLanguageModel
from huggingface_hub import hf_hub_download
import importlib.util

model_name = "aolans/gemma-2-9b-it-1e-cot_lora"

# *** Create the model and tokenizer (using Unsloth) ***
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name,
    dtype=None,
    load_in_4bit=True,
    trust_remote_code=True,
)

# *** Switch the model to inference mode ***
FastLanguageModel.for_inference(model)

# *** Load the custom functions ***
file_path = hf_hub_download(
    repo_id=model_name,
    filename="custom_functions.py",
)
spec = importlib.util.spec_from_file_location("custom_functions", file_path)
custom_functions = importlib.util.module_from_spec(spec)
spec.loader.exec_module(custom_functions)

# Question
#   Pass the user's question as-is.
#   The custom function below builds the system prompt.
input = "1から10までの整数を足すと？"

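# *** Inference ***
# Custom function "generate_cot_two"
#     Uses a CoT prompt to improve answer accuracy. Inference runs in two passes: "thought process" and "answer".
#     Returns three values: the full output, the thought process, and the answer.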
output, answers, thought = custom_functions.generate_cot_two( model, tokenizer, input )

print(output)
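
The custom-function loading step above imports a Python file by path rather than by package name. A minimal, self-contained sketch of that mechanism is shown below. It writes a throwaway module to a temporary directory instead of downloading anything, and the names `load_module_from_path`, `demo_functions`, and `greet` are hypothetical and used only for illustration.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_module_from_path(module_name, file_path):
    """Import a Python source file located at file_path under the given module name."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    # exec_module runs the file's top-level code inside the new module object
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a downloaded helper file such as custom_functions.py
    path = Path(tmp) / "demo_functions.py"
    path.write_text("def greet(name):\n    return 'hello ' + name\n", encoding="utf-8")
    demo = load_module_from_path("demo_functions", path)
    greeting = demo.greet("world")

print(greeting)
```

Because `exec_module` executes the file, it is worth reading a downloaded helper file before loading it this way. The module is also not registered in `sys.modules`, so a later `import demo_functions` would not find it.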

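Custom functions

Helper functions that build the system prompt and strip unwanted characters (tabs)
are stored in the following file:
custom_functions.py

Choose one of the following for inference, depending on your use case.

- CoT inference: "generate_cot_one"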

generate_cot_one

#     Uses a CoT prompt to improve answer accuracy. The answer includes the "thought process".
output = custom_functions.generate_cot_one( model, tokenizer, input )

- Slower two-stage CoT inference: "generate_cot_two"

generate_cot_two

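#     Uses a CoT prompt to improve answer accuracy. Inference runs in two passes: "thought process" and "answer".
#     Returns three values: the full output, the thought process, and the answer.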
output, answers, thought = custom_functions.generate_cot_two( model, tokenizer, input )

- Plain inference: "generate_simple"

generate_simple

#     Answers with a very simple system prompt.
output = custom_functions.generate_simple( model, tokenizer, input )
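
To make the idea of two-stage CoT inference concrete, here is a rough sketch of how such a helper might be organized. It is not the content of custom_functions.py. The prompts, the function name `two_stage_cot`, the return order, and the `fake_generate` stand-in for a model call are all assumptions made for illustration.

```python
def two_stage_cot(generate_fn, question):
    """Run a thinking pass, then an answering pass conditioned on the thoughts.

    generate_fn is any callable that maps a prompt string to generated text.
    The prompt wording here is hypothetical.
    """
    thought_prompt = (
        "Think step by step about the question below.\n"
        f"Question: {question}\nThoughts:"
    )
    # Strip tabs and surrounding whitespace from the raw generation
    thought = generate_fn(thought_prompt).replace("\t", "").strip()

    answer_prompt = (
        f"Question: {question}\nThoughts: {thought}\n"
        "Using the thoughts above, give the final answer.\nAnswer:"
    )
    answer = generate_fn(answer_prompt).replace("\t", "").strip()

    output = f"{thought}\n{answer}"
    return output, thought, answer


def fake_generate(prompt):
    """Stand-in for a model call so the sketch runs without a GPU."""
    if prompt.rstrip().endswith("Answer:"):
        return "\t55"
    return "1+2+...+10 = 10*11/2\t"


output, thought, answer = two_stage_cot(
    fake_generate, "What is the sum of the integers from 1 to 10?"
)
print(answer)
```

Splitting generation into two calls costs roughly twice the inference time, which matches the "slower" label on generate_cot_two. In exchange, the answer can be post-processed separately from the reasoning text.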

Usage - 2

Apologies: I am adding the following even though I know the deadline has passed.

Running inference on elyza-tasks-100-TV

# Install the required libraries
!pip install unsloth
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install -U torch
!pip install -U peft

# Import the required libraries
from unsloth import FastLanguageModel
from peft import PeftModel
import torch
import json
from tqdm import tqdm
import re

from huggingface_hub import hf_hub_download
import importlib.util

# Hugging Face ID of the model (base model plus the trained LoRA adapter).
model_id = "aolans/gemma-2-9b-it-1e-cot_lora"

# Load the model with Unsloth's FastLanguageModel.
dtype = None # None selects the dtype automatically
load_in_4bit = True # load in 4-bit to reduce memory use (this is a 9B model)

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_id,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
    trust_remote_code=True,
)


# *** Load the custom functions ***
file_path = hf_hub_download(
    repo_id=model_id,
    filename="custom_functions.py",
    # token=HF_TOKEN
)
spec = importlib.util.spec_from_file_location("custom_functions", file_path)
custom_functions = importlib.util.module_from_spec(spec)
spec.loader.exec_module(custom_functions)


# Switch the model to inference mode
FastLanguageModel.for_inference(model)

# Load the task data.
# Upload the data file beforehand.
datasets = []
with open("./elyza-tasks-100-TV_0.jsonl", "r", encoding="utf-8") as f:
    item = ""
    for line in f:
        line = line.strip()
        item += line
        # A record may span several lines; parse once the buffer ends with "}"
        if item.endswith("}"):
            datasets.append(json.loads(item))
            item = ""

# Run inference on each task with the model.
results = []
for dt in tqdm(datasets):
  input = dt["input"]

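  # *** Inference ***
  # Custom function "generate_cot_two"
  #     Uses a CoT prompt to improve answer accuracy. Inference runs in two passes: "thought process" and "answer".
  #     Returns three values: the full output, the thought process, and the answer.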
  output, _, _ = custom_functions.generate_cot_two_j( model, tokenizer, input )

  results.append({"task_id": dt["task_id"], "input": input, "output": output})

# Submit the jsonl file generated here.
# This code also writes input, but it can be omitted.
# Only task_id and output are required.
jsonl_id = re.sub(".*/", "", model_id)
with open(f"./{jsonl_id}-outputs.jsonl", 'w', encoding='utf-8') as f:
    for result in results:
        json.dump(result, f, ensure_ascii=False)  # ensure_ascii=False for handling non-ASCII characters
        f.write('\n')
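
The reading loop above joins lines until the buffer ends with `}`. That works for flat records, but it would try to parse too early if a line inside a multi-line record happens to end with `}` (a nested object, for example). As an alternative sketch, not the method used for the submission, the standard library's `json.JSONDecoder.raw_decode` can pull consecutive JSON values out of a string. The helper name `iter_json_objects` and the sample data are hypothetical.

```python
import json


def iter_json_objects(text):
    """Yield each JSON value found back to back in text, skipping whitespace between them."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode expects pos to point at the start of a value, so skip whitespace first
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# One single-line record and one pretty-printed record with a nested object
sample = (
    '{"task_id": 0, "input": "a"}\n'
    '{\n  "task_id": 1,\n  "input": {"nested": "}"}\n}\n'
)

tasks = list(iter_json_objects(sample))
print([t["task_id"] for t in tasks])
```

For a file, `text` would be the result of reading the whole file with `encoding="utf-8"`. This is reasonable for a task file of this size, though it does not stream.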

This gemma2 model was trained 2x faster with Unsloth and Huggingface's TRL library.