Chat fine-tuned version of gemma-7b (non-IT base)

This is the gemma-7b base (non-IT) model fine-tuned on simple chat-format data.

History

  • 0.1 : 2024-04-05 initial SFT version uploaded; a DPO version is under consideration

νŠΈλ ˆμ΄λ‹ 정보

  • μ‚¬μš©λ°μ΄ν„°μ…‹ : maywell/koVast 을 philschmid/gemma-tokenizer-chatml 에 맞게 λ³€μ‘°ν•˜μ—¬ μ‚¬μš©
  • GPU : RTX 3090 24G x 1
  • optimizer : adamw_torch
  • lr scheduler type : cosine
  • νŠΈλ ˆμ΄λ‹ μ‹œκ°„ : 140μ‹œκ°„
  • 에포크 : 1
  • train loss : 0.8991
  • eval loss : 0.7305
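
The exact training script is not included in this card. The following is a minimal sketch of how an equivalent SFT run might be set up with TRL's SFTTrainer (0.7-era API), plugging in the hyperparameters listed above. The koVast column names, batch size, gradient accumulation steps, and max sequence length are assumptions, not values taken from the card.

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

# ChatML tokenizer used by this model (defines the <|im_start|>/<|im_end|> chat template)
tokenizer = AutoTokenizer.from_pretrained("philschmid/gemma-tokenizer-chatml")

# Start from the non-IT gemma-7b base model in bfloat16 on a single GPU
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b",
    torch_dtype=torch.bfloat16,
    device_map={"": 0},
)

# Reformat maywell/koVast into ChatML text; the column names below are assumptions
dataset = load_dataset("maywell/koVast", split="train")

def to_chatml(example):
    messages = [
        {"role": "user", "content": example["instruction"]},   # assumed column name
        {"role": "assistant", "content": example["output"]},   # assumed column name
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = dataset.map(to_chatml)

args = TrainingArguments(
    output_dir="gemma-7b-andj-sft",
    num_train_epochs=1,                 # from the card
    optim="adamw_torch",                # from the card
    lr_scheduler_type="cosine",         # from the card
    per_device_train_batch_size=1,      # assumption for a single 24 GB RTX 3090
    gradient_accumulation_steps=8,      # assumption
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    args=args,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=1024,                # assumption
)
trainer.train()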

Usage (bfloat16; requires about 17 GB of GPU memory)

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model in bfloat16 with FlashAttention 2 on the first GPU
checkpoint = "nmj21c/gemma-7b-andj-sft"
dtype = torch.bfloat16
model = AutoModelForCausalLM.from_pretrained(checkpoint, attn_implementation="flash_attention_2", device_map={"": 0}, torch_dtype=dtype)

# The model was trained with the ChatML chat template from this tokenizer
tokenizer_checkpoint = "philschmid/gemma-tokenizer-chatml"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_checkpoint)

chat = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "μ„œμšΈμ˜ κ°•λ‚¨μ—­μ—μ„œ 맛집 μΆ”μ²œν•΄μ€˜"},  # "Recommend good restaurants near Gangnam Station in Seoul"
]

prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# Use the ChatML end-of-turn token as the stopping criterion
eos_token_str = "<|im_end|>"
eos_token = tokenizer(eos_token_str, add_special_tokens=False)["input_ids"][0]

inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
outputs = model.generate(
    input_ids=inputs,
    max_new_tokens=1024,
    eos_token_id=eos_token,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)

# Strip the prompt text and the end-of-turn marker from the decoded output
response = tokenizer.decode(outputs[0])[len(prompt):].strip().replace(eos_token_str, '')
print(response)
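
Not part of the original card: for interactive use, the same generate call can stream tokens to stdout as they are produced using transformers' TextStreamer, reusing the inputs and sampling settings above.

from transformers import TextStreamer

# Print tokens as they are generated, hiding the prompt and special tokens
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    input_ids=inputs,
    max_new_tokens=1024,
    eos_token_id=eos_token,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    streamer=streamer,
)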