---
license: apache-2.0
language:
- en
- ja
programming_language:
- C
- C++
- C#
- Go
- Java
- JavaScript
- Lua
- PHP
- Python
- Ruby
- Rust
- Scala
- TypeScript
pipeline_tag: text-generation
library_name: transformers
inference: false
tags:
- mlx
base_model: llm-jp/llm-jp-3-13b
---

# niryuu/llm-jp-3-13b-ha

The model [niryuu/llm-jp-3-13b-ha](https://huggingface.co/niryuu/llm-jp-3-13b-ha) was converted to MLX format from [llm-jp/llm-jp-3-13b](https://huggingface.co/llm-jp/llm-jp-3-13b) using mlx-lm version **0.20.1**, and it remains compatible with HF Transformers. It was then fine-tuned with LoRA on the following datasets (illustrative sketches of both steps are given at the end of this card):

- h: kanhatakeyama/ramdom-to-fixed-multiturn-Calm3
- a: Aratako/Magpie-Tanuki-8B-97k

## Use for Evaluation

```python
# -*- coding: utf-8 -*-
# Install dependencies (intended for a notebook environment such as Colab).
!pip install -U bitsandbytes
!pip install -U transformers
!pip install -U accelerate
!pip install -U datasets
!pip install -U peft

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
)
from peft import PeftModel
import torch
from tqdm import tqdm
import json

# Paste the token obtained from Hugging Face here.
HF_TOKEN = "dummy"

model_id = "niryuu/llm-jp-3-13b-ha"

# QLoRA config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    token=HF_TOKEN,
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, token=HF_TOKEN)

# Load the evaluation dataset (one JSON object per task).
datasets = []
with open("./elyza-tasks-100-TV_0.jsonl", "r") as f:
    item = ""
    for line in f:
        line = line.strip()
        item += line
        if item.endswith("}"):
            datasets.append(json.loads(item))
            item = ""

# Generate an answer for each task.
results = []
for data in tqdm(datasets):
    input = data["input"]

    token_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": input}],
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(
        token_ids,
        max_new_tokens=2048,
        do_sample=False,
        repetition_penalty=1.2,
    )
    output = tokenizer.decode(outputs[0][token_ids.size(1):], skip_special_tokens=True)

    results.append({"task_id": data["task_id"], "input": input, "output": output})

# Save outputs
import re

jsonl_id = re.sub(".*/", "", model_id)
with open(f"./{jsonl_id}-outputs.jsonl", "w", encoding="utf-8") as f:
    for result in results:
        json.dump(result, f, ensure_ascii=False)  # ensure_ascii=False for handling non-ASCII characters
        f.write("\n")
```

## Use with mlx

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("niryuu/llm-jp-3-13b-ha")

prompt = "hello"

if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
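
## Conversion to MLX (sketch)

The exact conversion command is not included in this card. As a minimal sketch, the base model could be converted with the mlx-lm Python API (the card states version 0.20.1 was used); the output directory name below is hypothetical, not the invocation actually used to build this repository.

```python
# Minimal sketch of converting the base model to MLX format with mlx-lm.
# Paths and options are assumptions, not the exact command used for this repo.
from mlx_lm import convert

convert(
    hf_path="llm-jp/llm-jp-3-13b",   # base model on the Hugging Face Hub
    mlx_path="./llm-jp-3-13b-mlx",   # hypothetical local output directory
)
```

With no quantization options passed, the converted weights keep their original precision.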
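
## LoRA fine-tuning (sketch)

The LoRA training setup (framework, hyperparameters, prompt formatting) is not published here. Purely as an illustration, adapters could be attached to the base model with PEFT and trained on the two datasets listed above; every hyperparameter in the sketch below is an assumption.

```python
# Illustrative sketch only: attach LoRA adapters to the base model with PEFT
# and load the two datasets named above. The rank, alpha, target modules and
# the (omitted) training loop are assumptions, not this model's actual config.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "llm-jp/llm-jp-3-13b"
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Hypothetical LoRA configuration.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# The two instruction datasets mentioned in this card ("h" and "a").
ds_h = load_dataset("kanhatakeyama/ramdom-to-fixed-multiturn-Calm3")
ds_a = load_dataset("Aratako/Magpie-Tanuki-8B-97k")
```

A standard supervised fine-tuning loop (for example with `transformers.Trainer` or TRL's `SFTTrainer`) would then be run over chat-formatted samples from both datasets.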