import json
import os
import random

import numpy as np
import torch
from tqdm import tqdm
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# === Fix random seeds ===
seed = 42
torch.manual_seed(seed)
random.seed(seed)
np.random.seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
print(f"🔒 Seeds fixed (seed={seed})")

# === Configuration ===
fp16_ckpt = "sooh098/midmb-kculture-qa"
test_json_path = "../data/korean_culture_qa_V1.0_test+.json"
test_jsonl_path = "../data/test.jsonl"
output_json_path = "../data/output.json"
use_cuda = torch.cuda.is_available()

# === 4-bit quantization config ===
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16
)

# === Load model and tokenizer ===
print("🚀 Loading 4-bit model...")
tokenizer = AutoTokenizer.from_pretrained(
    fp16_ckpt,
    trust_remote_code=True,
    use_fast=True
)
# Fall back to EOS if the tokenizer defines no pad token (avoids a generate() warning)
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id

model = AutoModelForCausalLM.from_pretrained(
    fp16_ckpt,
    quantization_config=bnb_config,
    device_map={"": 0},
    torch_dtype=torch.float16 if use_cuda else torch.float32,
    trust_remote_code=True
)
model.eval()
print("✅ 4-bit model loaded")

# === Run inference ===
predictions = []
with open(test_jsonl_path, "r", encoding="utf-8") as f:
    lines = f.readlines()

for i, line in enumerate(tqdm(lines, desc="🔍 Generating predictions", unit="sample")):
    sample = json.loads(line)
    input_text = sample["input"]
    instruction = sample.get("instruction", "")

    # Midm prompt format
    prompt = (
        "<|begin_of_text|>\n"
        f"<|start_header_id|>system<|end_header_id|>\n{instruction}\n"
        "<|eot_id|>\n"
        f"<|start_header_id|>user<|end_header_id|>\n{input_text}\n"
        "<|eot_id|>\n"
        "<|start_header_id|>assistant<|end_header_id|>\n"
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    if "token_type_ids" in inputs:
        del inputs["token_type_ids"]

    with torch.no_grad():
        # Re-seed before each generation so sampling stays reproducible
        torch.manual_seed(seed)
        output_ids = model.generate(
            **inputs,
            max_new_tokens=256,
            repetition_penalty=1.05,
            temperature=0.28,
            top_p=0.85,
            top_k=20,
            do_sample=True,
            eos_token_id=tokenizer.eos_token_id,
            pad_token_id=tokenizer.pad_token_id
        )

    # Keep special tokens so the assistant segment can be split out of the raw output
    output_text = tokenizer.decode(output_ids[0], skip_special_tokens=False)
    raw_response = output_text.split("<|start_header_id|>assistant<|end_header_id|>\n")[-1]
    prediction = raw_response.split("<|end_of_text|>")[0].strip()
    predictions.append(prediction)

    print(f"\n📝 Sample {i + 1}")
    print(f"🤖 Prediction: {prediction}")

# === Load the original test JSON ===
with open(test_json_path, "r", encoding="utf-8") as f:
    test_data = json.load(f)

# === Attach predictions ===
for i, item in enumerate(test_data):
    answer = predictions[i] if i < len(predictions) else ""
    item["output"] = {"answer": answer}

# === Save results ===
os.makedirs(os.path.dirname(output_json_path), exist_ok=True)
with open(output_json_path, "w", encoding="utf-8") as f:
    json.dump(test_data, f, ensure_ascii=False, indent=2)

print(f"\n✅ Final results saved to '{output_json_path}'.")