---
base_model: microsoft/Phi-3.5-mini-instruct
language:
- ko
license: mit
library_name: peft
datasets:
- hecatonai/Housing_Subscription_QA_Dataset
---

# Housing-Subscription-QA-Phi-3.5

## Model Details

### Model Description

- **Model type:** Question Answering
- **Language(s) (NLP):** Korean
- **Finetuned from model:** [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)

### Model Sources

```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and apply the LoRA adapter
config = PeftConfig.from_pretrained("hecatonai/Housing-Subscription-QA-Phi-3.5")
base_model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3.5-mini-instruct", device_map="auto")
model = PeftModel.from_pretrained(base_model, "hecatonai/Housing-Subscription-QA-Phi-3.5", device_map="auto")

# Load the tokenizer (tokenizers do not take a device_map argument)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-instruct")

# Format the input text into the Phi-3.5 chat template
def apply_chat_template(question):
    template = (
        "<|system|>\nYou are a helpful AI assistant. The default is 2024.<|end|>\n"
        "<|user|>\n{question}<|end|>\n<|assistant|>\n"
    )
    return template.format(question=question)

# Tokenize the input text and move it to the model's device
question = "투기과열지구 또는 청약과열지역에서 외국인 1순위 청약 가능?"
input_text = apply_chat_template(question)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate a prediction
outputs = model.generate(**inputs, max_length=1000)

# Decode the output
decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded_output)
```

## Bias, Risks, and Limitations

This model is an LLM fine-tuned on the [2022 and 2024 Housing Subscription FAQ](https://www.molit.go.kr/USR/policyData/m_34681/dtl.jsp?search=&srch_dept_nm=&srch_dept_id=&srch_usr_nm=&srch_usr_titl=Y&srch_usr_ctnt=&search_regdate_s=&search_regdate_e=&psize=10&s_category=&p_category=&lcmspage=1&id=4765) published by the Ministry of Land, Infrastructure and Transport of the Republic of Korea. It may therefore give inaccurate answers to questions not covered by that FAQ, so use it with care.

## How to Get Started with the Model

Use the code below to get started with the model.
```python
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM

# Load the base model and apply the LoRA adapter
config = PeftConfig.from_pretrained("hecatonai/Housing-Subscription-QA-Phi-3.5")
base_model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3.5-mini-instruct", device_map="auto")
model = PeftModel.from_pretrained(base_model, "hecatonai/Housing-Subscription-QA-Phi-3.5", device_map="auto")
```

Using with Pipeline

```python
import torch
from transformers import AutoModelForCausalLM, pipeline

model = AutoModelForCausalLM.from_pretrained("hecatonai/Housing-Subscription-QA-Phi-3.5", device_map="auto")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer="microsoft/Phi-3.5-mini-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful AI assistant. The default is 2024."},
    {"role": "user", "content": "투기과열지구 및 청약과열지역 1순위 제한대상 누구?"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, renormalize_logits=True, max_new_tokens=512, do_sample=False)
print(outputs[0]["generated_text"])
```

Result

```
<|system|>
You are a helpful AI assistant. The default is 2024.<|end|>
<|user|>
투기과열지구 및 청약과열지역 1순위 제한대상 누구?<|end|>
<|assistant|>
2024년 답변: 투기과열지구 및 청약과열지역에서 국민주택과 민영주택 1순위 제한 대상은, 과거 5년 이내에 본인 또는 세대원이 다른 주택의 당첨자가 된 경우입니다.
```

## Training Details

### Training Data

Dataset: [Housing_Subscription_QA_Dataset](https://huggingface.co/datasets/hecatonai/Housing_Subscription_QA_Dataset)

#### Training Hyperparameters

The following hyperparameters were used during training:

* bf16 = True
* learning_rate = 5.0e-5
* num_train_epochs = 15
* per_device_batch_size = 4
* warmup_ratio = 0.2

#### Training Prompt

```python
messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": f"{example['question']}"},
    {"role": "assistant", "content": f"{example['answer']}"},
]
```

### Framework versions

- PEFT 0.12.0
- Transformers 4.44.2
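Rendered through the Phi-3.5 chat layout, the training messages above produce text of the following shape. This is a minimal sketch using plain string formatting (the `format_example` helper and the sample record are hypothetical; in the actual pipeline `tokenizer.apply_chat_template` performs this step):

```python
# Sketch of how one training example maps to Phi-3.5's chat format.
# In practice, tokenizer.apply_chat_template handles this; the helper
# below only illustrates the resulting layout.

def format_example(example: dict) -> str:
    messages = [
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": example["question"]},
        {"role": "assistant", "content": example["answer"]},
    ]
    # Phi-3.5 wraps each turn as <|role|>\n{content}<|end|>\n
    return "".join(f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages)

# Hypothetical dataset record, for illustration only
sample = {"question": "Q?", "answer": "A."}
print(format_example(sample))
```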