---
library_name: transformers
tags: []
---

# Model Card for TwinDoc/RedWhale-2-12B-Instruct

This model is the result of SFT (supervised fine-tuning) of the pretrained model TwinDoc/RedWhale-2-12B. The SFT targeted Context QA and summarization tasks.

## Model Details

### Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- **Developed by:** AgileSoda
- **Model type:** Llama
- **Language(s) (NLP):** Korean
- **License:** [More Information Needed]
- **Foundation Model:** RedWhale-2-12B

### Model Sources [optional]

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

### Direct Use

RedWhale-2-12B-Instruct is used in the same way as the meta-llama/Llama-3.1-8B-Instruct model. Refer to the official documentation of the serving engine you want to use. The examples below illustrate typical usage.

**Usage with Transformers**

The example code was written with transformers == 4.48.1.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load in bfloat16 and shard across available GPUs.
loading_args = {"torch_dtype": torch.bfloat16, "device_map": "auto"}  # for multi-GPU loading
model = AutoModelForCausalLM.from_pretrained("TwinDoc/RedWhale-2-12B-Instruct", **loading_args)
tokenizer = AutoTokenizer.from_pretrained("TwinDoc/RedWhale-2-12B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "대한민국의 수도는?"},
]

# Build the chat prompt and move it to the model's device before generating.
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
```

```python
>>> print(tokenizer.decode(outputs[0]))
"<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

You are a helpful AI assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

대한민국의 수도는?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

대한민국의 수도는 서울입니다.<|eot_id|>"
```

**Usage with vLLM**

The example code was written with vllm == 0.6.6.

```python
from vllm import LLM, SamplingParams
import os

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # arrange GPU devices starting from 0
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6,7"

repo_id = "TwinDoc/RedWhale-2-12B-Instruct"
tensor_parallel_size = 8  # number of GPUs
llm = LLM(
    model=repo_id,
    tensor_parallel_size=tensor_parallel_size,
)

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "대한민국의 수도는?"},
]

sampling_params = SamplingParams(
    temperature=0.8,
    top_p=0.9,
    max_tokens=8192,
)
outputs = llm.chat(messages, sampling_params)
```

```python
>>> print(outputs[0].outputs[0].text)
대한민국의 수도는 서울입니다.
```

## Training Details

### Training Data

- [dataset information](https://www.notion.so/agilesoda/Posttraining-Data-1209f036b307805294aae32aa16d2dd1)
- [download dataset](https://huggingface.co/datasets/TwinDoc/dataset-sft-redwhale2-collections-12)

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- [allganize/rag-ko](https://huggingface.co/datasets/allganize/rag-ko) test set: 200 examples
- Mirae Asset Context QA: 100 examples
- AIA Context QA: 140 examples
- BNK Context QA: 63 examples

#### Metrics

Performance was measured with an LLM-as-a-Judge approach. For the judge prompt, the judge model, and the evaluation results, see [Our Leaderboard](https://www.notion.so/agilesoda/Our-LeaderBoard-17b9f036b30780c595a1e430ba13e9c6). The model listed there as "RedWhale2 12B 0.98 SFT v4 M" is "TwinDoc/RedWhale-2-12B-Instruct".
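
The sketch below illustrates how an LLM-as-a-Judge score could be obtained for a single Context QA example. It is a minimal, hypothetical sketch: the judge model name, the judge prompt, and the 1–5 scoring scale are illustrative assumptions, not the actual setup used for the reported results, which is documented on Our Leaderboard.

```python
# Hypothetical LLM-as-a-Judge sketch; judge model, prompt, and scale are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are an impartial judge. Given a context, a question, a reference
answer, and a model answer, rate the model answer from 1 to 5 for factual consistency
with the context and the reference answer. Respond with only the number.

[Context]
{context}

[Question]
{question}

[Reference Answer]
{reference}

[Model Answer]
{prediction}"""


def judge_score(context: str, question: str, reference: str, prediction: str) -> int:
    """Ask the judge model for a 1-5 score on one Context QA example."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder judge model
        temperature=0,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            context=context, question=question,
            reference=reference, prediction=prediction)}],
    )
    return int(response.choices[0].message.content.strip())
```

Averaging such per-example scores over each test set would yield an aggregate number comparable in spirit to the leaderboard entries, though the exact aggregation used there may differ.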