---
license: mit
language:
- en
- ko
metrics:
- accuracy
base_model:
- meta-llama/Llama-3.1-8B-Instruct
library_name: transformers
tags:
- medical
- healthcare
---
# LLaMA 3.1 8B Instruct - Healthcare Fine-tuned Model
Unidocs Inc. fine-tuned this model from Llama-3.1-8B-Instruct on healthcare data.

## Model Description
This is the sLLM used in Unidocs' ezMyAIDoctor, released on October 16, 2024 as an outcome of the AIDC-HPC project <br>
of the Artificial Intelligence Industry Convergence Business Group (AICA). <br>
It was produced by pretraining (full fine-tuning) meta-llama/Llama-3.1-8B-Instruct on wiki, kowiki, and the following AIHub (aihub.or.kr) datasets: <br>
the super-large AI healthcare question-answer data, the super-large AI corpus with improved Korean performance, <br>
and the medical and legal professional book corpus.

## Intended Uses & Limitations
The model is designed to assist with healthcare-related queries and tasks. <br>
However, it should not be used as a substitute for professional medical advice, diagnosis, or treatment.<br>
Always consult a qualified healthcare provider for medical concerns.

## Training Data
The model was fine-tuned on a proprietary healthcare dataset. <br>
Due to privacy concerns, details of the dataset cannot be disclosed.<br>

In addition to the wiki and kowiki data, the following datasets from AIHub, which is managed by the Ministry of Science and ICT and the National Information Society Agency (NIA), were used, among others:
- Super-large AI healthcare question-answer data
- Super-large AI corpus with improved Korean performance
- Medical and legal professional book corpus

## Training Procedure
Full fine-tuning was performed on the base LLaMA 3.1 8B Instruct model using the healthcare dataset.

## Evaluation Results

Accuracy by category on the MMLU benchmark:<br>

| Category | Accuracy (correct/total) |
|-----------------------|----------------|
| anatomy | 0.68 (92/135) |
| clinical_knowledge | 0.75 (200/265) |
| college_medicine | 0.68 (117/173) |
| medical_genetics | 0.70 (70/100) |
| professional_medicine | 0.76 (208/272) |

Mean accuracy across the five categories: 0.72
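The card does not say how the overall mean is computed. As a rough check, the sketch below recomputes the per-category accuracies from the correct/total counts in the table and compares two common averages. Treating the reported value as an unweighted (macro) mean is an assumption; the variable names are illustrative only.

```python
# (correct, total) per MMLU category, copied from the table above
results = {
    "anatomy": (92, 135),
    "clinical_knowledge": (200, 265),
    "college_medicine": (117, 173),
    "medical_genetics": (70, 100),
    "professional_medicine": (208, 272),
}

# Accuracy of each category on its own
per_category = {name: correct / total for name, (correct, total) in results.items()}

# Macro mean: every category counts equally, regardless of size
macro_mean = sum(per_category.values()) / len(per_category)

# Micro mean: pool all questions, so larger categories weigh more
micro_mean = (
    sum(correct for correct, _ in results.values())
    / sum(total for _, total in results.values())
)

for name, accuracy in per_category.items():
    print(f"{name:<22} {accuracy:.2f}")
print(f"macro mean: {macro_mean:.2f}")
print(f"micro mean: {micro_mean:.2f}")
```

The macro mean of the unrounded accuracies rounds to 0.72, which matches the reported figure, while pooling all 945 questions gives 0.73.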


### Use with transformers

With `transformers >= 4.43.1`, you can run conversational inference using the Transformers `pipeline` abstraction or the Auto classes with the `generate()` function.

Make sure to update your transformers installation via `pip install --upgrade transformers`.

```python
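import transformers
import torch

model_id = "unidocs/llama-3.1-8b-komedic-instruct"

# Build a text-generation pipeline. bfloat16 halves memory use relative to
# float32, and device_map="auto" places the weights on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    # System prompt: "You are a medical expert. Please answer in terms of the
    # disease's definition, causes, symptoms, screening, diagnosis, treatment,
    # medication, diet, and lifestyle."
    {"role": "system", "content": "당신은 의료전문가입니다. 질병의 정의, 원인, 증상, 검진, 진단, 치료, 약물, 식이, 생활 측면에서 답변해 주세요"},
    # User prompt: "When fasting blood glucose is 120 or higher, how should
    # patients with type 1 and type 2 diabetes each be treated?"
    {"role": "user", "content": "공복혈당이 120이상인 경우 제1형 당뇨와 제2형 당뇨 환자는 각각 어떻게 치료를 받아야 하나요?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
# The pipeline returns the whole conversation; the last message is the model's reply.
print(outputs[0]["generated_text"][-1])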
```
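
The Auto classes route mentioned above is not shown in the card. The following is a minimal sketch of it, assuming the same model ID and bfloat16 settings as the pipeline example. The helper names `build_messages` and `generate_reply` are hypothetical and not part of any library.

```python
def build_messages(system_prompt, user_prompt):
    """Return a chat in the role/content format expected by chat templates."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def generate_reply(messages, model_id="unidocs/llama-3.1-8b-komedic-instruct",
                   max_new_tokens=256):
    """Hypothetical helper: load the model with the Auto classes and return one reply."""
    # Imported here so that defining the helper does not need the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    # Render the chat with the model's template and tokenize it in one step.
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt", return_dict=True,
    ).to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so that only the newly generated reply is decoded.
    prompt_length = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

Calling `generate_reply(build_messages(system_prompt, user_prompt))` downloads the weights on first use. Because the sketch reloads the model on every call, an application that answers many questions would load the tokenizer and model once and reuse them.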

Note: You can also find detailed recipes on how to use the model locally, with `torch.compile()`, assisted generation, quantization, and more at [`huggingface-llama-recipes`](https://github.com/huggingface/huggingface-llama-recipes).


## Limitations and Bias
- This model may produce biased or inaccurate results. It should not be the sole basis for critical healthcare decisions.
- The model's knowledge is limited to its training data and cut-off date.
- It may exhibit biases present in the training data.
- The model may occasionally produce incorrect or inconsistent information.

## Legal Disclaimer
The model developers and distributors bear no legal responsibility for any consequences arising from the use of this model. <br>
This includes any direct, indirect, incidental, special, punitive, or consequential damages resulting from the model's output.<br>
By using this model, users assume all risks that may arise, and the responsibility for verifying and appropriately using the model's output lies solely with the user.<br>
This model cannot substitute for medical advice, diagnosis, or treatment, and qualified healthcare professionals should always be consulted for medical decisions.<br>
This disclaimer applies to the maximum extent permitted by applicable law.

## Model Card Contact
유선 ([email protected]), 김진실 ([email protected])

## Additional Information
For more details about the base model, please refer to the original LLaMA 3.1 documentation.