---
library_name: transformers
tags: []
---

# Model Card for mychen76/biomistral_medqa_v1

<!-- Provide a quick summary of what the model is/does. -->
A fine-tune of `BioMistral/BioMistral-7B` on the MedQuad medical question-answering dataset (`keivalya/MedQuad-MedicalQnADataset`).

## Model Details
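BioMistral is described by its authors as "A Collection of Open-Source Pretrained Large Language Models for Medical Domains". This model fine-tunes BioMistral-7B on the MedQuad medical question-answering dataset.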

### Model Description

<!-- Provide a longer summary of what this model is. -->

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

- **Developed by:** mychen76
- **Model type:** BioMedical
- **Finetuned from model:** BioMistral/BioMistral-7B

### Model Sources [optional]

<!-- Provide the basic links for the model. -->
- **Dataset:** keivalya/MedQuad-MedicalQnADataset

 
## How to Get Started with the Model

Use the code below to get started with the model.
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
Load the 4-bit quantized model and its tokenizer:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

base_model_id = "mychen76/biomistral_medqa_v1"

# 4-bit NF4 quantization with double quantization; matmuls run in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# ft_model and eval_tokenizer are the names the generation examples below use.
ft_model = AutoModelForCausalLM.from_pretrained(base_model_id, quantization_config=bnb_config)
eval_tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    add_eos_token=True,
    add_bos_token=True,
)
```

## Uses

*** Information ***
```python
eval_prompt = """From the MedQuad MedicalQA Dataset: Given the following medical question and question type, provide an accurate answer:

### Question type:
information

### Question:
What are the genetic changes related to X-linked lymphoproliferative disease ?

### Answer:
"""

model_input = eval_tokenizer(eval_prompt, return_tensors="pt").to("cuda")

ft_model.eval()
with torch.no_grad():
    print(eval_tokenizer.decode(ft_model.generate(**model_input, max_new_tokens=300)[0], skip_special_tokens=True))
```
Result:
```
From the MedQuad MedicalQA Dataset: Given the following medical question and question type, provide an accurate answer:

### Question type:
information

### Question:
What are the genetic changes related to X-linked lymphoproliferative disease ?

### Answer:
X-linked lymphoproliferative disease (XLP) is a rare primary immunodeficiency syndrome. XLP is caused by mutations in SH2D1A gene, which encodes the cytoplasmic signaling protein SLAM-associated protein ( client protein-SLAM). SLAM is a member of the signaling lymphocytic activation molecule family of receptors, which are involved in the regulation of lymphocyte activation and proliferation. The SLAM receptor is expressed on the surface of B and T lymphocytes, natural killer cells, and monocytes. Mutations in SH2D1A gene lead to impaired signaling through the SLAM receptor, resulting in a deficiency in the activation and proliferation of B and T lymphocytes. This leads to a decrease in the number of B and T lymphocytes, resulting in a weakened immune response.
```
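
The decoded output above repeats the prompt before the answer, because `generate` returns the prompt tokens followed by the newly generated ones. A minimal sketch for keeping only the answer text, assuming the prompt ends with the `### Answer:` marker as in these examples. `extract_answer` is a hypothetical helper introduced here, not part of transformers:

```python
ANSWER_MARKER = "### Answer:"


def extract_answer(decoded: str, marker: str = ANSWER_MARKER) -> str:
    """Return the text after the first answer marker, stripped of whitespace.

    The first occurrence is used because it belongs to the prompt; if the
    marker is missing, the whole decoded string is returned (stripped).
    """
    _head, found, tail = decoded.partition(marker)
    return tail.strip() if found else decoded.strip()


# Shortened stand-in for the decoded string printed in the example above.
decoded = (
    "### Question:\n"
    "What are the genetic changes related to X-linked lymphoproliferative disease ?\n"
    "\n"
    "### Answer:\n"
    "X-linked lymphoproliferative disease (XLP) is a rare primary immunodeficiency syndrome."
)
answer = extract_answer(decoded)
print(answer)
```

Splitting on the marker works on the decoded string alone, so it needs no access to the model or tokenizer.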

*** Frequency ***
```python
eval_prompt = """From the MedQuad MedicalQA Dataset: Given the following medical question and question type, provide an accurate answer:

### Question type:
frequency

### Question:
How many people are affected by Smith-Lemli-Opitz syndrome ?

### Answer:
"""

model_input = eval_tokenizer(eval_prompt, return_tensors="pt").to("cuda")

ft_model.eval()
with torch.no_grad():
    print(eval_tokenizer.decode(ft_model.generate(**model_input, max_new_tokens=300)[0], skip_special_tokens=True))
```
Result:
```
From the MedQuad MedicalQA Dataset: Given the following medical question and question type, provide an accurate answer:

### Question type:
frequency

### Question:
How many people are affected by Smith-Lemli-Opitz syndrome ?

### Answer:
Smith-Lemli-Opitz syndrome (SLOS) is a rare autosomal recessive disorder of human development. It is characterized by a wide range of symptoms, including growth and developmental delay, intellectual disability, characteristic facial features, and congenital heart defects. The prevalence of SLOS is estimated to be 1 in 15,000 to 1 in 25,000 live births.
```

*** Symptoms ***
```python
eval_prompt = """From the MedQuad MedicalQA Dataset: Given the following medical question and question type, provide an accurate answer:

### Question type:
symptoms

### Question:
What are the symptoms of Norrie disease ?

### Answer:
"""

model_input = eval_tokenizer(eval_prompt, return_tensors="pt").to("cuda")

ft_model.eval()
with torch.no_grad():
    print(eval_tokenizer.decode(ft_model.generate(**model_input, max_new_tokens=300)[0], skip_special_tokens=True))
```
Result:
```
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.

From the MedQuad MedicalQA Dataset: Given the following medical question and question type, provide an accurate answer:

### Question type:
symptoms

### Question:
What are the symptoms of Norrie disease ?

### Answer:
Norrie disease is a rare, X-linked recessive disorder of the blood vessels. It is characterized by a variety of symptoms, including glaucoma, mental retardation, seizures, and deafness.
```
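
The three prompts above differ only in the question type and the question. A minimal sketch of a helper that fills that shared template. `build_prompt` and `PROMPT_TEMPLATE` are hypothetical names introduced here; the template text is copied from the examples above:

```python
PROMPT_TEMPLATE = (
    "From the MedQuad MedicalQA Dataset: Given the following medical question "
    "and question type, provide an accurate answer:\n"
    "\n"
    "### Question type:\n"
    "{question_type}\n"
    "\n"
    "### Question:\n"
    "{question}\n"
    "\n"
    "### Answer:\n"
)


def build_prompt(question_type: str, question: str) -> str:
    """Fill the shared prompt template with one question type and question."""
    return PROMPT_TEMPLATE.format(
        question_type=question_type.strip(),
        question=question.strip(),
    )


eval_prompt = build_prompt("symptoms", "What are the symptoms of Norrie disease ?")
print(eval_prompt)
```

The returned string can be passed to `eval_tokenizer` in place of a hand-written `eval_prompt`.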


### Out-of-Scope Use

Image inputs are out of scope: this is a text-only language model.
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.


[More Information Needed]

## Training Details

### Training Data

- **dataset:** keivalya/MedQuad-MedicalQnADataset
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

## Citation

arXiv: https://arxiv.org/abs/2402.10373

@misc{labrak2024biomistral,
      title={BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains}, 
      author={Yanis Labrak and Adrien Bazoge and Emmanuel Morin and Pierre-Antoine Gourraud and Mickael Rouvier and Richard Dufour},
      year={2024},
      eprint={2402.10373},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}