---
base_model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
license: mit
pipeline_tag: text-generation
tags:
- medical
- chain-of-thought
- fine-tuning
- qlora
- unsloth
---
# DeepSeek-R1-Medical-CoT

## 🚀 Fine-tuning DeepSeek R1 for Medical Chain-of-Thought Reasoning
This model is a fine-tuned version of [`deepseek-ai/DeepSeek-R1-Distill-Llama-8B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B), designed to enhance medical reasoning through Chain-of-Thought (CoT) prompting. It was trained with QLoRA and Unsloth optimizations, allowing efficient fine-tuning on limited hardware.
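For reference, here is a minimal sketch of what a QLoRA + Unsloth training setup for this base model can look like. The dataset repo id, column names, and hyperparameters below are illustrative assumptions, not the exact configuration used to train this model:

```python
# Illustrative sketch of a QLoRA + Unsloth setup — repo ids, column names,
# and hyperparameters are assumptions, not the exact training config.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

# Load the base model in 4-bit so LoRA adapters can be trained on modest GPUs.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention and MLP projection layers.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Dataset repo id and column names are assumed here.
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train")

def to_text(example):
    # Fold question, chain-of-thought, and answer into one training string.
    return {
        "text": f"### Question:\n{example['Question']}\n\n"
                f"### Response:\n<think>\n{example['Complex_CoT']}\n</think>\n"
                f"{example['Response']}"
    }

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        num_train_epochs=1,
        output_dir="outputs",
    ),
)
trainer.train()
```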
## 📌 Model Details

### Model Description
- **Developed by:** [Your Name or Organization]
- **Fine-tuned from:** `deepseek-ai/DeepSeek-R1-Distill-Llama-8B`
- **Language(s):** English, with a focus on medical terminology
- **Training data:** medical reasoning dataset (`medical-o1-reasoning-SFT`)
- **Fine-tuning method:** QLoRA (4-bit adapters), later merged into 16-bit weights (see the merge sketch after this list)
- **Optimization:** Unsloth (2× faster fine-tuning with lower memory usage)
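As a hedged illustration of the merge step, QLoRA adapters can be folded back into full-precision weights with PEFT's `merge_and_unload`; the adapter path and output directory below are placeholders, and Unsloth's own merged-export helpers are an equivalent route:

```python
# Illustrative merge step: fold trained QLoRA adapters back into 16-bit weights.
# The adapter path and output directory are placeholders.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base, "path/to/lora-adapters")
merged = model.merge_and_unload()  # bake LoRA deltas into the base weights
merged.save_pretrained("DeepSeek-R1-Medical-CoT-merged-16bit")
```

Either route yields standalone 16-bit weights that load with plain `transformers`, as shown in the usage section below.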
### Model Sources

- **Repository:** [Your Hugging Face Model Repo URL]
- **Paper (if applicable):** [Link]
- **Demo (if applicable):** [Link]
## 🛠 How to Use the Model

### 1️⃣ Load the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_name = "your-huggingface-username/DeepSeek-R1-Medical-CoT"

tokenizer = AutoTokenizer.from_pretrained(repo_name)
model = AutoModelForCausalLM.from_pretrained(repo_name)
model.eval()  # switch to inference mode (disables dropout)
```
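If GPU memory is limited, the same merged checkpoint can instead be loaded with 4-bit quantization through bitsandbytes. This is an optional sketch, not a requirement of the model:

```python
# Optional: load the merged weights in 4-bit to cut GPU memory usage.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for matmuls
)

model = AutoModelForCausalLM.from_pretrained(
    repo_name,
    quantization_config=bnb_config,
    device_map="auto",  # spread layers across available devices
)
```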
### 2️⃣ Run Inference

```python
import torch

prompt = "What are the early symptoms of diabetes?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():  # no gradient tracking needed for generation
    output = model.generate(**inputs, max_new_tokens=200)

response = tokenizer.decode(output[0], skip_special_tokens=True)
print("Model Response:", response)
```
## 📢 Acknowledgments

- **DeepSeek-AI** for releasing DeepSeek-R1
- **Unsloth** for optimized LoRA fine-tuning
- **Hugging Face** for hosting the models