---
base_model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
license: mit
pipeline_tag: question-answering
tags:
  - medical
  - chain-of-thought
  - fine-tuning
  - qlora
  - unsloth
---

DeepSeek-R1-Medical-CoT

🚀 Fine-tuning DeepSeek R1 for Medical Chain-of-Thought Reasoning

This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Llama-8B, designed to strengthen medical reasoning through Chain-of-Thought (CoT) prompting. It was trained with QLoRA and Unsloth optimizations, which allow efficient fine-tuning on limited hardware.


📌 Model Details

Model Description

  • Developed by: [Your Name or Organization]
  • Fine-tuned from: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  • Language(s): English, with a focus on medical terminology
  • Training Data: Medical reasoning dataset (medical-o1-reasoning-SFT)
  • Fine-tuning Method: QLoRA (4-bit adapters), later merged into 16-bit weights
  • Optimization: Unsloth (up to ~2x faster fine-tuning with lower memory usage); a sketch of this training setup is shown below
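
The exact training script is not part of this repository, but a minimal Unsloth + QLoRA sketch of the setup described above might look like the following. The dataset repo id, config name, column names, and all hyperparameters here are illustrative assumptions, not the values used to train this model.

from unsloth import FastLanguageModel
from datasets import load_dataset

# Load the base model in 4-bit so the QLoRA adapters train on limited VRAM.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention and MLP projection layers.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Flatten each example into a single question / reasoning / answer string.
# Dataset repo id, config, and column names below are assumptions.
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train")
dataset = dataset.map(lambda ex: {
    "text": f"### Question:\n{ex['Question']}\n\n"
            f"### Reasoning:\n{ex['Complex_CoT']}\n\n"
            f"### Answer:\n{ex['Response']}"
})

# Supervised fine-tuning would then run on `dataset` (e.g. with trl's SFTTrainer),
# after which Unsloth can merge the adapters back into 16-bit weights:
# model.save_pretrained_merged("DeepSeek-R1-Medical-CoT", tokenizer,
#                              save_method="merged_16bit")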

Model Sources

  • Repository: [Your Hugging Face Model Repo URL]
  • Paper (if applicable): [Link]
  • Demo (if applicable): [Link]

🛠 How to Use the Model

1️⃣ Load the Model

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_name = "your-huggingface-username/DeepSeek-R1-Medical-CoT"

# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(repo_name)
model = AutoModelForCausalLM.from_pretrained(repo_name)

# Switch to evaluation mode for inference.
model.eval()
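
Optionally, if GPU memory is limited, the merged 16-bit checkpoint can be loaded in 4-bit at inference time via bitsandbytes. The configuration below is an illustrative sketch (it requires the bitsandbytes and accelerate packages), not a requirement of this model.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

repo_name = "your-huggingface-username/DeepSeek-R1-Medical-CoT"

# Quantize the weights to 4-bit on load to reduce VRAM usage.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(repo_name)
model = AutoModelForCausalLM.from_pretrained(
    repo_name,
    quantization_config=bnb_config,
    device_map="auto",  # requires accelerate
)
model.eval()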

2️⃣ Run Inference

import torch

prompt = "What are the early symptoms of diabetes?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a response without tracking gradients.
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=200)

response = tokenizer.decode(output[0], skip_special_tokens=True)
print("Model Response:", response)

📢 Acknowledgments

  • DeepSeek-AI for releasing DeepSeek-R1
  • Unsloth for optimized LoRA fine-tuning
  • Hugging Face for hosting the models