---
base_model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
license: mit
pipeline_tag: question-answering
tags:
  - medical
  - chain-of-thought
  - fine-tuning
  - qlora
  - unsloth
---

DeepSeek-R1-Medical-CoT

🚀 Fine-tuning DeepSeek R1 for Medical Chain-of-Thought Reasoning

This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Llama-8B, designed to strengthen medical reasoning through Chain-of-Thought (CoT) prompting. It was trained with QLoRA and Unsloth optimizations, which allow efficient fine-tuning on limited hardware.


📌 Model Details

Model Description

  • Developed by: [Your Name or Organization]
  • Fine-tuned from: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
  • Language(s): English, with a focus on medical terminology
  • Training Data: Medical reasoning dataset (medical-o1-reasoning-SFT)
  • Fine-tuning Method: QLoRA (4-bit adapters), later merged into 16-bit weights
  • Optimization: Unsloth (up to ~2x faster fine-tuning with lower memory usage); a sketch of this training setup is shown below
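
The exact training script is not part of this repository, but a minimal Unsloth + QLoRA sketch of the setup described above might look like the following. The dataset repo id, config name, column names, and all hyperparameters here are illustrative assumptions, not the values used to train this model.

from unsloth import FastLanguageModel
from datasets import load_dataset

# Load the base model in 4-bit so the QLoRA adapters train on limited VRAM.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention and MLP projection layers.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Flatten each example into a single question / reasoning / answer string.
# Dataset repo id, config, and column names below are assumptions.
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train")
dataset = dataset.map(lambda ex: {
    "text": f"### Question:\n{ex['Question']}\n\n"
            f"### Reasoning:\n{ex['Complex_CoT']}\n\n"
            f"### Answer:\n{ex['Response']}"
})

# Supervised fine-tuning would then run on `dataset` (e.g. with trl's SFTTrainer),
# after which Unsloth can merge the adapters back into 16-bit weights:
# model.save_pretrained_merged("DeepSeek-R1-Medical-CoT", tokenizer,
#                              save_method="merged_16bit")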

Model Sources

  • Repository: [Your Hugging Face Model Repo URL]
  • Paper (if applicable): [Link]
  • Demo (if applicable): [Link]

🛠 How to Use the Model

1️⃣ Load the Model

from transformers import AutoModelForCausalLM, AutoTokenizer

repo_name = "your-huggingface-username/DeepSeek-R1-Medical-CoT"

# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(repo_name)
model = AutoModelForCausalLM.from_pretrained(repo_name)

# Switch to evaluation mode for inference.
model.eval()
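
Optionally, if GPU memory is limited, the merged 16-bit checkpoint can be loaded in 4-bit at inference time via bitsandbytes. The configuration below is an illustrative sketch (it requires the bitsandbytes and accelerate packages), not a requirement of this model.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

repo_name = "your-huggingface-username/DeepSeek-R1-Medical-CoT"

# Quantize the weights to 4-bit on load to reduce VRAM usage.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(repo_name)
model = AutoModelForCausalLM.from_pretrained(
    repo_name,
    quantization_config=bnb_config,
    device_map="auto",  # requires accelerate
)
model.eval()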

2️⃣ Run Inference

import torch

prompt = "What are the early symptoms of diabetes?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a response without tracking gradients.
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=200)

response = tokenizer.decode(output[0], skip_special_tokens=True)
print("Model Response:", response)

📢 Acknowledgments

  • DeepSeek-AI for releasing DeepSeek-R1
  • Unsloth for optimized LoRA fine-tuning
  • Hugging Face for hosting the models