---
license: apache-2.0
language:
  - en
tags:
  - medical
  - llama
  - finetuned
  - health
model_type: llama
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
---
# Model Card for HealthGPT-TinyLlama

This model is a fine-tuned version of TinyLlama-1.1B-Chat-v1.0 on a custom medical dataset. It was developed to serve as a lightweight, domain-specific assistant capable of answering medical questions fluently and coherently.
## Model Details

### Model Description
HealthGPT-TinyLlama is a 1.1B-parameter model fine-tuned with LoRA adapters for medical question answering. The base model is TinyLlama-1.1B-Chat-v1.0, a compact Llama-architecture model optimized for performance and efficiency.
- Developed by: Selina Zarzour
- Shared by: selinazarzour
- Model type: Causal Language Model
- Language(s): English
- License: apache-2.0
- Finetuned from model: TinyLlama/TinyLlama-1.1B-Chat-v1.0

### Model Sources
- Repository: https://huggingface.co/selinazarzour/healthgpt-tinyllama
- Demo (local only): Gradio app tested locally on a GPU; not deployed to Spaces because the app does not run on CPU-only hardware

## Uses

### Direct Use
- Designed to answer general medical questions.
- Intended for educational and experimental use.

### Out-of-Scope Use
- Not suitable for clinical decision-making or professional diagnosis.
- Should not be relied on for life-critical use cases.

## Bias, Risks, and Limitations
- The model may hallucinate or provide medically inaccurate information.
- It has not been validated against real-world clinical data.
- Biases present in the training dataset may persist.

### Recommendations
- Always verify model outputs with qualified professionals.
- Do not use in scenarios where safety is critical.

## How to Get Started with the Model
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned (merged) model and the base model's tokenizer.
model = AutoModelForCausalLM.from_pretrained("selinazarzour/healthgpt-tinyllama")
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Example prompt in the Question/Answer format this model expects.
prompt = "### Question:\nWhat are the symptoms of diabetes?\n\n### Answer:\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=150)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
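
The model has so far only been run locally on a GPU (see the note under Model Examination), so the continuation below, which is only a sketch, moves generation onto CUDA and enables sampling; the decoding parameters are illustrative defaults, not values tuned or validated for this model.

```python
import torch

# Continues from the snippet above; assumes a CUDA-capable GPU is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
inputs = {k: v.to(device) for k, v in inputs.items()}

outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    do_sample=True,     # sampling parameters below are illustrative
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```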
## Training Details

### Training Data
- Finetuned on a synthetic dataset of medical question-and-answer pairs derived from reliable medical knowledge sources (see the illustrative formatting sketch below).
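
The preprocessing pipeline is not documented in this card; the sketch below only illustrates how a single question-answer pair could be rendered into the prompt template shown in the Getting Started section. The example pair and the helper name are hypothetical.

```python
# Hypothetical helper: renders one Q&A pair into the "### Question / ### Answer" template.
def format_example(question: str, answer: str) -> str:
    return f"### Question:\n{question}\n\n### Answer:\n{answer}"

print(format_example(
    "What are the symptoms of diabetes?",
    "Common symptoms include increased thirst, frequent urination, fatigue, and blurred vision.",
))
```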
 
### Training Procedure

- LoRA adapter training using Hugging Face PEFT and transformers (see the sketch below)
- Model merged with base weights after training
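
A minimal sketch of the LoRA-then-merge workflow described above, using Hugging Face `peft`; the rank, alpha, dropout, and output path are illustrative assumptions, not the exact training configuration.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Wrap the base model with LoRA adapters on the attention projections.
# r / lora_alpha / lora_dropout are illustrative, not the exact values used.
base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention layers only
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()

# ... fine-tune `model` on the Q&A dataset (e.g. with the transformers Trainer) ...

# After training, fold the adapter weights back into the base model so the
# result can be saved and shared as a standalone checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("healthgpt-tinyllama-merged")  # illustrative output path
```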
 
#### Training Hyperparameters

- Precision: float16 mixed precision
- Epochs: 3
- Optimizer: AdamW
- Batch size: 4 (see the sketch below for how these settings map onto `TrainingArguments`)
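
For reference, a minimal sketch of how the hyperparameters above could be expressed as `transformers` `TrainingArguments`; values not listed in this card (learning rate, logging cadence, output path) are illustrative assumptions.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="healthgpt-tinyllama-lora",  # illustrative path
    num_train_epochs=3,                     # Epochs: 3
    per_device_train_batch_size=4,          # Batch size: 4
    fp16=True,                              # float16 mixed precision
    optim="adamw_torch",                    # AdamW optimizer
    learning_rate=2e-4,                     # assumption, not documented in this card
    logging_steps=10,                       # assumption
)
```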
 
## Evaluation

### Testing Data, Factors & Metrics
- Testing was done manually by querying the model with unseen questions.
- Sample outputs were evaluated for relevance, grammar, and factual accuracy.

### Results
- The model produces relevant and coherent answers in most cases.
- It performs best on short, fact-based questions.

## Model Examination

Screenshot of the local Gradio app interface: *(not included in this card)*

Note: The model was not deployed publicly because inference currently requires a GPU, but it runs successfully in local environments with GPU access.
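
The demo code itself is not published; a local Gradio app along the following lines would match the described setup. The generation function, decoding settings, and UI labels are illustrative, not the exact app shown in the screenshot.

```python
import gradio as gr
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative local demo; requires a CUDA-capable GPU.
model = AutoModelForCausalLM.from_pretrained(
    "selinazarzour/healthgpt-tinyllama", torch_dtype=torch.float16
).to("cuda")
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

def answer(question: str) -> str:
    prompt = f"### Question:\n{question}\n\n### Answer:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=150)
    # Return only the text generated after the prompt.
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

demo = gr.Interface(fn=answer, inputs="text", outputs="text", title="HealthGPT-TinyLlama (local demo)")
demo.launch()
```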
## Environmental Impact
- Hardware Type: Google Colab GPU (T4/A100)
- Hours used: ~3 hours
- Cloud Provider: Google Cloud (via Colab)
- Compute Region: US (exact zone unknown)
- Carbon Emitted: Unknown

## Technical Specifications

### Model Architecture and Objective

- LlamaForCausalLM with 22 layers, 32 attention heads, and a hidden size of 2048 (these values can be checked against the published config, as shown below)
- LoRA fine-tuning applied to attention layers only
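
A quick sanity check of those dimensions, assuming the merged checkpoint exposes the standard Llama config fields:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("selinazarzour/healthgpt-tinyllama")
print(config.num_hidden_layers)    # expected: 22
print(config.num_attention_heads)  # expected: 32
print(config.hidden_size)          # expected: 2048
```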
 
### Compute Infrastructure

Hardware: Google Colab GPU

Software:

- transformers 4.39+
- peft
- bitsandbytes (for initial quantized training)

## Citation
APA: Zarzour, S. (2025). HealthGPT-TinyLlama: A fine-tuned 1.1B LLM for medical Q&A.
## Model Card Contact
- Contact: Selina Zarzour via Hugging Face (@selinazarzour)
 
Note: This model is a prototype and not intended for clinical use.
