---
base_model: microsoft/Phi-3-mini-4k-instruct
library_name: peft
---

Model Card for LoRA Fine-Tuned Phi-3-mini-4k

This model is a LoRA (Low-Rank Adaptation) fine-tuned version of microsoft/Phi-3-mini-4k-instruct, trained with 4-bit quantization for memory-efficient training and inference. It was trained and evaluated on instruction-following tasks, and the lightweight adapter can be saved and shared via the Hugging Face Hub.


Model Details

Model Description

This model adapts the Phi-3-mini-4k-instruct LLM using LoRA, a parameter-efficient fine-tuning technique, and 4-bit quantization for reduced memory usage. It is suitable for a variety of NLP tasks, especially where resource efficiency is important.

  • Developed by: Amit Chaubey
  • Funded by: N/A
  • Shared by: Amit Chaubey
  • Model type: Causal Language Model (LLM), LoRA fine-tuned, 4-bit quantized
  • Language(s) (NLP): English
  • License: MIT (the base model microsoft/Phi-3-mini-4k-instruct is also released under MIT)
  • Finetuned from model: microsoft/Phi-3-mini-4k-instruct

Model Sources

  • Repository: https://huggingface.co/sweatSmile/ak-phi3-mini-sarcasm-adapter

Uses

Direct Use

  • Text generation
  • Instruction following
  • Conversational AI
  • Educational and research purposes

Out-of-Scope Use

  • Not suitable for real-time safety-critical applications
  • Not intended for generating harmful, biased, or misleading content

Bias, Risks, and Limitations

  • The model may reflect biases present in the training data.
  • Not suitable for sensitive or high-stakes decision-making.
  • Outputs should be reviewed by humans before use in production.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Always validate outputs for your use case.


How to Get Started with the Model

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "microsoft/Phi-3-mini-4k-instruct"
lora_adapter = "sweatSmile/ak-phi3-mini-sarcasm-adapter"  # or a path to a local LoRA adapter

# Load the base model, then attach the LoRA adapter on top of it.
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")
model = PeftModel.from_pretrained(model, lora_adapter)

# The tokenizer comes from the base model id, not the PEFT-wrapped model object.
tokenizer = AutoTokenizer.from_pretrained(base_model)

prompt = "The weather is good today"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
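
Because only the small LoRA adapter needs to be stored, it can be saved and shared via the Hugging Face Hub as mentioned above. A minimal sketch follows; the repository id your-username/phi3-lora-adapter is a placeholder, not an existing repo.

# Persist only the LoRA adapter weights (a few MB), not the full base model.
model.save_pretrained("phi3-lora-adapter")
tokenizer.save_pretrained("phi3-lora-adapter")

# Optionally push the adapter to the Hugging Face Hub (requires `huggingface-cli login`).
model.push_to_hub("your-username/phi3-lora-adapter")      # placeholder repo id
tokenizer.push_to_hub("your-username/phi3-lora-adapter")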

Training Details

Training Data

  • Fine-tuned on the Hugging Face dataset sweatSmile/sarcastic-dataset for demonstration purposes; replace it with your own dataset as needed (a loading sketch follows below).
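
As a hedged sketch, the dataset can be pulled with the datasets library; the split name and column layout below are assumptions, so inspect the printed structure before training.

from datasets import load_dataset

# Load the demonstration dataset from the Hub; swap in your own dataset id or files.
dataset = load_dataset("sweatSmile/sarcastic-dataset")
print(dataset)                    # inspect available splits and columns
train_data = dataset["train"]     # assumes a "train" split exists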

Training Procedure

  • LoRA rank: 8
  • LoRA alpha: 16
  • LoRA dropout: 0.05
  • Target modules: ["query_key_value", "o_proj", "qkv_proj", "gate_up_proj", "down_proj"]
  • 4-bit quantization (nf4)
  • Training regime: bf16 mixed precision (see the configuration sketch below)
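
These hyperparameters map roughly onto a peft LoraConfig plus a bitsandbytes 4-bit quantization config, as in the minimal sketch below. This is an illustrative reconstruction, not the exact training script, and defaults may differ across library versions.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit nf4 quantization with bf16 compute, matching the settings listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA settings from the Training Procedure section.
# "query_key_value" does not match any Phi-3 module name and is simply ignored by PEFT.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value", "o_proj", "qkv_proj", "gate_up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # should report roughly 0.33% trainable parameters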

Speeds, Sizes, Times

  • Model size: ~3.8B base parameters, of which ~0.33% are trainable LoRA parameters
  • Training time: varies by hardware (roughly 30 minutes to 1 hour on an A100 GPU for a small dataset of ~1,000 examples)

Evaluation

Testing Data, Factors & Metrics

  • Used a held-out portion of the training dataset for evaluation.
  • Metrics: perplexity and qualitative review of generated outputs (a perplexity sketch follows below).
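
Perplexity is the exponential of the average token-level cross-entropy loss on held-out text. A minimal, unbatched sketch (assuming model and tokenizer are already loaded as in the quick-start snippet above) looks like this:

import math
import torch

def perplexity(model, tokenizer, texts):
    """Average perplexity over a list of held-out strings (simple, unbatched sketch)."""
    losses = []
    model.eval()
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt").to(model.device)
            # With labels == input_ids, the model returns the mean cross-entropy loss.
            out = model(**enc, labels=enc["input_ids"])
            losses.append(out.loss.item())
    return math.exp(sum(losses) / len(losses))

print(perplexity(model, tokenizer, ["An example held-out sentence."]))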

Results

  • Qualitative review indicates the model retains strong instruction-following and text-generation capabilities after LoRA fine-tuning.

Environmental Impact

  • Hardware Type: NVIDIA L4 GPU (on Google Cloud)
  • Hours used: ~0.3 hours (about 20 minutes)
  • Cloud Provider: Google Cloud
  • Compute Region: UK
  • Carbon Emitted: not measured; can be estimated with the ML CO2 Impact calculator

Technical Specifications

Model Architecture and Objective

  • Base: Phi-3-mini-4k-instruct (Causal LM)
  • LoRA fine-tuning with PEFT
  • 4-bit quantization for efficiency

Compute Infrastructure

  • Hardware: NVIDIA L4 or A100 GPU (or similar)
  • Software: Python 3.10+, PyTorch, Transformers, PEFT, Datasets

Citation

BibTeX:

@article{hu2021lora,
  title={LoRA: Low-Rank Adaptation of Large Language Models},
  author={Hu, Edward J. and others},
  journal={arXiv preprint arXiv:2106.09685},
  year={2021}
}

Model Card Authors

  • Amit Chaubey

Model Card Contact

  • sweatSmile

Framework versions

  • PEFT 0.15.2