---
language:
- en
metrics:
- bertscore
- bleu
- rouge
base_model:
- meta-llama/Llama-3.2-3B-Instruct
library_name: transformers
tags:
- text-generation-inference
---
# Model Card

## Overview

This repository contains a LoRA fine-tuned version of Meta's Llama-3.2-3B-Instruct model, trained with PEFT (LoRA) on a custom bank customer-service FAQ dataset for question answering. The adapter weights, configuration, and tokenizer files are included for seamless inference via a single `from_pretrained` call.
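Alternatively, you can attach the adapter to the base model explicitly with `peft`. A minimal sketch, assuming the standard PEFT loading flow (repo IDs as listed in this card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "SardarTaimoor/llama3b-lora"

# Load the frozen base model, then layer the LoRA adapter weights on top.
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Optionally fold the adapter into the base weights for faster inference.
model = model.merge_and_unload()
```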
## Model Details

### Model Description

- Base model: `meta-llama/Llama-3.2-3B-Instruct`
- Method: PEFT (LoRA)
- LoRA configuration (see the sketch after this list):
  - Rank (r): 8
  - Alpha: 32
  - Dropout: 0.05
  - Target modules: `q_proj`, `v_proj`
- Task: Customer-service question answering on banking FAQs
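Expressed as a `peft` `LoraConfig`, the configuration above looks as follows; `bias` and `task_type` are assumptions based on common defaults and are not stated elsewhere in this card:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                                  # rank, as reported above
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention query/value projections
    bias="none",                          # assumption: common default
    task_type="CAUSAL_LM",                # assumption: causal LM fine-tuning
)
```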
### Metadata

- Developed by: Sardar Taimoor
- Fine-tuned by: [SardarTaimoor](https://huggingface.co/SardarTaimoor)
- Model type: Causal language model
- Language(s): English
- License: MIT
- Fine-tuned from: `meta-llama/Llama-3.2-3B-Instruct`
### Links

- Model repo: https://huggingface.co/SardarTaimoor/llama3b-lora
- Base model: https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct
- Training notebook: `Fine-Tuning.ipynb`
## How to Use this Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SardarTaimoor/llama3b-lora"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # splits layers across GPU/CPU
    torch_dtype=torch.float16,  # half precision on GPU
    low_cpu_mem_usage=True,     # avoids fully materializing everything in host RAM
)

inputs = tokenizer("What's the Little Champs account?", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
print("-" * 60)
```
## Training Details

### Data

- Dataset description: Custom customer-service FAQ dataset for bank products, formatted as JSONL with user prompts and assistant completions.
- Number of examples: 319 total (train: ~303, validation: ~16 after a 5% split; a loading sketch follows this list)
- Preprocessing steps: Prompts and completions extracted and cleaned from JSONL; tokenization via the original Llama tokenizer.
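A minimal sketch of the split described above using 🤗 `datasets`; the filename and split seed are illustrative placeholders, not the actual training artifacts:

```python
from datasets import load_dataset

# Load the JSONL file and carve out the 5% validation split described above.
dataset = load_dataset("json", data_files="bank_faq.jsonl", split="train")  # hypothetical filename
splits = dataset.train_test_split(test_size=0.05, seed=42)  # seed is an assumption
train_ds, eval_ds = splits["train"], splits["test"]
print(len(train_ds), len(eval_ds))  # ~303 / ~16 for 319 examples
```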
### Procedure

- Compute environment: Google Colab T4 GPU, Python 3
- Epochs: 20
- Batch size: 4 per device (gradient accumulation steps = 8, for an effective batch size of 32)
- Learning rate: 2e-5
- Precision: fp16
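As `transformers.TrainingArguments`, the hyperparameters above would look roughly like this; `output_dir` and the logging cadence are illustrative placeholders:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3b-lora-out",   # placeholder
    num_train_epochs=20,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,   # effective batch size of 32
    learning_rate=2e-5,
    fp16=True,
    logging_steps=10,                # placeholder
)
```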
Evaluation & Metrics
Evaluation dataset: 5% holdout from the custom FAQ dataset (~16 examples)
Metrics: BLEU, ROUGE, BERTScore
Results:
- BLEU: 0.0146
- ROUGE: rouge1=0.1083, rouge2=0.0281, rougeL=0.0816
- BERTScore (mean f1): 0.8211
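These metrics can be computed with the 🤗 `evaluate` library along the following lines; `preds` and `refs` stand in for decoded model outputs and reference answers from the holdout set:

```python
import evaluate

# Illustrative placeholders for decoded predictions and gold references.
preds = ["A savings account for children."]
refs = ["Little Champs is a savings account designed for children."]

bleu = evaluate.load("bleu").compute(predictions=preds, references=refs)
rouge = evaluate.load("rouge").compute(predictions=preds, references=refs)
bertscore = evaluate.load("bertscore").compute(predictions=preds, references=refs, lang="en")

print(bleu["bleu"], rouge["rougeL"], sum(bertscore["f1"]) / len(bertscore["f1"]))
```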
Limitations & Biases
- Known limitations: May hallucinate rare banking details; domain-restricted to the provided FAQ data.
- Potential biases: Reflects biases present in original Llama and the customer-service samples.
## License

This model is released under the MIT license. See LICENSE for details.

For questions or contributions, please open an issue on the model repo.