Model Card: UAB-NLP/ProjGen_Finetuned_llama
A LLaMA-3.2-based, instruction-tuned model fine-tuned with Unsloth + QLoRA using 🤗 Transformers.
This model is part of the ProjGen project, aimed at enhancing developer productivity through automated project generation and structured code scaffolding.
Model Details
Model Description
- Base model: meta-llama/Llama-3.2-<SIZE>-Instruct
- Finetuning method: Unsloth + QLoRA (LoRA adapters)
- Precision (train): 4-bit NF4 quantization (bitsandbytes) + bf16 compute
- Context length: 4096
- Task(s): Instruction following & project/code generation
- License: Inherits the Llama 3.2 Community License from the base model
- Developed by: UAB-NLP Group (Sai Praneeth Kumar, University of Alabama at Birmingham)
- Finetuned from: meta-llama/Llama-3.2-<SIZE>-Instruct
- Shared by: UAB-NLP
Model Sources
- Repository: UAB-NLP/ProjGen_Finetuned_llama
- Project Paper: ProjGen – Enhanced Developer Productivity for Flask Project Generation with a RAG-Enhanced Fine-Tuned Local LLM
Intended Uses & Limitations
Direct Use
- Generating Flask/Django/Streamlit project structures automatically.
- Instruction-following tasks related to software engineering and code generation.
Downstream Use
- Further fine-tuning on domain-specific datasets (e.g., medical imaging, finance, etc.).
- Integration into developer assistants and productivity tools.
Out-of-Scope / Limitations
- Not suitable for medical, legal, or financial decision-making without human review.
- May hallucinate or produce insecure/inefficient code if not monitored.
Bias, Risks, and Limitations
The model inherits risks from the base LLaMA-3.2 model:
- Possible hallucinations and factual inaccuracies.
- Dataset/domain biases reflected in responses.
- Outputs should be validated before production deployment.
Recommendation: Always pair outputs with testing, validation, and human oversight.
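As one hedged illustration of that recommendation (the helper below is hypothetical and not part of this repository), generated Python code can at least be syntax-checked before it is written into a project:

import ast

def looks_syntactically_valid(generated_source: str) -> bool:
    # Parses the generated code; a pass only proves it is valid Python, not that it is safe or correct.
    try:
        ast.parse(generated_source)
        return True
    except SyntaxError:
        return False

generated_source = "def index():\n    return 'ok'\n"   # stand-in for model output
print(looks_syntactically_valid(generated_source))     # True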
Getting Started
Inference (PEFT adapter form)
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "UAB-NLP/ProjGen_Finetuned_llama"
tok = AutoTokenizer.from_pretrained(model_id)

# 4-bit NF4 quantization with bf16 compute, matching the training setup
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",
    torch_dtype="auto",
)

prompt = "Generate a Flask project with login, dashboard, and reports."
inputs = tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tok.decode(outputs[0], skip_special_tokens=True))
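If this repository stores only the LoRA adapter rather than merged weights, the adapter can also be attached explicitly with PEFT and the Llama chat template used for prompting. A sketch, assuming the base model id below and that the adapter loads with PeftModel:

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-<SIZE>-Instruct"   # replace <SIZE> with the actual base size
adapter_id = "UAB-NLP/ProjGen_Finetuned_llama"

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_use_double_quant=True)
tok = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)   # attach the LoRA adapter to the 4-bit base

# Instruction-tuned Llama models expect the chat template
messages = [{"role": "user", "content": "Generate a Flask project with login, dashboard, and reports."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tok.decode(outputs[0], skip_special_tokens=True))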
Training Details
Data
- Dataset: Custom ProjGen dataset built from structured Flask/Django/Streamlit projects and instructions.
- Size: [Fill in #samples / tokens]
- Preprocessing: Chat-style instruction formatting (system/user/assistant), deduplication, truncation at 4096 tokens.
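A minimal sketch of that formatting step, assuming a simple instruction/response schema (the field names and system prompt here are illustrative, not the actual dataset schema):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-<SIZE>-Instruct")

example = {   # illustrative fields; the real dataset schema may differ
    "instruction": "Generate a Flask project with login and dashboard pages.",
    "response": "project/\n  app.py\n  templates/\n    login.html\n    dashboard.html\n",
}
messages = [
    {"role": "system", "content": "You are ProjGen, an assistant that scaffolds web projects."},
    {"role": "user", "content": example["instruction"]},
    {"role": "assistant", "content": example["response"]},
]
text = tok.apply_chat_template(messages, tokenize=False)          # render to the Llama chat format
ids = tok(text, truncation=True, max_length=4096)["input_ids"]    # truncate at the 4096-token context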
Training Procedure
- Quantization: 4-bit NF4 + double quantization (bitsandbytes)
- LoRA config: r = 16, alpha = 32, dropout = 0.05 (see the configuration sketch after this list)
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Optimizer: Paged AdamW (32-bit)
- LR / Schedule: 2e-4 with cosine decay + warmup
- Batch size: [fill in effective batch size]
- Epochs/Steps: 60 steps
- Precision: bf16 mixed precision
- Grad checkpointing: Enabled
- Flash attention: Enabled (Unsloth optimization)
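A configuration sketch that reflects the hyperparameters above (Unsloth + TRL); treat it as an approximation of the training setup rather than the exact script — the batch-size and warmup values are placeholders:

from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

model, tok = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.2-<SIZE>-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,                 # NF4 + double quantization via bitsandbytes
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing=True,
)
args = TrainingArguments(
    output_dir="projgen-lora",
    max_steps=60,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=5,                    # placeholder; actual warmup not recorded in this card
    optim="paged_adamw_32bit",
    bf16=True,
    per_device_train_batch_size=2,     # placeholder; see "Batch size" above
    gradient_accumulation_steps=4,     # placeholder
)
# trainer = SFTTrainer(model=model, tokenizer=tok, train_dataset=..., args=args); trainer.train()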
Training Hardware
- GPU: RTX 4070 (12GB VRAM) [replace with actual if different]
- Training runtime: 331.47 seconds
- Steps per second: 0.181
- Samples per second: 1.448
- Final training loss: 0.4899
- Total FLOPs: ~3.87 × 10^15
- Checkpoint size: LoRA adapter ~200 MB; merged model size depends on the base LLaMA size (see the merge sketch below)
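A sketch of producing that merged checkpoint from the adapter (assumes the adapter is loadable with PEFT and that enough memory is available for full-precision base weights):

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-<SIZE>-Instruct"
adapter_id = "UAB-NLP/ProjGen_Finetuned_llama"

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")   # load un-quantized for a clean merge
merged = PeftModel.from_pretrained(base, adapter_id).merge_and_unload()    # fold LoRA weights into the base

merged.save_pretrained("projgen-merged")
AutoTokenizer.from_pretrained(adapter_id).save_pretrained("projgen-merged")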
Evaluation
Training Loss Table (per step)
Step | Training Loss |
---|---|
1 | 0.9276 |
2 | 1.0030 |
3 | 1.0463 |
4 | 0.9592 |
5 | 0.9903 |
6 | 0.9239 |
7 | 0.7762 |
8 | 0.6905 |
9 | 0.6130 |
10 | 0.5687 |
11 | 0.6292 |
12 | 0.5927 |
13 | 0.5880 |
14 | 0.5021 |
15 | 0.5303 |
16 | 0.4216 |
17 | 0.4692 |
18 | 0.5330 |
19 | 0.4350 |
20 | 0.4003 |
21 | 0.3515 |
22 | 0.4201 |
23 | 0.4200 |
24 | 0.3666 |
25 | 0.4260 |
26 | 0.4261 |
27 | 0.3206 |
28 | 0.4385 |
29 | 0.3475 |
30 | 0.4438 |
31 | 0.4648 |
32 | 0.4088 |
33 | 0.4422 |
34 | 0.4209 |
35 | 0.3593 |
36 | 0.3433 |
37 | 0.3874 |
38 | 0.3604 |
39 | 0.4374 |
40 | 0.4048 |
41 | 0.3604 |
42 | 0.4087 |
43 | 0.3240 |
44 | 0.4375 |
45 | 0.4195 |
46 | 0.3881 |
47 | 0.4383 |
48 | 0.3506 |
49 | 0.4687 |
50 | 0.3709 |
51 | 0.3951 |
52 | 0.4012 |
53 | 0.4020 |
54 | 0.3977 |
55 | 0.2816 |
56 | 0.4136 |
57 | 0.4400 |
58 | 0.3268 |
59 | 0.4218 |
60 | 0.3629 |
Final averaged train loss: 0.4899
Environmental Impact (estimate)
- Hardware: RTX 4070 (12GB VRAM) [replace with actual]
- Hours: ~0.09 h (331 seconds)
- Region/Provider: [cloud/on-prem]
- Estimated CO₂e: not formally measured; can be approximated with the ML CO₂ Impact calculator (rough estimate below)
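As a rough back-of-the-envelope only (the power draw and grid intensity below are assumptions, not measurements):

runtime_hours = 331.47 / 3600   # ~0.092 h, from the training runtime above
gpu_power_kw = 0.200            # assumed average draw for an RTX 4070
grid_kg_co2e_per_kwh = 0.4      # assumed grid carbon intensity
energy_kwh = runtime_hours * gpu_power_kw
print(f"{energy_kwh:.4f} kWh, ~{energy_kwh * grid_kg_co2e_per_kwh * 1000:.1f} g CO2e")   # ≈0.0184 kWh, ~7.4 g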
Citation
If you use this model, please cite the base model and this project:
BibTeX (base, example):
@article{touvron2023llama,
  title   = {LLaMA: Open and Efficient Foundation Language Models},
  author  = {Touvron, Hugo and others},
  journal = {arXiv preprint arXiv:2302.13971},
  year    = {2023}
}
Your work (fill in):
@misc{projgen2025,
  title        = {ProjGen: Enhanced Developer Productivity for Flask Project Generation with a RAG-Enhanced Fine-Tuned Local LLM},
  author       = {Sai Praneeth, Renduchinthala and UAB-NLP Group},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/UAB-NLP/ProjGen_Finetuned_llama}}
}
Contact
- Author: Sai Praneeth Kumar (UAB, UAB-NLP Group)
- HF Profile: UAB-NLP