---
base_model: google/gemma-3-270m-it
tags:
- ellora
- dpo
- icm
- comprehensive-lora
- general-enhancement
- label-free
- peft
- gemma
library_name: peft
license: apache-2.0
datasets:
- codelion/gemma-3-270m-icm-dpo
pipeline_tag: text-generation
model_type: gemma
---

# General Model Enhancement via ICM-DPO with Comprehensive LoRA

## 🚀 Overview

This model demonstrates comprehensive capability enhancement using ICM-generated preferences and high-capacity LoRA training via Direct Preference Optimization (DPO). This is **Recipe #6** from the [Ellora project](https://github.com/codelion/ellora) - a collection of standardized recipes for enhancing LLM capabilities.

**Note**: This adapter includes the full embedding and language-modeling head layers (`embed_tokens` and `lm_head`), making it ~669MB rather than a typical lightweight LoRA adapter. This provides better performance at the cost of size.

## 🔧 Key Features

- **🎯 Comprehensive LoRA**: Targets all major linear layers with rank 32 for maximum capacity enhancement
- **📊 ICM-Generated Preferences**: Uses [Internal Coherence Maximization](https://github.com/codelion/icm) for completely label-free preference data generation
- **⚡ DPO Training**: Direct preference optimization without requiring a separate reward model
- **🌐 General Purpose**: Enhances capabilities across diverse tasks (reasoning, coding, creative writing, etc.)
- **💾 Full Layer Integration**: Includes complete embedding and head layers for optimal performance

## 📊 Model Configuration

- **Base Model**: `google/gemma-3-270m-it`
- **LoRA Rank**: 32
- **LoRA Alpha**: 64
- **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Modules to Save**: embed_tokens, lm_head (full layers)
- **Training Method**: Direct Preference Optimization (DPO)
- **Beta (KL Penalty)**: 0.05
- **Adapter Size**: ~669MB (includes full embedding/head layers)
- **Trainable Parameters**: ~56.1% of base model

## 📈 Training Details

### Dataset

- **Source**: [codelion/gemma-3-270m-icm-dpo](https://huggingface.co/datasets/codelion/gemma-3-270m-icm-dpo)
- **Method**: ICM (Internal Coherence Maximization) for label-free preference generation
- **Training Samples**: 1060
- **Evaluation Samples**: 50

### Training Configuration

- **Epochs**: 2
- **Batch Size**: 4 (per device)
- **Gradient Accumulation**: 2 steps
- **Effective Batch Size**: 8
- **Learning Rate**: 2e-06
- **Optimizer**: paged_adamw_8bit
- **Memory Optimization**: BF16, gradient checkpointing

## 🔧 Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-270m-it",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")

# Load the enhanced model
model = PeftModel.from_pretrained(base_model, "codelion/gemma-3-270m-icm-dpo-lora")

# Generate enhanced responses
prompt = "Explain quantum computing in simple terms."
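# Instruct-tuned Gemma models are normally prompted through the tokenizer's chat
# template; an optional variant (a sketch, not part of the original example):
#   messages = [{"role": "user", "content": prompt}]
#   prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)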
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## 🎯 Capabilities Enhanced

This model shows improvements across multiple domains:

- **🧠 Reasoning**: Logical thinking, mathematical problem solving
- **✍️ Creative Writing**: Story generation, poetry, descriptive text
- **💻 Code Generation**: Python, JavaScript, SQL code creation
- **❓ Question Answering**: Factual responses, explanations
- **🔧 Problem Solving**: Step-by-step solutions, systematic thinking
- **📋 Instruction Following**: Adherence to specific formatting and requirements

## 🔬 Methodology: ICM + DPO

### ICM (Internal Coherence Maximization)

[ICM](https://github.com/codelion/icm) generates preference pairs without human annotation by:

1. Creating diverse prompts across multiple domains
2. Generating multiple responses per prompt
3. Using systematic evaluation to rank responses
4. Creating (prompt, chosen, rejected) preference pairs

### DPO (Direct Preference Optimization)

DPO directly optimizes the model to:

1. Increase the probability of chosen responses
2. Decrease the probability of rejected responses
3. Stay close to the reference model (KL constraint)
4. Learn preferences without training a separate reward model

## 📊 Expected Benefits

- ✅ **Enhanced Quality**: Better responses across all task types
- ✅ **Label-Free Training**: No manual preference annotation required
- ✅ **Comprehensive Coverage**: All major model components enhanced
- ✅ **Full Integration**: Complete embedding and head layer optimization
- ✅ **Reproducible**: Standardized recipe from the Ellora project

## 💡 When to Use This Adapter vs Merged Model

**Use this adapter when:**
- ✅ You want to combine it with other adapters
- ✅ You need the flexibility of PEFT loading/unloading
- ✅ You want to fine-tune further on top of this enhancement

**Use the merged model when:**
- ✅ You want maximum simplicity (no PEFT dependencies)
- ✅ You need a standalone model for deployment
- ✅ You want a slightly smaller size (~540MB vs ~669MB)

## 🏷️ Related Resources

- **📚 Ellora Project**: [github.com/codelion/ellora](https://github.com/codelion/ellora)
- **🔄 ICM Repository**: [github.com/codelion/icm](https://github.com/codelion/icm)
- **📦 Merged Model**: [codelion/gemma-3-270m-icm](https://huggingface.co/codelion/gemma-3-270m-icm)
- **📊 Training Dataset**: [codelion/gemma-3-270m-icm-dpo](https://huggingface.co/datasets/codelion/gemma-3-270m-icm-dpo)
- **🤖 Base Model**: [google/gemma-3-270m-it](https://huggingface.co/google/gemma-3-270m-it)
- **📄 DPO Paper**: [Direct Preference Optimization](https://arxiv.org/abs/2305.18290)

## 💡 Innovation Summary

This recipe demonstrates how to enhance model capabilities comprehensively without any manual labeling:

1. **🎯 ICM generates diverse, high-quality preference pairs automatically**
2. **⚡ DPO optimizes preferences directly, without reward-model complexity**
3. **🔧 Comprehensive LoRA maximizes enhancement while maintaining efficiency**
4. **🌐 Multi-domain training improves general capabilities, not just specific tasks**

---

*This adapter is part of the [Ellora project](https://github.com/codelion/ellora) - standardized recipes for enhancing LLM capabilities. Recipe #6 demonstrates label-free general enhancement via ICM + DPO.*
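For reference, a minimal sketch of the merged-deployment path described above, using PEFT's standard `merge_and_unload()` workflow (the output directory name is illustrative; the published merged checkpoint is linked under Related Resources):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and attach this adapter (same IDs as in the usage example above).
base = AutoModelForCausalLM.from_pretrained("google/gemma-3-270m-it", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "codelion/gemma-3-270m-icm-dpo-lora")

# Merge the LoRA weights into the base model and drop the PEFT wrappers, producing a
# standalone checkpoint that no longer requires peft at inference time.
merged = model.merge_and_unload()
merged.save_pretrained("gemma-3-270m-icm-merged")  # illustrative output path
AutoTokenizer.from_pretrained("google/gemma-3-270m-it").save_pretrained("gemma-3-270m-icm-merged")
```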