---
base_model: google/gemma-3-270m-it
tags:
- ellora
- dpo
- icm
- comprehensive-lora
- general-enhancement
- label-free
- peft
- gemma
library_name: peft
license: apache-2.0
datasets:
- codelion/gemma-3-270m-icm-dpo
pipeline_tag: text-generation
model_type: gemma
---

# General Model Enhancement via ICM-DPO with Comprehensive LoRA

## 🚀 Overview

This model demonstrates comprehensive capability enhancement using ICM-generated preferences and high-capacity LoRA training via Direct Preference Optimization (DPO). This is **Recipe #6** from the [Ellora project](https://github.com/codelion/ellora) - a collection of standardized recipes for enhancing LLM capabilities.

**Note**: This adapter includes the full embedding and language-modeling head layers (`embed_tokens` and `lm_head`), making it ~669MB rather than a typical lightweight LoRA adapter. This provides better performance at the cost of size.

## 🔧 Key Features

- **🎯 Comprehensive LoRA**: Targets all major linear layers with rank 32 for maximum capacity enhancement
- **📊 ICM-Generated Preferences**: Uses [Internal Coherence Maximization](https://github.com/codelion/icm) for completely label-free preference data generation
- **⚡ DPO Training**: Direct preference optimization without requiring a separate reward model
- **🌐 General Purpose**: Enhances capabilities across diverse tasks (reasoning, coding, creative writing, etc.)
- **💾 Full Layer Integration**: Includes complete embedding and head layers for optimal performance

## 📊 Model Configuration

- **Base Model**: `google/gemma-3-270m-it`
- **LoRA Rank**: 32
- **LoRA Alpha**: 64
- **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Modules to Save**: embed_tokens, lm_head (full layers)
- **Training Method**: Direct Preference Optimization (DPO)
- **Beta (KL Penalty)**: 0.05
- **Adapter Size**: ~669MB (includes full embedding/head layers)
- **Trainable Parameters**: ~56.1% of base model

## 📈 Training Details

### Dataset

- **Source**: [codelion/gemma-3-270m-icm-dpo](https://huggingface.co/datasets/codelion/gemma-3-270m-icm-dpo)
- **Method**: ICM (Internal Coherence Maximization) for label-free preference generation
- **Training Samples**: 1060
- **Evaluation Samples**: 50

### Training Configuration

- **Epochs**: 2
- **Batch Size**: 4 (per device)
- **Gradient Accumulation**: 2 steps
- **Effective Batch Size**: 8
- **Learning Rate**: 2e-06
- **Optimizer**: paged_adamw_8bit
- **Memory Optimization**: BF16, gradient checkpointing

## 🔧 Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-270m-it",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")

# Load the enhanced model
model = PeftModel.from_pretrained(base_model, "codelion/gemma-3-270m-icm-dpo-lora")

# Generate enhanced responses
prompt = "Explain quantum computing in simple terms."
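# Instruct-tuned Gemma models are normally prompted through the tokenizer's chat
# template; an optional variant (a sketch, not part of the original example):
#   messages = [{"role": "user", "content": prompt}]
#   prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)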
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## 🎯 Capabilities Enhanced

This model shows improvements across multiple domains:

- **🧠 Reasoning**: Logical thinking, mathematical problem solving
- **✍️ Creative Writing**: Story generation, poetry, descriptive text
- **💻 Code Generation**: Python, JavaScript, SQL code creation
- **❓ Question Answering**: Factual responses, explanations
- **🔧 Problem Solving**: Step-by-step solutions, systematic thinking
- **📋 Instruction Following**: Adherence to specific formatting and requirements

## 🔬 Methodology: ICM + DPO

### ICM (Internal Coherence Maximization)

[ICM](https://github.com/codelion/icm) generates preference pairs without human annotation by:

1. Creating diverse prompts across multiple domains
2. Generating multiple responses per prompt
3. Using systematic evaluation to rank responses
4. Creating (prompt, chosen, rejected) preference pairs

### DPO (Direct Preference Optimization)

DPO directly optimizes the model to:

1. Increase the probability of chosen responses
2. Decrease the probability of rejected responses
3. Stay close to the reference model (KL constraint)
4. Learn preferences without training a separate reward model

## 📊 Expected Benefits

- ✅ **Enhanced Quality**: Better responses across all task types
- ✅ **Label-Free Training**: No manual preference annotation required
- ✅ **Comprehensive Coverage**: All major model components enhanced
- ✅ **Full Integration**: Complete embedding and head layer optimization
- ✅ **Reproducible**: Standardized recipe from the Ellora project

## 💡 When to Use This Adapter vs Merged Model

**Use this adapter when:**
- ✅ You want to combine it with other adapters
- ✅ You need the flexibility of PEFT loading/unloading
- ✅ You want to fine-tune further on top of this enhancement

**Use the merged model when:**
- ✅ You want maximum simplicity (no PEFT dependencies)
- ✅ You need a standalone model for deployment
- ✅ You want a slightly smaller size (~540MB vs ~669MB)

## 🏷️ Related Resources

- **📚 Ellora Project**: [github.com/codelion/ellora](https://github.com/codelion/ellora)
- **🔄 ICM Repository**: [github.com/codelion/icm](https://github.com/codelion/icm)
- **📦 Merged Model**: [codelion/gemma-3-270m-icm](https://huggingface.co/codelion/gemma-3-270m-icm)
- **📊 Training Dataset**: [codelion/gemma-3-270m-icm-dpo](https://huggingface.co/datasets/codelion/gemma-3-270m-icm-dpo)
- **🤖 Base Model**: [google/gemma-3-270m-it](https://huggingface.co/google/gemma-3-270m-it)
- **📄 DPO Paper**: [Direct Preference Optimization](https://arxiv.org/abs/2305.18290)

## 💡 Innovation Summary

This recipe demonstrates how to enhance model capabilities comprehensively without any manual labeling:

1. **🎯 ICM generates diverse, high-quality preference pairs automatically**
2. **⚡ DPO optimizes preferences directly, without reward-model complexity**
3. **🔧 Comprehensive LoRA maximizes enhancement while maintaining efficiency**
4. **🌐 Multi-domain training improves general capabilities, not just specific tasks**

---

*This adapter is part of the [Ellora project](https://github.com/codelion/ellora) - standardized recipes for enhancing LLM capabilities. Recipe #6 demonstrates label-free general enhancement via ICM + DPO.*
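For reference, a minimal sketch of the merged-deployment path described above, using PEFT's standard `merge_and_unload()` workflow (the output directory name is illustrative; the published merged checkpoint is linked under Related Resources):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and attach this adapter (same IDs as in the usage example above).
base = AutoModelForCausalLM.from_pretrained("google/gemma-3-270m-it", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "codelion/gemma-3-270m-icm-dpo-lora")

# Merge the LoRA weights into the base model and drop the PEFT wrappers, producing a
# standalone checkpoint that no longer requires peft at inference time.
merged = model.merge_and_unload()
merged.save_pretrained("gemma-3-270m-icm-merged")  # illustrative output path
AutoTokenizer.from_pretrained("google/gemma-3-270m-it").save_pretrained("gemma-3-270m-icm-merged")
```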