Upload ICM-DPO enhanced Gemma PEFT adapter with comprehensive LoRA and model card
- README.md +20 -3
- adapter_model.safetensors +1 -1
- results.json +4 -4
README.md
CHANGED
@@ -23,13 +23,15 @@ model_type: gemma
 
 This model demonstrates comprehensive capability enhancement using ICM-generated preferences and high-capacity LoRA training via Direct Preference Optimization (DPO). This is **Recipe #6** from the [Ellora project](https://github.com/codelion/ellora) - a collection of standardized recipes for enhancing LLM capabilities.
 
+**Note**: This adapter includes full embedding and language modeling head layers (`embed_tokens` and `lm_head`), making it ~669MB rather than a typical lightweight LoRA adapter. This provides better performance at the cost of size.
+
 ## 🔧 Key Features
 
 - **🎯 Comprehensive LoRA**: Targets all major linear layers with rank 32 for maximum capacity enhancement
 - **🔄 ICM-Generated Preferences**: Uses [Internal Coherence Maximization](https://github.com/codelion/icm) for completely label-free preference data generation
 - **⚡ DPO Training**: Direct preference optimization without requiring a separate reward model
 - **🌐 General Purpose**: Enhances capabilities across diverse tasks (reasoning, coding, creative writing, etc.)
-- **💾
+- **💾 Full Layer Integration**: Includes complete embedding and head layers for optimal performance
 
 ## 📊 Model Configuration
 
@@ -37,8 +39,10 @@ This model demonstrates comprehensive capability enhancement using ICM-generated
 - **LoRA Rank**: 32
 - **LoRA Alpha**: 64
 - **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
+- **Modules to Save**: embed_tokens, lm_head (full layers)
 - **Training Method**: Direct Preference Optimization (DPO)
 - **Beta (KL Penalty)**: 0.5
+- **Adapter Size**: ~669MB (includes full embedding/head layers)
 - **Trainable Parameters**: ~56.14% of base model
 
 ## 📈 Training Details
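For reference, the configuration block above maps onto a `peft` LoRA setup roughly like the sketch below. This is an illustration assembled from the values in the card, not the actual Ellora recipe script; the dropout value and variable names are assumptions, since the diff does not show them.

```python
# Sketch of a LoRA configuration matching the card's values (assumed, not the
# exact Ellora training script). Requires the `peft` library.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                                  # LoRA Rank: 32
    lora_alpha=64,                         # LoRA Alpha: 64
    target_modules=[                       # all major linear layers
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    modules_to_save=["embed_tokens", "lm_head"],  # full layers saved in the adapter
    lora_dropout=0.05,                     # assumption: not stated in the card
    bias="none",
    task_type="CAUSAL_LM",
)
```

The `modules_to_save` entry stores full copies of the embedding and head weights inside the adapter, which is the size trade-off the note above describes.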
@@ -46,7 +50,7 @@ This model demonstrates comprehensive capability enhancement using ICM-generated
 ### Dataset
 - **Source**: [codelion/gemma-3-270m-icm-dpo](https://huggingface.co/datasets/codelion/gemma-3-270m-icm-dpo)
 - **Method**: ICM (Internal Coherence Maximization) for label-free preference generation
-- **Training Samples**:
+- **Training Samples**: 44286
 - **Evaluation Samples**: 50
 
 ### Training Configuration
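To make the training recipe concrete, here is a minimal DPO sketch using TRL with the ICM preference dataset and the beta = 0.5 from the card. Treat it as an assumption-laden outline: the dataset column names, batch size, and learning rate are not shown in this diff, and the `processing_class` argument assumes a recent TRL release.

```python
# Minimal DPO training sketch (assumed setup, not the exact Ellora recipe).
# Only beta=0.5, the dataset id, the base model id, and 3 epochs come from the
# card / results.json; every other hyperparameter here is an assumption.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "google/gemma-3-270m-it"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# ICM-generated preference pairs; assumed to use the usual prompt/chosen/rejected columns.
train_ds = load_dataset("codelion/gemma-3-270m-icm-dpo", split="train")

peft_config = LoraConfig(
    r=32, lora_alpha=64, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    modules_to_save=["embed_tokens", "lm_head"],
)

args = DPOConfig(
    output_dir="gemma-3-270m-icm-dpo-lora",
    beta=0.5,                        # KL penalty from the card
    num_train_epochs=3,              # epoch: 3.0 in results.json
    per_device_train_batch_size=4,   # assumption
    gradient_accumulation_steps=4,   # assumption
    learning_rate=5e-6,              # assumption
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    processing_class=tokenizer,      # named `tokenizer=` in older TRL releases
    peft_config=peft_config,
)
trainer.train()
```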
@@ -116,13 +120,26 @@ DPO directly optimizes the model to:
 - ✅ **Enhanced Quality**: Better responses across all task types
 - ✅ **Label-Free Training**: No manual preference annotation required
 - ✅ **Comprehensive Coverage**: All major model components enhanced
-- ✅ **
+- ✅ **Full Integration**: Complete embedding and head layer optimization
 - ✅ **Reproducible**: Standardized recipe from Ellora project
 
+## 💡 When to Use This Adapter vs Merged Model
+
+**Use this adapter when:**
+- ✅ You want to combine with other adapters
+- ✅ You need the flexibility of PEFT loading/unloading
+- ✅ You want to fine-tune further on top of this enhancement
+
+**Use the merged model when:**
+- ✅ You want maximum simplicity (no PEFT dependencies)
+- ✅ You need a standalone model for deployment
+- ✅ You want slightly smaller size (~540MB vs ~669MB)
+
 ## 🏷️ Related Resources
 
 - **📚 Ellora Project**: [github.com/codelion/ellora](https://github.com/codelion/ellora)
 - **🔄 ICM Repository**: [github.com/codelion/icm](https://github.com/codelion/icm)
+- **📦 Merged Model**: [codelion/gemma-3-270m-icm](https://huggingface.co/codelion/gemma-3-270m-icm)
 - **📊 Training Dataset**: [codelion/gemma-3-270m-icm-dpo](https://huggingface.co/datasets/codelion/gemma-3-270m-icm-dpo)
 - **🤗 Base Model**: [google/gemma-3-270m-it](https://huggingface.co/google/gemma-3-270m-it)
 - **📄 DPO Paper**: [Direct Preference Optimization](https://arxiv.org/abs/2305.18290)
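The adapter-vs-merged choice added above boils down to two loading paths. The sketch below shows both using standard `transformers`/`peft` calls; the merged-model id comes from the Related Resources list, while the adapter repo id is a placeholder for this repository.

```python
# Two ways to use this work (generic transformers/peft usage, not an official
# snippet from the card).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

ADAPTER_REPO = "codelion/..."  # placeholder: the id of this adapter repository

# Option A: attach the PEFT adapter to the base model (flexible, composable,
# can be unloaded or fine-tuned further).
base = AutoModelForCausalLM.from_pretrained("google/gemma-3-270m-it")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")
model = PeftModel.from_pretrained(base, ADAPTER_REPO)

# Optionally fold the adapter into the base weights and drop the peft wrapper.
merged_in_memory = model.merge_and_unload()

# Option B: load the standalone merged model (no peft dependency, ~540MB).
standalone = AutoModelForCausalLM.from_pretrained("codelion/gemma-3-270m-icm")
```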
adapter_model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:9fc6d5e641a8697f452f3a6bcdf9ca1948a2b89350190f029b674471b3d9649c
 size 701497992
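The `adapter_model.safetensors` entry above is a git-lfs pointer: `oid` is the SHA-256 of the actual file and `size` is its byte count (701,497,992 bytes, about 669 MiB, matching the adapter size quoted in the card). A small sketch to verify a downloaded copy against those fields; the local path is assumed.

```python
# Verify a locally downloaded adapter_model.safetensors against the git-lfs
# pointer fields shown in this diff (oid = sha256 digest, size = bytes).
import hashlib
import os

path = "adapter_model.safetensors"  # assumed local path after downloading from the repo
expected_sha256 = "9fc6d5e641a8697f452f3a6bcdf9ca1948a2b89350190f029b674471b3d9649c"
expected_size = 701497992  # ~669 MiB, matching the "Adapter Size" note in the card

digest = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

assert os.path.getsize(path) == expected_size
assert digest.hexdigest() == expected_sha256
print("adapter_model.safetensors matches the LFS pointer")
```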
results.json
CHANGED
@@ -1,10 +1,10 @@
 {
   "training_metrics": {
-    "train_runtime":
-    "train_samples_per_second":
-    "train_steps_per_second": 0.
+    "train_runtime": 22974.7335,
+    "train_samples_per_second": 5.783,
+    "train_steps_per_second": 0.361,
     "total_flos": 0.0,
-    "train_loss": 0.
+    "train_loss": 0.2695974061958248,
     "epoch": 3.0
   },
   "config": {
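The new training metrics are internally consistent with the dataset size in the card: 44286 samples over 3 epochs in 22974.7 seconds gives roughly 5.78 samples per second, matching `train_samples_per_second`. A quick cross-check sketch (the effective batch size it implies is a derived estimate, not a value stated in this diff):

```python
# Cross-check the throughput figures in results.json against the dataset size
# reported in the card (44286 training samples, 3 epochs).
import json

with open("results.json") as f:
    metrics = json.load(f)["training_metrics"]

samples_seen = 44286 * 3                        # training samples x epochs = 132858
runtime = metrics["train_runtime"]              # 22974.7335 seconds (~6.4 hours)

print(samples_seen / runtime)                   # ~5.78, matches train_samples_per_second = 5.783
print(metrics["train_steps_per_second"] * runtime)  # ~8294 optimizer steps
# samples_seen / steps is about 16, suggesting an effective batch size around 16
# (a derived estimate; the actual batch size is not shown in this diff).
```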