# General Model Enhancement via ICM-DPO with Comprehensive LoRA

## Overview
This model demonstrates comprehensive capability enhancement using ICM-generated preferences and high-capacity LoRA training via Direct Preference Optimization (DPO). This is Recipe #6 from the Ellora project - a collection of standardized recipes for enhancing LLM capabilities.
Note: This adapter includes the full embedding and language modeling head layers (`embed_tokens` and `lm_head`), making it ~669MB rather than a typical lightweight LoRA adapter. This provides better performance at the cost of size.
## Key Features

- **Comprehensive LoRA**: Targets all major linear layers with rank 32 for maximum capacity enhancement
- **ICM-Generated Preferences**: Uses Internal Coherence Maximization for completely label-free preference data generation
- **DPO Training**: Direct preference optimization without requiring a separate reward model
- **General Purpose**: Enhances capabilities across diverse tasks (reasoning, coding, creative writing, etc.)
- **Full Layer Integration**: Includes complete embedding and head layers for optimal performance
## Model Configuration

- Base Model: `google/gemma-3-270m-it`
- LoRA Rank: 32
- LoRA Alpha: 64
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Modules to Save: embed_tokens, lm_head (full layers)
- Training Method: Direct Preference Optimization (DPO)
- Beta (KL Penalty): 0.05
- Adapter Size: ~669MB (includes full embedding/head layers)
- Trainable Parameters: ~56.1% of the base model (see the `LoraConfig` sketch below)
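For reference, the settings above correspond roughly to a PEFT `LoraConfig` like the sketch below. This is an illustration, not the exact training script; `lora_dropout` in particular is an assumed value that is not listed on this card.

```python
from peft import LoraConfig

# Sketch of a LoRA configuration matching the settings listed above
lora_config = LoraConfig(
    r=32,                      # LoRA rank
    lora_alpha=64,             # scaling factor
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    modules_to_save=["embed_tokens", "lm_head"],  # trained and saved in full, hence the ~669MB adapter
    lora_dropout=0.05,         # assumed value; not stated on this card
    task_type="CAUSAL_LM",
)
```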
## Training Details

### Dataset
- Source: codelion/gemma-3-270m-icm-dpo
- Method: ICM (Internal Coherence Maximization) for label-free preference generation
- Training Samples: 1060
- Evaluation Samples: 50
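The preference data can be inspected directly with the `datasets` library. The snippet below is a minimal sketch and assumes the standard prompt/chosen/rejected column layout expected by TRL's `DPOTrainer`.

```python
from datasets import load_dataset

# Load the ICM-generated preference data used for DPO training
dataset = load_dataset("codelion/gemma-3-270m-icm-dpo")

# Each row should contain a prompt plus a chosen and a rejected response
print(dataset)
print(dataset["train"][0].keys())
```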
### Training Configuration
- Epochs: 2
- Batch Size: 4 (per device)
- Gradient Accumulation: 2 steps
- Effective Batch Size: 8
- Learning Rate: 2e-06
- Optimizer: paged_adamw_8bit
- Memory Optimization: BF16, Gradient Checkpointing
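Putting the pieces together, a training run with these hyperparameters could be set up with TRL's `DPOTrainer` roughly as follows. This is a hedged sketch, not the actual training script; it reuses the `dataset` and `lora_config` from the sketches above, and argument names (e.g. `processing_class`) follow recent TRL releases.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Base model and tokenizer
model = AutoModelForCausalLM.from_pretrained("google/gemma-3-270m-it", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")

# Hyperparameters mirroring the configuration listed above
training_args = DPOConfig(
    output_dir="gemma-3-270m-icm-dpo-lora",
    num_train_epochs=2,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,   # effective batch size of 8
    learning_rate=2e-6,
    optim="paged_adamw_8bit",
    bf16=True,
    gradient_checkpointing=True,
    beta=0.05,                       # KL penalty strength
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],  # preference dataset loaded above
    processing_class=tokenizer,      # older TRL versions take `tokenizer=` instead
    peft_config=lora_config,         # the LoraConfig sketched above
)
trainer.train()
```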
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-270m-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")

# Load the enhanced model
model = PeftModel.from_pretrained(base_model, "codelion/gemma-3-270m-icm-dpo-lora")

# Generate enhanced responses
prompt = "Explain quantum computing in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
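Since the base model is instruction-tuned, responses are usually better when the prompt is wrapped in Gemma's chat template rather than passed as raw text. A minimal variant of the generation step, reusing `model` and `tokenizer` from above:

```python
# Optional: format the prompt with the chat template (recommended for the -it base model)
messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids=input_ids, max_new_tokens=150, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```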
## Capabilities Enhanced
This model shows improvements across multiple domains:
- **Reasoning**: Logical thinking, mathematical problem solving
- **Creative Writing**: Story generation, poetry, descriptive text
- **Code Generation**: Python, JavaScript, SQL code creation
- **Question Answering**: Factual responses, explanations
- **Problem Solving**: Step-by-step solutions, systematic thinking
- **Instruction Following**: Adherence to specific formatting and requirements
## Methodology: ICM + DPO

### ICM (Internal Coherence Maximization)
ICM generates preference pairs without human annotation by:
- Creating diverse prompts across multiple domains
- Generating multiple responses per prompt
- Using systematic evaluation to rank responses
- Creating (prompt, chosen, rejected) preference pairs
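The loop below is an illustrative sketch of that pairing process; `generate_responses` and `coherence_score` are hypothetical placeholders for the actual ICM implementation (see the ICM repository linked under Related Resources).

```python
# Illustrative sketch only: `generate_responses` and `coherence_score` are
# hypothetical helpers standing in for the real ICM scoring logic.
def build_preference_pairs(prompts, n_samples=4):
    pairs = []
    for prompt in prompts:
        responses = generate_responses(prompt, n=n_samples)            # sample several completions
        ranked = sorted(responses, key=coherence_score, reverse=True)  # label-free ranking
        pairs.append({
            "prompt": prompt,
            "chosen": ranked[0],      # most internally coherent response
            "rejected": ranked[-1],   # least coherent response
        })
    return pairs
```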
### DPO (Direct Preference Optimization)
DPO directly optimizes the model to:
- Increase probability of chosen responses
- Decrease probability of rejected responses
- Maintain similarity to reference model (KL constraint)
- Learn preferences without reward model training
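Concretely, the DPO objective is a logistic loss over log-probability ratios between the policy and the frozen reference model. The sketch below mirrors the standard formulation from the DPO paper, with `beta=0.05` as configured above; in practice TRL's `DPOTrainer` computes this internally.

```python
import torch.nn.functional as F

# Batch DPO loss from per-sequence log-probabilities.
# *_logps come from the policy being trained, *_ref_logps from the frozen reference model.
def dpo_loss(chosen_logps, rejected_logps, chosen_ref_logps, rejected_ref_logps, beta=0.05):
    chosen_rewards = beta * (chosen_logps - chosen_ref_logps)        # implicit reward of chosen
    rejected_rewards = beta * (rejected_logps - rejected_ref_logps)  # implicit reward of rejected
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```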
## Expected Benefits

- **Enhanced Quality**: Better responses across all task types
- **Label-Free Training**: No manual preference annotation required
- **Comprehensive Coverage**: All major model components enhanced
- **Full Integration**: Complete embedding and head layer optimization
- **Reproducible**: Standardized recipe from the Ellora project
## When to Use This Adapter vs. the Merged Model

**Use this adapter when:**
- You want to combine it with other adapters
- You need the flexibility of PEFT loading/unloading
- You want to fine-tune further on top of this enhancement
**Use the merged model when:**

- You want maximum simplicity (no PEFT dependencies)
- You need a standalone model for deployment
- You want a slightly smaller size (~540MB vs. ~669MB)
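If you prefer a standalone checkpoint, the adapter can be folded into the base model with PEFT's `merge_and_unload`; this is roughly how a merged checkpoint like the one linked below can be produced (a sketch, reusing `model` and `tokenizer` from the Usage section):

```python
# Merge the LoRA weights into the base model so PEFT is no longer needed at inference time
merged = model.merge_and_unload()
merged.save_pretrained("gemma-3-270m-icm-merged")
tokenizer.save_pretrained("gemma-3-270m-icm-merged")
```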
## Related Resources

- Ellora Project: [github.com/codelion/ellora](https://github.com/codelion/ellora)
- ICM Repository: [github.com/codelion/icm](https://github.com/codelion/icm)
- Merged Model: `codelion/gemma-3-270m-icm`
- Training Dataset: `codelion/gemma-3-270m-icm-dpo`
- Base Model: `google/gemma-3-270m-it`
- DPO Paper: Direct Preference Optimization (Rafailov et al., 2023)
## Innovation Summary
This recipe demonstrates how to enhance model capabilities comprehensively without any manual labeling:
- **ICM** generates diverse, high-quality preference pairs automatically
- **DPO** optimizes preferences directly without reward-model complexity
- **Comprehensive LoRA** maximizes enhancement while maintaining efficiency
- **Multi-domain training** improves general capabilities, not just specific tasks
This adapter is part of the Ellora project - standardized recipes for enhancing LLM capabilities. Recipe #6 demonstrates label-free general enhancement via ICM + DPO.