General Model Enhancement via ICM-DPO with Comprehensive LoRA

🚀 Overview

This model demonstrates comprehensive capability enhancement using ICM-generated preferences and high-capacity LoRA training via Direct Preference Optimization (DPO). This is Recipe #6 from the Ellora project - a collection of standardized recipes for enhancing LLM capabilities.

Note: This adapter includes full embedding and language modeling head layers (embed_tokens and lm_head), making it ~669MB rather than a typical lightweight LoRA adapter. This provides better performance at the cost of size.

🔧 Key Features

  • 🎯 Comprehensive LoRA: Targets all major linear layers with rank 32 for maximum capacity enhancement
  • 📊 ICM-Generated Preferences: Uses Internal Coherence Maximization for completely label-free preference data generation
  • ⚡ DPO Training: Direct preference optimization without requiring a separate reward model
  • 🌐 General Purpose: Enhances capabilities across diverse tasks (reasoning, coding, creative writing, etc.)
  • 💾 Full Layer Integration: Includes complete embedding and head layers for optimal performance

📊 Model Configuration

  • Base Model: google/gemma-3-270m-it
  • LoRA Rank: 32
  • LoRA Alpha: 64
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Modules to Save: embed_tokens, lm_head (full layers)
  • Training Method: Direct Preference Optimization (DPO)
  • Beta (KL Penalty): 0.05
  • Adapter Size: ~669MB (includes full embedding/head layers)
  • Trainable Parameters: ~56.1% of the base model (high because the embedding and head layers are trained in full)
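
A minimal sketch of the corresponding PEFT configuration (argument names follow the peft library; the dropout value is an assumption, as it is not listed above):

from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                          # LoRA rank
    lora_alpha=64,                 # LoRA scaling factor
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    modules_to_save=["embed_tokens", "lm_head"],  # trained and saved as full layers
    lora_dropout=0.05,             # assumed value; not specified in this card
    bias="none",
    task_type="CAUSAL_LM",
)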

📈 Training Details

Dataset

  • Source: codelion/gemma-3-270m-icm-dpo
  • Method: ICM (Internal Coherence Maximization) for label-free preference generation
  • Training Samples: 1060
  • Evaluation Samples: 50

Training Configuration

  • Epochs: 2
  • Batch Size: 4 (per device)
  • Gradient Accumulation: 2 steps
  • Effective Batch Size: 8
  • Learning Rate: 2e-06
  • Optimizer: paged_adamw_8bit
  • Memory Optimization: BF16, Gradient Checkpointing
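
The same settings expressed as a TRL DPO training sketch (assumes a recent trl release; argument names such as processing_class have changed across versions, and the dataset split name is an assumption):

from datasets import load_dataset
from trl import DPOConfig, DPOTrainer

# ICM-generated preference pairs: (prompt, chosen, rejected)
train_dataset = load_dataset("codelion/gemma-3-270m-icm-dpo", split="train")

training_args = DPOConfig(
    output_dir="gemma-3-270m-icm-dpo-lora",
    num_train_epochs=2,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,   # effective batch size of 8
    learning_rate=2e-6,
    beta=0.05,                       # KL penalty strength
    optim="paged_adamw_8bit",
    bf16=True,
    gradient_checkpointing=True,
)

trainer = DPOTrainer(
    model=base_model,                # base model loaded as in the usage section below
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    peft_config=lora_config,         # LoraConfig sketch from above
)
trainer.train()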

🔧 Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-270m-it",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")

# Load the enhanced model
model = PeftModel.from_pretrained(base_model, "codelion/gemma-3-270m-icm-dpo-lora")

# Generate enhanced responses
prompt = "Explain quantum computing in simple terms."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
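
Since the base checkpoint is instruction-tuned, prompts formatted with the tokenizer's chat template may work better than raw text (a sketch; the sampling settings are illustrative):

messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=150, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))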

🎯 Capabilities Enhanced

This model shows improvements across multiple domains:

  • 🧠 Reasoning: Logical thinking, mathematical problem solving
  • ✍️ Creative Writing: Story generation, poetry, descriptive text
  • 💻 Code Generation: Python, JavaScript, SQL code creation
  • ❓ Question Answering: Factual responses, explanations
  • 🔧 Problem Solving: Step-by-step solutions, systematic thinking
  • 📋 Instruction Following: Adherence to specific formatting and requirements

🔬 Methodology: ICM + DPO

ICM (Internal Coherence Maximization)

ICM generates preference pairs without human annotation by:

  1. Creating diverse prompts across multiple domains
  2. Generating multiple responses per prompt
  3. Using systematic evaluation to rank responses
  4. Creating (prompt, chosen, rejected) preference pairs
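
The exact coherence scoring used by ICM is not reproduced here, but the overall pipeline has roughly this shape (generate and score_fn are hypothetical placeholders, not the real implementation):

def build_preference_pairs(prompts, score_fn, n_samples=4):
    """Sample several responses per prompt, rank them with a label-free
    coherence score, and keep the best/worst pair for DPO training."""
    pairs = []
    for prompt in prompts:                                             # step 1: diverse prompts
        responses = [generate(prompt) for _ in range(n_samples)]       # step 2: multiple responses
        ranked = sorted(responses, key=lambda r: score_fn(prompt, r))  # step 3: systematic ranking
        pairs.append({                                                 # step 4: preference pair
            "prompt": prompt,
            "chosen": ranked[-1],    # highest-scoring response
            "rejected": ranked[0],   # lowest-scoring response
        })
    return pairs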

DPO (Direct Preference Optimization)

DPO directly optimizes the model to:

  1. Increase probability of chosen responses
  2. Decrease probability of rejected responses
  3. Maintain similarity to reference model (KL constraint)
  4. Learn preferences without reward model training
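
These four properties fall out of a single logistic loss over each preference pair. A minimal sketch of the standard DPO loss, given summed log-probabilities of the chosen and rejected responses under the policy and the frozen reference model (β = 0.05, matching the training configuration above):

import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.05):
    """-log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).
    The log-ratios measure how far the policy has drifted from the reference,
    which is what keeps the update implicitly KL-constrained."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()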

📊 Expected Benefits

  • ✅ Enhanced Quality: Better responses across all task types
  • ✅ Label-Free Training: No manual preference annotation required
  • ✅ Comprehensive Coverage: All major model components enhanced
  • ✅ Full Integration: Complete embedding and head layer optimization
  • ✅ Reproducible: Standardized recipe from the Ellora project

💡 When to Use This Adapter vs Merged Model

Use this adapter when:

  • ✅ You want to combine it with other adapters
  • ✅ You need the flexibility of PEFT loading/unloading
  • ✅ You want to fine-tune further on top of this enhancement

Use the merged model when:

  • ✅ You want maximum simplicity (no PEFT dependencies)
  • ✅ You need a standalone model for deployment
  • ✅ You want a slightly smaller size (~540MB vs ~669MB)
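
If you need the standalone variant but only have this adapter, you can produce a merged checkpoint yourself (a sketch using PEFT's merge_and_unload; the output directory name is arbitrary):

# `model` is the PeftModel loaded in the usage example above
merged = model.merge_and_unload()   # folds the LoRA weights into the base model
merged.save_pretrained("gemma-3-270m-icm-dpo-merged")
tokenizer.save_pretrained("gemma-3-270m-icm-dpo-merged")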

🏷️ Related Resources

💡 Innovation Summary

This recipe demonstrates how to enhance model capabilities comprehensively without any manual labeling:

  1. 🎯 ICM generates diverse, high-quality preference pairs automatically
  2. ⚡ DPO optimizes preferences directly without reward model complexity
  3. 🔧 Comprehensive LoRA maximizes enhancement while maintaining efficiency
  4. 🌐 Multi-domain training improves general capabilities, not just specific tasks

This adapter is part of the Ellora project - standardized recipes for enhancing LLM capabilities. Recipe #6 demonstrates label-free general enhancement via ICM + DPO.
