codelion committed
Commit def90b4 · verified
1 Parent(s): bdb2b66

Upload ICM-DPO enhanced Gemma PEFT adapter with comprehensive LoRA and model card

Files changed (3)
  1. README.md +20 -3
  2. adapter_model.safetensors +1 -1
  3. results.json +4 -4
README.md CHANGED
@@ -23,13 +23,15 @@ model_type: gemma
 
 This model demonstrates comprehensive capability enhancement using ICM-generated preferences and high-capacity LoRA training via Direct Preference Optimization (DPO). This is **Recipe #6** from the [Ellora project](https://github.com/codelion/ellora) - a collection of standardized recipes for enhancing LLM capabilities.
 
+**Note**: This adapter includes full embedding and language modeling head layers (`embed_tokens` and `lm_head`), making it ~669MB rather than a typical lightweight LoRA adapter. This provides better performance at the cost of size.
+
 ## 🔧 Key Features
 
 - **🎯 Comprehensive LoRA**: Targets all major linear layers with rank 32 for maximum capacity enhancement
 - **📊 ICM-Generated Preferences**: Uses [Internal Coherence Maximization](https://github.com/codelion/icm) for completely label-free preference data generation
 - **⚡ DPO Training**: Direct preference optimization without requiring a separate reward model
 - **🌐 General Purpose**: Enhances capabilities across diverse tasks (reasoning, coding, creative writing, etc.)
-- **💾 Memory Efficient**: Uses gradient checkpointing and 8-bit optimizer for efficient training
+- **💾 Full Layer Integration**: Includes complete embedding and head layers for optimal performance
 
 ## 📊 Model Configuration
 
@@ -37,8 +39,10 @@ This model demonstrates comprehensive capability enhancement using ICM-generated
 - **LoRA Rank**: 32
 - **LoRA Alpha**: 64
 - **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
+- **Modules to Save**: embed_tokens, lm_head (full layers)
 - **Training Method**: Direct Preference Optimization (DPO)
 - **Beta (KL Penalty)**: 0.5
+- **Adapter Size**: ~669MB (includes full embedding/head layers)
 - **Trainable Parameters**: ~56.13838755173775% of base model
 
 ## 📈 Training Details
@@ -46,7 +50,7 @@ This model demonstrates comprehensive capability enhancement using ICM-generated
 ### Dataset
 - **Source**: [codelion/gemma-3-270m-icm-dpo](https://huggingface.co/datasets/codelion/gemma-3-270m-icm-dpo)
 - **Method**: ICM (Internal Coherence Maximization) for label-free preference generation
-- **Training Samples**: 46044
+- **Training Samples**: 44286
 - **Evaluation Samples**: 50
 
 ### Training Configuration
@@ -116,13 +120,26 @@ DPO directly optimizes the model to:
 - ✅ **Enhanced Quality**: Better responses across all task types
 - ✅ **Label-Free Training**: No manual preference annotation required
 - ✅ **Comprehensive Coverage**: All major model components enhanced
-- ✅ **Memory Efficient**: ~56.13838755173775% trainable parameters vs full fine-tuning
+- ✅ **Full Integration**: Complete embedding and head layer optimization
 - ✅ **Reproducible**: Standardized recipe from Ellora project
 
+## 💡 When to Use This Adapter vs Merged Model
+
+**Use this adapter when:**
+- ✅ You want to combine with other adapters
+- ✅ You need the flexibility of PEFT loading/unloading
+- ✅ You want to fine-tune further on top of this enhancement
+
+**Use the merged model when:**
+- ✅ You want maximum simplicity (no PEFT dependencies)
+- ✅ You need a standalone model for deployment
+- ✅ You want slightly smaller size (~540MB vs ~669MB)
+
 ## 🏷️ Related Resources
 
 - **📚 Ellora Project**: [github.com/codelion/ellora](https://github.com/codelion/ellora)
 - **🔄 ICM Repository**: [github.com/codelion/icm](https://github.com/codelion/icm)
+- **📦 Merged Model**: [codelion/gemma-3-270m-icm](https://huggingface.co/codelion/gemma-3-270m-icm)
 - **📊 Training Dataset**: [codelion/gemma-3-270m-icm-dpo](https://huggingface.co/datasets/codelion/gemma-3-270m-icm-dpo)
 - **🤖 Base Model**: [google/gemma-3-270m-it](https://huggingface.co/google/gemma-3-270m-it)
 - **📄 DPO Paper**: [Direct Preference Optimization](https://arxiv.org/abs/2305.18290)
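For context on the updated card, here is a minimal Python sketch of the training setup it describes: rank-32 LoRA over all listed linear layers with `embed_tokens`/`lm_head` in `modules_to_save`, and DPO with beta 0.5 on the ICM preference dataset. This is not the project's actual script; hyperparameters the card does not state (batch size, learning rate, output path) and the exact TRL/PEFT versions are assumptions.

```python
# Sketch only: reconstructs the configuration described in the model card diff above.
# Values not stated in the card (learning_rate, batch sizes, output_dir) are illustrative.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "google/gemma-3-270m-it"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# ICM-generated preference pairs; DPOTrainer expects prompt/chosen/rejected columns.
train_dataset = load_dataset("codelion/gemma-3-270m-icm-dpo", split="train")

lora_config = LoraConfig(
    r=32,                                # LoRA Rank from the card
    lora_alpha=64,                       # LoRA Alpha from the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    modules_to_save=["embed_tokens", "lm_head"],  # full layers, hence the ~669MB adapter
    task_type="CAUSAL_LM",
)

dpo_args = DPOConfig(
    output_dir="gemma-270m-icm-dpo-lora",  # assumed output path
    beta=0.5,                              # KL penalty from the card
    num_train_epochs=3,                    # matches the epoch count in results.json
    per_device_train_batch_size=4,         # assumption
    gradient_accumulation_steps=4,         # assumption
    learning_rate=5e-6,                    # assumption
)

trainer = DPOTrainer(
    model=model,
    args=dpo_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # recent TRL; older releases use tokenizer= instead
    peft_config=lora_config,
)
trainer.train()
```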
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2887a89f9e09d88ba2a19f8474de58f1f06923e905147be675f95c8ab689844d
+oid sha256:9fc6d5e641a8697f452f3a6bcdf9ca1948a2b89350190f029b674471b3d9649c
 size 701497992
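The safetensors file above is the full ~669MB adapter (LoRA weights plus the saved `embed_tokens`/`lm_head`). A minimal sketch of the two usage paths the card contrasts follows; the adapter repo id is a placeholder because this diff does not name the repository, while the merged model id comes from the Related Resources list.

```python
# Sketch: load the PEFT adapter on top of the base model, or use the merged model.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

ADAPTER_ID = "<this-adapter-repo>"        # placeholder: repo id not shown in this diff
BASE_ID = "google/gemma-3-270m-it"        # base model from the card
MERGED_ID = "codelion/gemma-3-270m-icm"   # merged model from Related Resources

# Option 1: PEFT adapter (can be unloaded, stacked with other adapters, or trained further)
base = AutoModelForCausalLM.from_pretrained(BASE_ID)
model = PeftModel.from_pretrained(base, ADAPTER_ID)

# Optionally fold the adapter into the base weights for standalone deployment
standalone = model.merge_and_unload()

# Option 2: merged model (no PEFT dependency, slightly smaller download)
merged = AutoModelForCausalLM.from_pretrained(MERGED_ID)

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
```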
results.json CHANGED
@@ -1,10 +1,10 @@
 {
   "training_metrics": {
-    "train_runtime": 19296.9327,
-    "train_samples_per_second": 7.158,
-    "train_steps_per_second": 0.447,
+    "train_runtime": 22974.7335,
+    "train_samples_per_second": 5.783,
+    "train_steps_per_second": 0.361,
     "total_flos": 0.0,
-    "train_loss": 0.6712451833029547,
+    "train_loss": 0.2695974061958248,
     "epoch": 3.0
   },
   "config": {