Upload ICM-DPO enhanced Gemma PEFT adapter with comprehensive LoRA and model card
- README.md +20 -3
- adapter_model.safetensors +1 -1
- results.json +4 -4
README.md
CHANGED
@@ -23,13 +23,15 @@ model_type: gemma
 
 This model demonstrates comprehensive capability enhancement using ICM-generated preferences and high-capacity LoRA training via Direct Preference Optimization (DPO). This is **Recipe #6** from the [Ellora project](https://github.com/codelion/ellora) - a collection of standardized recipes for enhancing LLM capabilities.
 
+**Note**: This adapter includes full embedding and language modeling head layers (`embed_tokens` and `lm_head`), making it ~669MB rather than a typical lightweight LoRA adapter. This provides better performance at the cost of size.
+
 ## 🔧 Key Features
 
 - **🎯 Comprehensive LoRA**: Targets all major linear layers with rank 32 for maximum capacity enhancement
 - **🔄 ICM-Generated Preferences**: Uses [Internal Coherence Maximization](https://github.com/codelion/icm) for completely label-free preference data generation
 - **⚡ DPO Training**: Direct preference optimization without requiring a separate reward model
 - **🌐 General Purpose**: Enhances capabilities across diverse tasks (reasoning, coding, creative writing, etc.)
-- **💾
+- **💾 Full Layer Integration**: Includes complete embedding and head layers for optimal performance
 
 ## 📊 Model Configuration
 
@@ -37,8 +39,10 @@ This model demonstrates comprehensive capability enhancement using ICM-generated
 - **LoRA Rank**: 32
 - **LoRA Alpha**: 64
 - **Target Modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
+- **Modules to Save**: embed_tokens, lm_head (full layers)
 - **Training Method**: Direct Preference Optimization (DPO)
 - **Beta (KL Penalty)**: 0.5
+- **Adapter Size**: ~669MB (includes full embedding/head layers)
 - **Trainable Parameters**: ~56.14% of base model
 
 ## 📈 Training Details
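For reference, the configuration block above maps onto a `peft` LoRA setup roughly like the sketch below. This is an illustration assembled from the values in the card, not the actual Ellora recipe script; the dropout value and variable names are assumptions, since the diff does not show them.

```python
# Sketch of a LoRA configuration matching the card's values (assumed, not the
# exact Ellora training script). Requires the `peft` library.
from peft import LoraConfig

lora_config = LoraConfig(
    r=32,                                  # LoRA Rank: 32
    lora_alpha=64,                         # LoRA Alpha: 64
    target_modules=[                       # all major linear layers
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    modules_to_save=["embed_tokens", "lm_head"],  # full layers saved in the adapter
    lora_dropout=0.05,                     # assumption: not stated in the card
    bias="none",
    task_type="CAUSAL_LM",
)
```

The `modules_to_save` entry stores full copies of the embedding and head weights inside the adapter, which is the size trade-off the note above describes.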
@@ -46,7 +50,7 @@ This model demonstrates comprehensive capability enhancement using ICM-generated
 ### Dataset
 - **Source**: [codelion/gemma-3-270m-icm-dpo](https://huggingface.co/datasets/codelion/gemma-3-270m-icm-dpo)
 - **Method**: ICM (Internal Coherence Maximization) for label-free preference generation
-- **Training Samples**:
+- **Training Samples**: 44286
 - **Evaluation Samples**: 50
 
 ### Training Configuration
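To make the training recipe concrete, here is a minimal DPO sketch using TRL with the ICM preference dataset and the beta = 0.5 from the card. Treat it as an assumption-laden outline: the dataset column names, batch size, and learning rate are not shown in this diff, and the `processing_class` argument assumes a recent TRL release.

```python
# Minimal DPO training sketch (assumed setup, not the exact Ellora recipe).
# Only beta=0.5, the dataset id, the base model id, and 3 epochs come from the
# card / results.json; every other hyperparameter here is an assumption.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "google/gemma-3-270m-it"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# ICM-generated preference pairs; assumed to use the usual prompt/chosen/rejected columns.
train_ds = load_dataset("codelion/gemma-3-270m-icm-dpo", split="train")

peft_config = LoraConfig(
    r=32, lora_alpha=64, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    modules_to_save=["embed_tokens", "lm_head"],
)

args = DPOConfig(
    output_dir="gemma-3-270m-icm-dpo-lora",
    beta=0.5,                        # KL penalty from the card
    num_train_epochs=3,              # epoch: 3.0 in results.json
    per_device_train_batch_size=4,   # assumption
    gradient_accumulation_steps=4,   # assumption
    learning_rate=5e-6,              # assumption
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    processing_class=tokenizer,      # named `tokenizer=` in older TRL releases
    peft_config=peft_config,
)
trainer.train()
```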
@@ -116,13 +120,26 @@ DPO directly optimizes the model to:
 - ✅ **Enhanced Quality**: Better responses across all task types
 - ✅ **Label-Free Training**: No manual preference annotation required
 - ✅ **Comprehensive Coverage**: All major model components enhanced
-- ✅ **
+- ✅ **Full Integration**: Complete embedding and head layer optimization
 - ✅ **Reproducible**: Standardized recipe from Ellora project
 
+## 💡 When to Use This Adapter vs Merged Model
+
+**Use this adapter when:**
+- ✅ You want to combine with other adapters
+- ✅ You need the flexibility of PEFT loading/unloading
+- ✅ You want to fine-tune further on top of this enhancement
+
+**Use the merged model when:**
+- ✅ You want maximum simplicity (no PEFT dependencies)
+- ✅ You need a standalone model for deployment
+- ✅ You want slightly smaller size (~540MB vs ~669MB)
+
 ## 🏷️ Related Resources
 
 - **📚 Ellora Project**: [github.com/codelion/ellora](https://github.com/codelion/ellora)
 - **🔄 ICM Repository**: [github.com/codelion/icm](https://github.com/codelion/icm)
+- **📦 Merged Model**: [codelion/gemma-3-270m-icm](https://huggingface.co/codelion/gemma-3-270m-icm)
 - **📊 Training Dataset**: [codelion/gemma-3-270m-icm-dpo](https://huggingface.co/datasets/codelion/gemma-3-270m-icm-dpo)
 - **🤗 Base Model**: [google/gemma-3-270m-it](https://huggingface.co/google/gemma-3-270m-it)
 - **📄 DPO Paper**: [Direct Preference Optimization](https://arxiv.org/abs/2305.18290)
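The adapter-vs-merged choice added above boils down to two loading paths. The sketch below shows both using standard `transformers`/`peft` calls; the merged-model id comes from the Related Resources list, while the adapter repo id is a placeholder for this repository.

```python
# Two ways to use this work (generic transformers/peft usage, not an official
# snippet from the card).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

ADAPTER_REPO = "codelion/..."  # placeholder: the id of this adapter repository

# Option A: attach the PEFT adapter to the base model (flexible, composable,
# can be unloaded or fine-tuned further).
base = AutoModelForCausalLM.from_pretrained("google/gemma-3-270m-it")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")
model = PeftModel.from_pretrained(base, ADAPTER_REPO)

# Optionally fold the adapter into the base weights and drop the peft wrapper.
merged_in_memory = model.merge_and_unload()

# Option B: load the standalone merged model (no peft dependency, ~540MB).
standalone = AutoModelForCausalLM.from_pretrained("codelion/gemma-3-270m-icm")
```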
adapter_model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:9fc6d5e641a8697f452f3a6bcdf9ca1948a2b89350190f029b674471b3d9649c
 size 701497992
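The `adapter_model.safetensors` entry above is a git-lfs pointer: `oid` is the SHA-256 of the actual file and `size` is its byte count (701,497,992 bytes, about 669 MiB, matching the adapter size quoted in the card). A small sketch to verify a downloaded copy against those fields; the local path is assumed.

```python
# Verify a locally downloaded adapter_model.safetensors against the git-lfs
# pointer fields shown in this diff (oid = sha256 digest, size = bytes).
import hashlib
import os

path = "adapter_model.safetensors"  # assumed local path after downloading from the repo
expected_sha256 = "9fc6d5e641a8697f452f3a6bcdf9ca1948a2b89350190f029b674471b3d9649c"
expected_size = 701497992  # ~669 MiB, matching the "Adapter Size" note in the card

digest = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

assert os.path.getsize(path) == expected_size
assert digest.hexdigest() == expected_sha256
print("adapter_model.safetensors matches the LFS pointer")
```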
results.json
CHANGED
@@ -1,10 +1,10 @@
 {
   "training_metrics": {
-    "train_runtime":
-    "train_samples_per_second":
-    "train_steps_per_second": 0.
+    "train_runtime": 22974.7335,
+    "train_samples_per_second": 5.783,
+    "train_steps_per_second": 0.361,
     "total_flos": 0.0,
-    "train_loss": 0.
+    "train_loss": 0.2695974061958248,
     "epoch": 3.0
   },
   "config": {
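The new training metrics are internally consistent with the dataset size in the card: 44286 samples over 3 epochs in 22974.7 seconds gives roughly 5.78 samples per second, matching `train_samples_per_second`. A quick cross-check sketch (the effective batch size it implies is a derived estimate, not a value stated in this diff):

```python
# Cross-check the throughput figures in results.json against the dataset size
# reported in the card (44286 training samples, 3 epochs).
import json

with open("results.json") as f:
    metrics = json.load(f)["training_metrics"]

samples_seen = 44286 * 3                        # training samples x epochs = 132858
runtime = metrics["train_runtime"]              # 22974.7335 seconds (~6.4 hours)

print(samples_seen / runtime)                   # ~5.78, matches train_samples_per_second = 5.783
print(metrics["train_steps_per_second"] * runtime)  # ~8294 optimizer steps
# samples_seen / steps is about 16, suggesting an effective batch size around 16
# (a derived estimate; the actual batch size is not shown in this diff).
```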