Add pipeline tag and link to the code
#1 by nielsr (HF staff) - opened

README.md CHANGED
@@ -1,10 +1,11 @@
 ---
-library_name: transformers
-license: apache-2.0
-datasets:
-- DAMO-NLP-SG/Mistral-7B-LongPO-512K-tokenized
 base_model:
 - DAMO-NLP-SG/Mistral-7B-LongPO-128K
+datasets:
+- DAMO-NLP-SG/Mistral-7B-LongPO-512K-tokenized
+library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
 ---
 
 # LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
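With `pipeline_tag: text-generation` added above, the Hub will list this checkpoint under the text-generation task, so the standard `transformers` pipeline API applies. A minimal usage sketch, assuming the repo id is `DAMO-NLP-SG/Mistral-7B-LongPO-512K` (inferred from this README, not stated in the diff):

```python
from transformers import pipeline

# NOTE: the repo id below is an assumption inferred from the README; verify it on the Hub.
pipe = pipeline(
    "text-generation",
    model="DAMO-NLP-SG/Mistral-7B-LongPO-512K",
    torch_dtype="auto",   # use the dtype the checkpoint ships with
    device_map="auto",    # spread weights across available GPU(s)
)

out = pipe("Summarize the following document in two sentences:\n...", max_new_tokens=128)
print(out[0]["generated_text"])
```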
@@ -13,7 +14,7 @@ This repo provides the checkpoint of Mistral-7B-LongPO-512K in our paper "LongPO
 
 (Note that it is an experimental version (for rebuttal purposes) that may not have been fully tuned or provided with sufficient data to achieve convergence.)
 
-
+Code: https://github.com/DAMO-NLP-SG/LongPO
 
 <h5 align="left">
 
@@ -21,19 +22,14 @@ This repo provides the checkpoint of Mistral-7B-LongPO-512K in our paper "LongPO
 [](https://huggingface.co/papers/2502.13922)
 </h5>
 
-
-
 ## Highlights of LongPO
 
 - Self-evolving long-context alignment without human/superior LLMs annotations.
 - Extending context length while keeping aligned in one stage.
 - No degradation on short-context capabilities.
 
-
 <img width="1031" alt="image" src="https://github.com/user-attachments/assets/84f3c93f-909d-4ef7-a33a-107ca2deec42" />
 
-
-
 ## Models and Training Data
 
 | Models | Base Model | Training Data | # Data Samples |
@@ -45,10 +41,6 @@ This repo provides the checkpoint of Mistral-7B-LongPO-512K in our paper "LongPO
 
 \* indicates an experimental version (for rebuttal purposes) that may not have been fully tuned or provided with sufficient data to achieve convergence.
 
-
-
-
-
 ## Training Process:
 
 1. Prompt a short-context instruct LLM (e.g., Mistral-7B-Instruct-v0.2) to self-generate short-to-long preference data as illustrated in [data_prepare](data_prepare/readme.md).
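The step above is the core of LongPO: the same instruct model answers one instruction twice, once over a short chunk it already handles well (the chosen response) and once over the full long document (the rejected response), yielding preference pairs with no human or stronger-LLM annotation. A minimal illustrative sketch of one such pair; the field names and the injected `generate` callable are hypothetical, not the repo's actual data_prepare format:

```python
# Illustrative sketch only: field names and the injected `generate` callable are
# hypothetical, not the format produced by data_prepare in the LongPO repo.
def build_preference_pair(generate, long_doc: str, short_chunk: str, instruction: str) -> dict:
    # Chosen: the model's own answer conditioned on a short chunk it handles well.
    chosen = generate(context=short_chunk, instruction=instruction)
    # Rejected: its answer to the same instruction over the full long document.
    rejected = generate(context=long_doc, instruction=instruction)
    # Preference optimization then steers long-context behavior toward the
    # model's stronger short-context behavior.
    return {"prompt": f"{long_doc}\n\n{instruction}", "chosen": chosen, "rejected": rejected}
```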
@@ -101,11 +93,8 @@ train/train_longpo.py \
 
 ## Evaluation
 
-
-
 ### InfiniteBench
 
-
 | Model | Train/Claimed Length | En.Sum | En.QA | En.MC | AVG. |
 | ---------------- | -------------------- | ------ | ------ | ------ | ------ |
 | GPT-4-128K | 128K | 14.73 | 22.44 | 67.25 | 34.81 |
@@ -128,10 +117,6 @@ train/train_longpo.py \
 - Our results are evaluated with greedy decoding.
 - Baseline results marked with ᵇ are evaluated by us, while unmarked baseline results are sourced from their official report.
 
-
-
-
-
 ### RULER
 
 | Model | NIAH | VT | AGG | QA | AVG (13 tasks) |
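The "greedy decoding" note in this hunk corresponds to `do_sample=False` in `transformers`. A minimal sketch of such an evaluation call, with the repo id again assumed and a placeholder prompt:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DAMO-NLP-SG/Mistral-7B-LongPO-512K"  # assumed repo id; verify on the Hub
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tok("Placeholder long-context prompt", return_tensors="pt").to(model.device)
# Greedy decoding: do_sample=False takes the argmax token at each step,
# making outputs deterministic and comparable across models.
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```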
@@ -143,10 +128,6 @@ train/train_longpo.py \
 | Mistral-7B-LongPO-256K-EXP | 96.80 | 97.00 | 69.14 | 64.87 | 87.65 |
 | Mistral-7B-LongPO-512K-EXP | 97.28 | 97.48 | 69.22 | 64.92 | 88.00 |
 
-
-
-
-
 ### Short Context
 
 | Model | MMLU | ARC-C | Hellaswag | Winogrande | Avg |
@@ -158,7 +139,6 @@ train/train_longpo.py \
 | Qwen2.5-7B-Instruct | 74.28 | 67.15 | 81.41 | 74.66 | 74.38 |
 | Qwen2.5-7B-LongPO-128K | 73.64 | 65.70 | 80.82 | 74.98 | 73.79 |
 
-
 ## Citation
 If you find our project useful, we hope you can star our repo and cite our paper as follows:
 ```