Commit a999984 (verified) · committed by nielsr (HF staff) · 1 parent: 9c8b836

Add pipeline tag and link to the code


This PR makes sure people can find your model at https://huggingface.co/models?pipeline_tag=text-generation.

It also adds a link to the GitHub repository.
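With `library_name: transformers` and `pipeline_tag: text-generation` set, the checkpoint can be loaded through the standard pipeline API. A minimal sketch, assuming the repo id is `DAMO-NLP-SG/Mistral-7B-LongPO-512K` (inferred from the model name in the README; adjust if the actual id differs):

```python
# Minimal usage sketch (not part of this PR).
# Assumes the repo id DAMO-NLP-SG/Mistral-7B-LongPO-512K; adjust if the actual id differs.
from transformers import pipeline

generator = pipeline(
    "text-generation",                           # task declared by pipeline_tag
    model="DAMO-NLP-SG/Mistral-7B-LongPO-512K",  # assumed repo id
)
print(generator("LongPO extends the context window by", max_new_tokens=32)[0]["generated_text"])
```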

Files changed (1)
1. README.md (+6, -26)
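Once merged, the card metadata can be read back to confirm it parses as intended. A small sketch using `huggingface_hub`, under the same repo-id assumption:

```python
# Metadata read-back sketch (not part of this PR).
# Assumes the repo id DAMO-NLP-SG/Mistral-7B-LongPO-512K.
from huggingface_hub import ModelCard

card = ModelCard.load("DAMO-NLP-SG/Mistral-7B-LongPO-512K")
print(card.data.pipeline_tag)  # expected: text-generation
print(card.data.library_name)  # expected: transformers
print(card.data.license)       # expected: apache-2.0
print(card.data.base_model)    # expected to include DAMO-NLP-SG/Mistral-7B-LongPO-128K
```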
README.md CHANGED
@@ -1,10 +1,11 @@
 ---
-library_name: transformers
-license: apache-2.0
-datasets:
-- DAMO-NLP-SG/Mistral-7B-LongPO-512K-tokenized
 base_model:
 - DAMO-NLP-SG/Mistral-7B-LongPO-128K
+datasets:
+- DAMO-NLP-SG/Mistral-7B-LongPO-512K-tokenized
+library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
 ---
 
 # LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
@@ -13,7 +14,7 @@ This repo provides the checkpoint of Mistral-7B-LongPO-512K in our paper "LongPO
 
 (Note that it is an experimental an experimental version (for rebuttal purposes) that may have not been fully tuned or provided with sufficient data to achieve convergence.)
 
-
+Code: https://github.com/DAMO-NLP-SG/LongPO
 
 <h5 align="left">
 
@@ -21,19 +22,14 @@ This repo provides the checkpoint of Mistral-7B-LongPO-512K in our paper "LongPO
 [![hf_paper](https://img.shields.io/badge/🤗-HF%20Daily-red.svg)](https://huggingface.co/papers/2502.13922)
 </h5>
 
-
-
 ## Highlights of LongPO
 
 - Self-evolving long-context alignment without human/superior LLMs annotations.
 - Extending context length while keeping aligned in one stage.
 - No degradation on short-context capabilities.
 
-
 <img width="1031" alt="image" src="https://github.com/user-attachments/assets/84f3c93f-909d-4ef7-a33a-107ca2deec42" />
 
-
-
 ## Models and Training Data
 
 | Models | Base Model | Training Data | # Data Samples |
@@ -45,10 +41,6 @@ This repo provides the checkpoint of Mistral-7B-LongPO-512K in our paper "LongPO
 
 \* indicates an experimental version (for rebuttal purposes) that may have not been fully tuned or provided with sufficient data to achieve convergence.
 
-
-
-
-
 ## Training Process:
 
 1. Prompt a short-context instruct LLM (e.g., Mistral-7B-Instruct-v0.2) to self-generate short-to-long preference data as illustrated in [data_prepare](data_prepare/readme.md).
@@ -101,11 +93,8 @@ train/train_longpo.py \
 
 ## Evaluation
 
-
-
 ### InfiniteBench
 
-
 | Model | Train/Claimed Length | En.Sum | En.QA | En.MC | AVG. |
 | ---------------- | -------------------- | ------ | ------ | ------ | ------ |
 | GPT-4-128K | 128K | 14.73 | 22.44 | 67.25 | 34.81 |
@@ -128,10 +117,6 @@ train/train_longpo.py \
 - Our results are evaluated with greedy decoding.
 - Baseline results marked with ᵇ are evaluated by us, while unmarked baseline results are sourced from their official report.
 
-
-
-
-
 ### RULER
 
 | Model | NIAH | VT | AGG | QA | AVG (13 tasks) |
@@ -143,10 +128,6 @@ train/train_longpo.py \
 | Mistral-7B-LongPO-256K-EXP | 96.80 | 97.00 | 69.14 | 64.87 | 87.65 |
 | Mistral-7B-LongPO-512K-EXP | 97.28 | 97.48 | 69.22 | 64.92 | 88.00 |
 
-
-
-
-
 ### Short Context
 
 | Model | MMLU | ARC-C | Hellaswag | Winogrande | Avg |
@@ -158,7 +139,6 @@ train/train_longpo.py \
 | Qwen2.5-7B-Instruct | 74.28 | 67.15 | 81.41 | 74.66 | 74.38 |
 | Qwen2.5-7B-LongPO-128K | 73.64 | 65.70 | 80.82 | 74.98 | 73.79 |
 
-
 ## Citation
 If you find our project useful, hope you can star our repo and cite our paper as follows:
 ```