Commit a999984 (verified) · committed by nielsr (HF staff) · 1 parent: 9c8b836

Add pipeline tag and link to the code


This PR makes sure people can find your model at https://huggingface.co/models?pipeline_tag=text-generation.

It also adds a link to the GitHub repository.
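With `library_name: transformers` and `pipeline_tag: text-generation` set, the checkpoint can be loaded through the standard pipeline API. A minimal sketch, assuming the repo id is `DAMO-NLP-SG/Mistral-7B-LongPO-512K` (inferred from the model name in the README; adjust if the actual id differs):

```python
# Minimal usage sketch (not part of this PR).
# Assumes the repo id DAMO-NLP-SG/Mistral-7B-LongPO-512K; adjust if the actual id differs.
from transformers import pipeline

generator = pipeline(
    "text-generation",                           # task declared by pipeline_tag
    model="DAMO-NLP-SG/Mistral-7B-LongPO-512K",  # assumed repo id
)
print(generator("LongPO extends the context window by", max_new_tokens=32)[0]["generated_text"])
```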

Files changed (1)
1. README.md (+6, -26)
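Once merged, the card metadata can be read back to confirm it parses as intended. A small sketch using `huggingface_hub`, under the same repo-id assumption:

```python
# Metadata read-back sketch (not part of this PR).
# Assumes the repo id DAMO-NLP-SG/Mistral-7B-LongPO-512K.
from huggingface_hub import ModelCard

card = ModelCard.load("DAMO-NLP-SG/Mistral-7B-LongPO-512K")
print(card.data.pipeline_tag)  # expected: text-generation
print(card.data.library_name)  # expected: transformers
print(card.data.license)       # expected: apache-2.0
print(card.data.base_model)    # expected to include DAMO-NLP-SG/Mistral-7B-LongPO-128K
```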
README.md CHANGED
@@ -1,10 +1,11 @@
 ---
-library_name: transformers
-license: apache-2.0
-datasets:
-- DAMO-NLP-SG/Mistral-7B-LongPO-512K-tokenized
 base_model:
 - DAMO-NLP-SG/Mistral-7B-LongPO-128K
+datasets:
+- DAMO-NLP-SG/Mistral-7B-LongPO-512K-tokenized
+library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
 ---
 
 # LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
@@ -13,7 +14,7 @@ This repo provides the checkpoint of Mistral-7B-LongPO-512K in our paper "LongPO
 
 (Note that it is an experimental an experimental version (for rebuttal purposes) that may have not been fully tuned or provided with sufficient data to achieve convergence.)
 
-
+Code: https://github.com/DAMO-NLP-SG/LongPO
 
 <h5 align="left">
 
@@ -21,19 +22,14 @@ This repo provides the checkpoint of Mistral-7B-LongPO-512K in our paper "LongPO
 [![hf_paper](https://img.shields.io/badge/🤗-HF%20Daily-red.svg)](https://huggingface.co/papers/2502.13922)
 </h5>
 
-
-
 ## Highlights of LongPO
 
 - Self-evolving long-context alignment without human/superior LLMs annotations.
 - Extending context length while keeping aligned in one stage.
 - No degradation on short-context capabilities.
 
-
 <img width="1031" alt="image" src="https://github.com/user-attachments/assets/84f3c93f-909d-4ef7-a33a-107ca2deec42" />
 
-
-
 ## Models and Training Data
 
 | Models | Base Model | Training Data | # Data Samples |
@@ -45,10 +41,6 @@ This repo provides the checkpoint of Mistral-7B-LongPO-512K in our paper "LongPO
 
 \* indicates an experimental version (for rebuttal purposes) that may have not been fully tuned or provided with sufficient data to achieve convergence.
 
-
-
-
-
 ## Training Process:
 
 1. Prompt a short-context instruct LLM (e.g., Mistral-7B-Instruct-v0.2) to self-generate short-to-long preference data as illustrated in [data_prepare](data_prepare/readme.md).
@@ -101,11 +93,8 @@ train/train_longpo.py \
 
 ## Evaluation
 
-
-
 ### InfiniteBench
 
-
 | Model | Train/Claimed Length | En.Sum | En.QA | En.MC | AVG. |
 | ---------------- | -------------------- | ------ | ------ | ------ | ------ |
 | GPT-4-128K | 128K | 14.73 | 22.44 | 67.25 | 34.81 |
@@ -128,10 +117,6 @@ train/train_longpo.py \
 - Our results are evaluated with greedy decoding.
 - Baseline results marked with ᵇ are evaluated by us, while unmarked baseline results are sourced from their official report.
 
-
-
-
-
 ### RULER
 
 | Model | NIAH | VT | AGG | QA | AVG (13 tasks) |
@@ -143,10 +128,6 @@ train/train_longpo.py \
 | Mistral-7B-LongPO-256K-EXP | 96.80 | 97.00 | 69.14 | 64.87 | 87.65 |
 | Mistral-7B-LongPO-512K-EXP | 97.28 | 97.48 | 69.22 | 64.92 | 88.00 |
 
-
-
-
-
 ### Short Context
 
 | Model | MMLU | ARC-C | Hellaswag | Winogrande | Avg |
@@ -158,7 +139,6 @@ train/train_longpo.py \
 | Qwen2.5-7B-Instruct | 74.28 | 67.15 | 81.41 | 74.66 | 74.38 |
 | Qwen2.5-7B-LongPO-128K | 73.64 | 65.70 | 80.82 | 74.98 | 73.79 |
 
-
 ## Citation
 If you find our project useful, hope you can star our repo and cite our paper as follows:
 ```