Add pipeline tag, library name and license

#1 opened by nielsr (HF Staff)

Files changed (1): README.md (+8 -9)
README.md CHANGED
@@ -1,11 +1,15 @@
 ---
-language:
-- en
 base_model:
 - Qwen/Qwen2.5-Math-7B
+language:
+- en
 tags:
 - One-Shot-CFT
+pipeline_tag: text-generation
+library_name: transformers
+license: cc-by-4.0
 ---
+
 # One-Shot-CFT: Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem
 
 <p align="center">
@@ -16,15 +20,12 @@ tags:
   <a href="https://tiger-ai-lab.github.io/One-Shot-CFT/" target="_blank">🌐 Project Page</a>
 </p>
 
-
-
 ## 🧠 Overview
 
 One-Shot Critique Fine-Tuning (CFT) is a simple, robust, and compute-efficient training paradigm for unleashing the reasoning capabilities of pretrained LLMs in both mathematical and logical domains. By leveraging critiques on just one problem, One-Shot CFT enables models like Qwen and LLaMA to match or even outperform reinforcement learning, while using 20× less compute.
 
 Instead of learning from reference answers (as in supervised fine-tuning) or reward signals (as in reinforcement learning), One-Shot CFT enables models to learn from critiques of diverse solutions to a single problem, enhancing their exposure to varied reasoning patterns and mitigating overfitting. This exposes the LLMs to multiple perspectives and error types, thereby more effectively unleashing their reasoning potential.
 
-
 ## ✨ Key Highlights
 
 - **Unleashes Reasoning with One Example:** One-Shot CFT uses critiques of diverse model-generated solutions to a single problem to significantly boost performance across math and logic tasks. For example, with just 5 GPU hours of training on Qwen2.5-Math-7B, One-Shot CFT achieves an average improvement of +15% on six math benchmarks and +16% on three logic reasoning benchmarks.
@@ -33,7 +34,6 @@ Instead of learning from reference answers (as in supervised fine-tuning) or rew
 
 **This specific model is the One-Shot CFT variant trained based on [Qwen2.5-7B-Math](https://huggingface.co/Qwen/Qwen2.5-Math-7B) with [DSR-CFT-p0](https://huggingface.co/datasets/TIGER-Lab/One-Shot-CFT-Data) dataset.**
 
-
 ## Main Results
 
 <p align="center">
@@ -42,12 +42,11 @@ Instead of learning from reference answers (as in supervised fine-tuning) or rew
 
 <p align="center"><em>
 One-shot CFT consistently improves mathematical and logical reasoning.
-<strong>Left:</strong> Average accuracy on six mathematical reasoning benchmarks for Qwen and LLaMA models, comparing base, SFT, RLVR, and CFT with only one training example.
-<strong>Right:</strong> In-domain accuracy on three logic reasoning benchmarks (BBEH subtasks) for Qwen2.5-Math-7B.
+**Left:** Average accuracy on six mathematical reasoning benchmarks for Qwen and LLaMA models, comparing base, SFT, RLVR, and CFT with only one training example.
+**Right:** In-domain accuracy on three logic reasoning benchmarks (BBEH subtasks) for Qwen2.5-Math-7B.
 Across both domains, CFT with a single problem significantly outperforms standard SFT and matches or exceeds reinforcement learning with much lower compute.
 </em></p>
 
-
 ## Citation
 
 If you find our work helpful, please cite it as:
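The new `pipeline_tag: text-generation` and `library_name: transformers` entries tell the Hub how the checkpoint is meant to be consumed, so the model page can surface a text-generation widget and a transformers snippet. A minimal usage sketch under those settings; the repo id below is a placeholder, since this PR does not name the final model id:

```python
# Minimal sketch: load the checkpoint through the transformers
# text-generation pipeline, as the added metadata implies.
# NOTE: "TIGER-Lab/One-Shot-CFT-Qwen2.5-Math-7B" is a placeholder
# repo id, not confirmed by this PR; substitute the actual model id.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="TIGER-Lab/One-Shot-CFT-Qwen2.5-Math-7B",  # placeholder
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # needs `accelerate` for automatic placement
)

prompt = "Solve for x: 2x + 3 = 11. Show your reasoning step by step."
print(generator(prompt, max_new_tokens=512)[0]["generated_text"])
```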
 
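For reviewers skimming the card's Overview: in CFT the supervision target is a critique of a candidate solution rather than a reference answer. The sketch below shows how one such training record might be assembled; the prompt template and field names are illustrative assumptions, not the exact schema of the DSR-CFT-p0 data linked above.

```python
# Illustrative sketch of a critique-fine-tuning (CFT) record: the model
# is trained to emit a critique of a candidate solution, not the answer.
# The template and field names are assumptions, not the released schema.
def build_cft_example(problem: str, candidate_solution: str, critique: str) -> dict:
    prompt = (
        "Below is a problem and a candidate solution. Critique the solution "
        "step by step, then state whether it is correct.\n\n"
        f"Problem: {problem}\n\nCandidate solution: {candidate_solution}"
    )
    # Standard SFT pair: the loss is taken on the critique (the response).
    return {"prompt": prompt, "response": critique}

# One problem yields many records, one per diverse candidate solution:
example = build_cft_example(
    problem="Compute 3 + 4 * 2.",
    candidate_solution="3 + 4 = 7, then 7 * 2 = 14.",
    critique="Incorrect: multiplication binds tighter, so 4 * 2 = 8 and 3 + 8 = 11.",
)
```

Each candidate solution fails (or succeeds) in a different way, which is how a single problem can expose the model to the varied reasoning patterns and error types the card describes.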