---
library_name: transformers
datasets:
- BAAI/TACO
- tasksource/PRM800K
language:
- en
base_model:
- Qwen/Qwen2.5-32B-Instruct
- NovaSky-AI/Sky-T1-32B-Preview
license: apache-2.0
---

## Model Details

### Model Description

This is a 32B reasoning model, preference-optimized on top of Sky-T1-32B-Preview to significantly reduce generation length while maintaining accuracy. Its performance is on par with the o1-preview model in both math and coding, while generation length is reduced by up to 57% relative to Sky-T1-32B-Preview.
Please see our [blog post](https://novasky-ai.github.io/posts/reduce-overthinking/) for more details.

- **Developed by:** NovaSky Team from the Sky Computing Lab at UC Berkeley.

## Training Details

### Training Data

10K preference pairs in the math and coding domains, generated by Sky-T1-32B-Preview.
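The blog post describes how the chosen and rejected responses were selected. Purely as a hypothetical illustration (not necessarily the team's procedure), a length-oriented pair can be built from several sampled responses to the same prompt by preferring the shortest correct response over the longest correct one. `Sample`, `PreferencePair`, and `build_length_pair` are made-up names, and the record layout is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """One sampled response to a prompt (hypothetical record layout)."""
    response: str
    num_tokens: int
    is_correct: bool


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str


def build_length_pair(prompt: str, samples: List[Sample]) -> Optional[PreferencePair]:
    """Hypothetical rule: shortest correct sample is chosen, longest correct one is rejected.

    Only correct samples are compared, so the pair rewards brevity without
    rewarding wrong answers.
    """
    correct = [s for s in samples if s.is_correct]
    if len(correct) < 2:
        return None  # need at least two correct samples to form a pair
    shortest = min(correct, key=lambda s: s.num_tokens)
    longest = max(correct, key=lambda s: s.num_tokens)
    if shortest.num_tokens == longest.num_tokens:
        return None  # no length difference, so no signal to learn from
    return PreferencePair(prompt, chosen=shortest.response, rejected=longest.response)
```

For example, given a 900-token correct answer, a 300-token correct answer, and a 100-token wrong answer, the 300-token answer becomes `chosen` and the 900-token answer becomes `rejected`; the wrong answer is ignored even though it is shortest.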

### Training Procedure
We train with Simple Preference Optimization (SimPO) using a batch size of 96, a learning rate of 5e-7, a target reward margin gamma of 0.3, and a reward scale beta of 2.0.
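In SimPO, a response is scored by its average per-token log-probability scaled by beta, and the chosen response must beat the rejected one by the margin gamma; no reference model is involved. The plain-Python sketch below evaluates that objective for a single pair, assuming beta = 2.0 and gamma = 0.3 enter the loss as in the SimPO formulation. The function names are hypothetical; actual training used LLaMA-Factory's implementation.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def simpo_loss(chosen_logp_sum: float, chosen_len: int,
               rejected_logp_sum: float, rejected_len: int,
               beta: float = 2.0, gamma: float = 0.3) -> float:
    """SimPO loss for one preference pair (sketch).

    Reward: r(y) = beta / |y| * log pi(y | x), i.e. the length-normalized log-likelihood.
    Loss:   -log sigmoid(r(chosen) - r(rejected) - gamma)
          = softplus(-(r(chosen) - r(rejected) - gamma)).
    """
    reward_chosen = beta * chosen_logp_sum / chosen_len
    reward_rejected = beta * rejected_logp_sum / rejected_len
    margin = reward_chosen - reward_rejected - gamma
    return softplus(-margin)


# Same average log-prob per token (-1.0) for both responses: the margin is -gamma,
# so the loss is softplus(0.3) even though the rejected response is twice as long.
print(simpo_loss(-100.0, 100, -200.0, 200))
```

Because rewards are averaged per token, a long response is not penalized merely for having a larger total negative log-probability; the length preference comes from which responses are labeled chosen and rejected.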

#### Speed

We use LLaMA-Factory for training. On 8x H100 GPUs, SimPO training takes about 2.5 hours with DeepSpeed ZeRO-3 Offload.
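The global batch size of 96 on 8 GPUs has to come from per-device batch size times gradient-accumulation steps. The sketch below writes the stated hyperparameters as a Python dict shaped like a LLaMA-Factory DPO-stage config and checks that arithmetic. The key names and the DeepSpeed config path are our understanding of LLaMA-Factory's options and should be verified against your installed version; the per-device batch size and accumulation steps are assumptions, not the team's published settings.

```python
# Hypothetical LLaMA-Factory-style SimPO settings (key names and DeepSpeed path are
# assumptions; check them against your LLaMA-Factory version).
NUM_GPUS = 8                 # 8x H100, as stated above
PER_DEVICE_BATCH_SIZE = 1    # assumption: one preference pair per GPU per micro-step
GRAD_ACCUM_STEPS = 12        # assumption: picked so the global batch equals 96

config = {
    "model_name_or_path": "NovaSky-AI/Sky-T1-32B-Preview",
    "stage": "dpo",          # pairwise preference losses run in the DPO stage
    "pref_loss": "simpo",
    "pref_beta": 2.0,        # reward scale beta
    "simpo_gamma": 0.3,      # target reward margin gamma
    "learning_rate": 5.0e-7,
    "per_device_train_batch_size": PER_DEVICE_BATCH_SIZE,
    "gradient_accumulation_steps": GRAD_ACCUM_STEPS,
    "deepspeed": "examples/deepspeed/ds_z3_offload_config.json",  # ZeRO-3 + CPU offload
}


def global_batch_size(cfg: dict, num_gpus: int) -> int:
    """Effective number of preference pairs per optimizer step."""
    return cfg["per_device_train_batch_size"] * cfg["gradient_accumulation_steps"] * num_gpus


print(global_batch_size(config, NUM_GPUS))  # 96
```

Any split with the same product (for example 2 per device with 6 accumulation steps) gives the same global batch; smaller per-device batches trade throughput for lower activation memory on long reasoning traces.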


## Evaluation
|   Benchmark  |  Metric | Sky-T1-32B-Preview | Sky-T1-32B-Flash | Qwen2.5-32B-Instruct | QwQ-32B- Base | DeepSeek-R1-Distill-Qwen-32B |
|--------------|---------|:------------------:|:----------------:|:--------------------:|:-------------:|:----------------------------:|
|    Math500   |     Acc |        88.6        |       88.6       |         76.2         |      89.2     |             90.8             |
|              | Avg Len |        2124        |    1417 (-33%)   |          522         |      2089     |             2010             |
|    AIME24    |     Acc |        43.3        |       43.3       |         16.7         |       50      |             66.7             |
|              | Avg Len |        6881        |    4365 (-37%)   |          970         |      7379     |             9173             |
|   LCB Easy   |     Acc |        87.4        |        89        |         84.6         |      90.7     |             91.2             |
|              | Avg Len |        3415        |    2265 (-34%)   |          414         |      3255     |             2775             |
|  LCB Medium  |     Acc |        56.8        |       56.3       |         40.8         |      56.3     |             76.7             |
|              | Avg Len |        8263        |    4389 (-47%)   |          535         |      6742     |             6324             |
|   LCB Hard   |     Acc |        17.9        |       17.9       |          9.8         |      17.1     |             38.2             |
|              | Avg Len |        14564       |    6199 (-57%)   |          618         |     10450     |             10448            |
|     MMLU     |     Acc |        82.4        |       81.7       |         80.1         |      85.2     |             82.1             |
|              | Avg Len |        1087        |    799 (-17%)    |          312         |      1041     |              774             |
| GPQA Diamond |     Acc |        56.8        |       56.6       |         45.5         |      52.5     |             62.6             |
|              | Avg Len |        3503        |    2148 (-39%)   |          600         |      3302     |             5108             |
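The parenthesized percentages in the Sky-T1-32B-Flash column are average-length changes relative to Sky-T1-32B-Preview. As a quick check of that reading, the hypothetical helper below recomputes a few of them from the reported averages.

```python
def relative_change_pct(baseline: float, new: float) -> int:
    """Percentage change of `new` relative to `baseline`, rounded to the nearest integer."""
    return round(100 * (new - baseline) / baseline)


# (Sky-T1-32B-Preview, Sky-T1-32B-Flash) average lengths from the table.
avg_len = {
    "Math500": (2124, 1417),
    "AIME24": (6881, 4365),
    "LCB Hard": (14564, 6199),
}

for bench, (preview, flash) in avg_len.items():
    print(f"{bench}: {relative_change_pct(preview, flash)}%")
# Math500: -33%
# AIME24: -37%
# LCB Hard: -57%
```

The largest reduction, on LCB Hard, is the source of the "up to 57%" figure.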

## Acknowledgement
We would like to thank [Lambda Labs](https://lambdalabs.com/service/gpu-cloud) and [Anyscale](https://www.anyscale.com/) for providing compute resources.

## Citation 
Please consider citing our blog post if you find it useful for your research. Thank you!

```bibtex
@misc{reduce_overthinking_2025,
  author       = {NovaSky Team},
  title        = {Think Less, Achieve More: Cut Reasoning Costs by 50% Without Sacrificing Accuracy},
  howpublished = {https://novasky-ai.github.io/posts/reduce-overthinking},
  note         = {Accessed: 2025-01-23},
  year         = {2025}
}
```