# LightTransfer
## Model Details

- **Base Model:** Qwen/Qwen2.5-32B-Instruct
- **Dataset:** RUC-AIBOX/long_form_thought_data_5k
- **Training Framework:** Supervised fine-tuning (SFT)
- **Parameters:** 32B
- **Special Features:** Replaces 50% of the full-attention layers with streaming attention

**QwQ-LightTransfer** is a 32B-parameter model built on **Qwen/Qwen2.5-32B-Instruct** and fine-tuned via SFT on **RUC-AIBOX/long_form_thought_data_5k**.

- By replacing 50% of the model's full-attention layers with streaming attention, specifically layers [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 26, 27, 28, 30, 31, 32, 33, 35, 37, 38, 43, 51], it substantially reduces KV-cache memory costs (a minimal sketch of the idea follows this list).
- QwQ-LightTransfer scores 53.3% on the advanced math benchmark AIME24, demonstrating strong o1-like long-reasoning capability.
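For intuition: a streaming-attention layer keeps only a handful of initial "attention sink" tokens plus a recent sliding window in its KV cache, so that layer's memory no longer grows with sequence length. Below is a minimal, self-contained sketch of this eviction policy; the `n_sink` and `window` sizes are illustrative placeholders, not the model's actual configuration.

```python
import torch

def streaming_kv_mask(seq_len: int, n_sink: int = 4, window: int = 1024) -> torch.Tensor:
    """Keep the first `n_sink` attention-sink tokens plus the most recent
    `window` tokens; every other cached position is evicted."""
    keep = torch.zeros(seq_len, dtype=torch.bool)
    keep[:n_sink] = True                          # attention sinks at the sequence start
    keep[max(n_sink, seq_len - window):] = True   # recent sliding window
    return keep

def evict_kv(k: torch.Tensor, v: torch.Tensor, n_sink: int = 4, window: int = 1024):
    """Apply the streaming policy to cached (k, v) tensors of shape
    [batch, heads, seq_len, head_dim]; the cache becomes O(n_sink + window)
    instead of O(seq_len)."""
    keep = streaming_kv_mask(k.shape[2], n_sink, window)
    return k[:, :, keep, :], v[:, :, keep, :]

# Example: a 4096-token cache shrinks to n_sink + window = 1028 entries.
k = torch.randn(1, 2, 4096, 128)
v = torch.randn(1, 2, 4096, 128)
k_s, v_s = evict_kv(k, v)
print(k_s.shape)  # torch.Size([1, 2, 1028, 128])
```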
## Performance Evaluation

We evaluated QwQ-LightTransfer on several long-reasoning generation benchmarks. Selected results are shown below (accuracy, %; "-" means no reported result).

| Method        | Math-OAI | AIME24 | AIME25 | GSM8K |
|---------------|----------|--------|--------|-------|
| o1-preview    | 85.5     | 44.6   | -      | -     |
| QwQ-STILL     | 90.2     | 46.7   | 33.3   | 95.6  |
| LongGen       | 78.2     | 16.7   | -      | 95.4  |
| LightTransfer | 90.7     | 53.3   | 40.0   | 95.5  |
## Import from Transformers

To load the QwQ-LightTransfer model using Transformers, use the following code:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'QwQ-32B-LightTransfer'

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # custom modeling code (streaming-attention layers)
    device_map='auto',
)

text = "Hi, I'm QwQ-32B-LightTransfer."
inputs = tokenizer(text, return_tensors='pt').to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs['input_ids'], max_new_tokens=32000)

print(tokenizer.decode(outputs[0]))
```
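Since the base model is instruction-tuned, chat-style prompting through the tokenizer's chat template usually works better than raw text. A minimal sketch reusing the `tokenizer` and `model` above; the question and the `max_new_tokens` value are illustrative:

```python
# Hedged sketch: prompt the model via its chat template.
messages = [
    {"role": "user", "content": "What is the sum of the first 100 positive integers?"}
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs['input_ids'], max_new_tokens=32000)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True))
```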
## Citation

```bibtex
@misc{zhang2025lighttransferlongcontextllmsecretly,
      title={LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation},
      author={Xuan Zhang and Fengzhuo Zhang and Cunxiao Du and Chao Du and Tianyu Pang and Wei Gao and Min Lin},
      year={2025},
      eprint={2410.13846},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.13846},
}
```