---
license: mit
tags:
- autoquant
- gguf
- llama-cpp
base_model:
- l3lab/L1-Qwen-1.5B-Max
---
The corresponding paper is [L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning](https://arxiv.org/html/2503.04697v1).
---
Below is the README written by Kimi:
---
# L1-Qwen-1.5B-Max Model Introduction
## Model Overview
L1-Qwen-1.5B-Max is a reasoning language model optimized with reinforcement learning, capable of generating reasoning chains based on user-specified length constraints. Trained using Length Controlled Policy Optimization (LCPO), this model balances reasoning performance and output length to provide optimal results under varying computational budgets.
## Model Features
- **Precise Length Control**: L1-Qwen-1.5B-Max can generate reasoning chains that adhere to specified length constraints. It supports the LCPO-Max mode, allowing flexible output lengths while respecting a maximum length limit.
- **Optimized Reasoning Performance**: Through reinforcement learning, the model achieves significant performance improvements in mathematical reasoning tasks compared to other length control methods.
- **Wide Applicability**: L1-Qwen-1.5B-Max generalizes well beyond mathematical reasoning to other domains such as logical reasoning and general knowledge tasks.
- **Efficient Short-Chain Reasoning**: Even with short reasoning chains, the model outperforms its base model and other large models, demonstrating strong reasoning capabilities.
## Model Architecture
L1-Qwen-1.5B-Max is fine-tuned from the DeepSeek-R1-Distill-Qwen-1.5B model. Using LCPO, the model is optimized during training for both reasoning correctness and adherence to length constraints, enabling precise control over reasoning chain lengths.
## Usage
### Input Format
The model's input consists of the problem description and a length constraint. Users specify the desired reasoning length by appending "Think for [n] tokens." to the prompt, where `[n]` is the desired number of reasoning tokens.
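Below is a minimal sketch of how such a prompt can be assembled in Python; the `format_prompt` helper, the sample question, and the 512-token budget are illustrative, not part of the model's API.

```python
def format_prompt(question: str, token_budget: int) -> str:
    """Append the length-control instruction the L1 models are trained on."""
    return f"{question} Think for {token_budget} tokens."

# Hypothetical example: a short question with a 512-token reasoning budget.
prompt = format_prompt("What is the sum of the first 100 positive integers?", 512)
print(prompt)
# -> "What is the sum of the first 100 positive integers? Think for 512 tokens."
```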
### Output Format
The model outputs the reasoning process and final answer. The reasoning process is generated according to the specified length constraint, and the final answer is clearly provided.
### Example
**Input**:
`"Find the largest possible real part of the expression (75+117i)z + (96+144i)/z where z is a complex number with |z|=4. Think for 1024 tokens."`
**Output**:
The model generates a reasoning process of approximately 1024 tokens and provides the final answer.
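The following is a sketch of running this example with llama-cpp-python, in line with the repository's GGUF/llama-cpp tags. The GGUF file name, context size, and sampling settings are assumptions; substitute the quantization you actually downloaded, and apply the model's chat template instead of a raw completion if your setup requires it.

```python
from llama_cpp import Llama

# Assumed local GGUF file name; replace with the quantization you downloaded.
llm = Llama(model_path="L1-Qwen-1.5B-Max.Q4_K_M.gguf", n_ctx=4096)

prompt = (
    "Find the largest possible real part of the expression "
    "(75+117i)z + (96+144i)/z where z is a complex number with |z|=4. "
    "Think for 1024 tokens."
)

# max_tokens is set above the requested 1024-token budget so the final
# answer is not truncated; the temperature value is an illustrative choice.
out = llm(prompt, max_tokens=1536, temperature=0.6)
print(out["choices"][0]["text"])
```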
## Performance
L1-Qwen-1.5B-Max shows significant gains on multiple mathematical reasoning benchmarks. For example, it achieves 20% to 100% relative accuracy improvements over other length-control methods on the AIME and AMC datasets. It also outperforms much larger models such as GPT-4o when reasoning chains are kept short.
## Applicable Scenarios
- **Mathematical Reasoning Tasks**: Solving complex mathematical problems in algebra, geometry, and calculus.
- **Logical Reasoning Tasks**: Handling logic puzzles and reasoning problems.
- **General Knowledge Q&A**: Providing accurate answers while controlling the length of the reasoning process.
## Notes
- Performance depends on the specified length constraint: shorter token budgets generally trade accuracy for speed, so set the budget according to the difficulty of the task.
- Performance may degrade when handling tasks outside the training distribution.
## License and Citation
This model is developed based on the [LCPO method](https://arxiv.org/html/2503.04697v1). Please cite the relevant paper when using this model.
---