---
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
tags:
  - self-calibration
  - confidence-estimation
  - test-time-scaling
---

# Model Card for Efficient Test-Time Scaling via Self-Calibration

This model implements an efficient test-time scaling method that uses model confidence to dynamically adjust sampling. Higher-confidence responses carry more weight in the final answer, which reduces the amount of computation needed. A self-calibration framework is used to produce better-calibrated confidence scores.
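
As a rough illustration of confidence-guided dynamic sampling (a sketch, not the authors' exact implementation), sampling can stop early once a sufficiently confident response has been seen. Here `generate_with_confidence` is a hypothetical callable that returns an answer together with its self-calibrated confidence score:

```python
# Illustrative sketch of confidence-guided dynamic sampling (not the official implementation).
# `generate_with_confidence` is a hypothetical helper returning (answer, confidence),
# where `confidence` is the self-calibrated score produced by the model.

def sample_until_confident(generate_with_confidence, max_samples=16, threshold=0.9):
    """Draw samples until a response's calibrated confidence exceeds `threshold`,
    or the sampling budget is exhausted; return the highest-confidence answer."""
    best_answer, best_conf = None, -1.0
    for _ in range(max_samples):
        answer, conf = generate_with_confidence()
        if conf > best_conf:
            best_answer, best_conf = answer, conf
        if best_conf >= threshold:  # early exit saves the remaining samples
            break
    return best_answer, best_conf
```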


## Model Details

### Model Description

This model uses a self-calibration framework to generate calibrated confidence scores, which are then used to improve the efficiency of test-time scaling methods, enabling comparable performance with substantially less computation.

- **Developed by:** HINT-lab
- **Model type:** Large Language Model
- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model [optional]:** [Specify base model here, e.g., `meta-llama/Llama-3.1-8B-Instruct`]

### Model Sources

- **Repository:** https://github.com/HINT-lab/Efficient-Test-Time-Scaling
- **Paper:** [Efficient Test-Time Scaling via Self-Calibration](https://arxiv.org/abs/2503.00031)


## Uses

### Direct Use

The model can be used directly for text generation tasks, leveraging its self-calibration capabilities for improved efficiency.
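
A minimal generation example with the `transformers` library is sketched below. The repository ID is a placeholder; substitute the actual model ID from this page.

```python
# Minimal text-generation sketch with Hugging Face `transformers`.
# "HINT-lab/<model-id>" is a placeholder, not a real repository name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HINT-lab/<model-id>"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "What is 17 * 24?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```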

### Downstream Use

The calibrated confidence scores generated by the model can be incorporated into various test-time scaling methods (e.g., Self-Consistency, Best-of-N) to enhance their performance and reduce computational costs.
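
For example, a confidence-weighted variant of Self-Consistency can aggregate sampled answers by summing their calibrated confidences instead of counting raw votes. The sketch below assumes a list of `(answer, confidence)` pairs already produced by the model; it is illustrative only, not the repository's implementation.

```python
from collections import defaultdict

def confidence_weighted_vote(samples):
    """Aggregate (answer, confidence) pairs: each answer's score is the sum of the
    calibrated confidences of the samples that produced it (weighted Self-Consistency).
    For Best-of-N, take max(samples, key=lambda s: s[1]) instead."""
    scores = defaultdict(float)
    for answer, confidence in samples:
        scores[answer] += confidence
    return max(scores, key=scores.get)

# Example: three sampled answers with calibrated confidences.
print(confidence_weighted_vote([("408", 0.92), ("406", 0.35), ("408", 0.80)]))  # -> "408"
```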

### Out-of-Scope Use

The model is not intended for tasks that demand high accuracy but do not benefit from confidence calibration; in such settings its scaling approach offers little advantage.


## Bias, Risks, and Limitations

The model's performance and calibration accuracy may vary depending on the specific dataset and task. Like other LLMs, it may exhibit biases present in its training data.

### Recommendations

Users should be aware of the potential biases and limitations of the model and carefully evaluate its performance on their specific tasks. Further investigation into bias mitigation techniques is recommended.


## How to Get Started with the Model

See the [GitHub README](https://github.com/HINT-lab/Efficient-Test-Time-Scaling) for detailed instructions.


## Training Details

### Training Data

[Link to Hugging Face Dataset, if available. Otherwise, provide a brief description]

### Training Procedure

See the training section in the [GitHub README](https://github.com/HINT-lab/Efficient-Test-Time-Scaling).

#### Training Hyperparameters

[Information from GitHub README regarding hyperparameters]

#### Speeds, Sizes, Times

[Information from GitHub README about training times, model sizes, etc.]


## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

[Link to Hugging Face Dataset, if available. Otherwise, provide a brief description]

#### Factors

[List factors from GitHub README, e.g., different datasets]

#### Metrics

[List metrics used from GitHub README, e.g., accuracy]

### Results

[Summary of results from GitHub README]

#### Summary

[Concise summary of overall evaluation performance]


## Citation

**BibTeX:**

```bibtex
@misc{huang2025efficienttesttimescalingselfcalibration,
      title={Efficient Test-Time Scaling via Self-Calibration}, 
      author={Chengsong Huang and Langlin Huang and Jixuan Leng and Jiacheng Liu and Jiaxin Huang},
      year={2025},
      eprint={2503.00031},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2503.00031}, 
}
```

**APA:**

Huang, C., Huang, L., Leng, J., Liu, J., & Huang, J. (2025). *Efficient test-time scaling via self-calibration*. arXiv. https://arxiv.org/abs/2503.00031