ChengsongHuang committed on
Commit
ad3e620
·
verified ·
1 Parent(s): 66027c6

Update README.md

Files changed (1)
  1. README.md +176 -20
README.md CHANGED
@@ -1,46 +1,202 @@
1
  ---
2
  library_name: transformers
3
- license: mit
4
- language:
5
- - en
6
- base_model:
7
- - meta-llama/Llama-3.1-8B-Instruct
8
  pipeline_tag: text-generation
9
- datasets:
10
- - HINT-lab/Llama_3.1-8B-Instruct-Self-Calibration
 
 
 
11
  ---
12
 
13
- # Model Card for Model ID
14
 
15
- Model trained based on `meta-llama/Llama-3.1-8B-Instruct` by Self-Calibration proposed by [Efficient Test-Time Scaling via Self-Calibration](https://arxiv.org/abs/2503.00031).
16
 
17
 
18
- ## Model Sources
19
 
20
- <!-- Provide the basic links for the model. -->
21
 
22
- - **Repository:** https://github.com/Chengsong-Huang/Self-Calibration
23
- - **Paper :** Efficient Test-Time Scaling via Self-Calibration
24
 
 
25
 
26
 
27
- ## Citation
28
 
29
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
30
 
31
  **BibTeX:**
32
- ```
 
33
  @misc{huang2025efficienttesttimescalingselfcalibration,
34
- title={Efficient Test-Time Scaling via Self-Calibration},
35
  author={Chengsong Huang and Langlin Huang and Jixuan Leng and Jiacheng Liu and Jiaxin Huang},
36
  year={2025},
37
  eprint={2503.00031},
38
  archivePrefix={arXiv},
39
  primaryClass={cs.LG},
40
- url={https://arxiv.org/abs/2503.00031},
41
  }
42
  ```
43
 
44
- ## Model Card Contact
45
 
46
 
1
  ---
2
  library_name: transformers
 
 
 
 
 
3
  pipeline_tag: text-generation
4
+ license: apache-2.0
5
+ tags:
6
+ - self-calibration
7
+ - efficient-sampling
8
+ - llm-uncertainty
9
  ---
10
 
11
+ # Model Card for Efficient Test-Time Scaling via Self-Calibration
12
 
 
13
 
14
 
 
15
 
16
+ This model implements an efficient test-time scaling method using model confidence for dynamic sampling adjustment. It addresses the challenge of overconfidence in LLMs by introducing a self-calibration framework that generates calibrated confidence scores, improving computational efficiency without sacrificing accuracy. This is based on the research paper [Efficient Test-Time Scaling via Self-Calibration](https://arxiv.org/abs/2503.00031).
17
 
18
+ ## Model Details
 
19
 
20
+ ### Model Description
21
 
22
+ This model uses model confidence to dynamically adjust sampling during inference, leading to significant improvements in computational efficiency. The self-calibration framework ensures calibrated confidence scores, making the method robust and reliable. The model is designed to work with various sampling methods, including early exit, ascending confidence, self-consistency, and best-of-N.
23
+
24
+
25
+
26
+ - **Developed by:** HINT-lab
27
+ - **Model type:** Large Language Model (LLM)
28
+ - **Language(s) (NLP):** English (supports other languages depending on the base model used during training)
29
+ - **License:** Apache 2.0
30
+ - **Finetuned from model [optional]:** (Specify base model used, e.g., `meta-llama/Llama-3.1-8B-Instruct`)
31
+
32
+
33
+
34
+ ### Model Sources [optional]
35
+
36
+ - **Repository:** [https://github.com/HINT-lab/Self-Calibration](https://github.com/HINT-lab/Self-Calibration)
37
+ - **Paper [optional]:** [https://arxiv.org/abs/2503.00031](https://arxiv.org/abs/2503.00031)
38
+ - **Models:**
39
+ - [DeepSeek-R1-Distill-Qwen-1.5B-Self-Calibration](https://huggingface.co/HINT-lab/DeepSeek-R1-Distill-Qwen-1.5B-Self-Calibration)
40
+ - [Qwen2.5-7B-Instruct-Self-Calibration](https://huggingface.co/HINT-lab/Qwen2.5-7B-Instruct-Self-Calibration)
41
+ - [Llama-3.1-8B-Instruct-Self-Calibration](https://huggingface.co/HINT-lab/Llama-3.1-8B-Instruct-Self-Calibration)
42
+ - **Datasets:**
43
+ - [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/datasets/HINT-lab/DeepSeek-R1-Distill-Qwen-1.5B-Self-Calibration)
44
+ - [Qwen2.5-7B-Instruct](https://huggingface.co/datasets/HINT-lab/Qwen2.5-7B-Instruct-Self-Calibration)
45
+ - [Llama_3.1-8B-Instruct](https://huggingface.co/datasets/HINT-lab/Llama_3.1-8B-Instruct-Self-Calibration)
46
+
47
+ ## Uses
48
+
49
+
50
+
51
+ ### Direct Use
52
+
53
+ The model can be used directly for text generation tasks with various sampling methods. The user can specify the desired sampling method, confidence threshold (if applicable), number of samples, and temperature.
54
+
55
+
56
+
57
+ ### Downstream Use [optional]
58
+
59
+ The model can be fine-tuned for specific downstream tasks or integrated into larger applications requiring efficient text generation.
60
+
61
+
62
+
63
+ ### Out-of-Scope Use
64
+
65
+ The model may not perform well on tasks requiring high creativity or those outside the domains represented in the training data. The accuracy of the confidence scores depends heavily on the quality and calibration of the underlying base LLM.
66
+
67
+
68
+
69
+ ## Bias, Risks, and Limitations
70
+
71
+ The model inherits biases from its base LLM. The accuracy of the confidence scores and the effectiveness of the sampling methods may vary depending on the task and the base model. Over-reliance on the model's confidence scores without considering other factors could lead to incorrect inferences.
72
+
73
+
74
+
75
+ ### Recommendations
76
+
77
+ Users should be aware of potential biases and limitations. It's recommended to evaluate the model's performance on specific tasks before deploying it to critical applications. Users should also critically evaluate the confidence scores provided by the model.
78
+
79
+
80
+
81
+ ## How to Get Started with the Model
82
+
83
+ See the "Quickstart" section in the GitHub README for instructions on installing the necessary packages and running inference.
84
+
85
+
86
+
87
+ ## Training Details
88
+
89
+ ### Training Data
90
+
91
+ The training data consists of datasets created by generating multiple responses to prompts from various benchmark datasets (see the GitHub README for details).
92
+
93
+
94
+
95
+ ### Training Procedure
96
+
97
+ The training procedure involves a self-calibration process to improve the model's ability to generate calibrated confidence scores. Details are in the GitHub README.
98
+
99
+
100
+
101
+
102
+
103
+
104
+ #### Training Hyperparameters
105
+
106
+ (To be added from GitHub README - `model_training/configs/{version}.json`)
107
+
108
+ #### Speeds, Sizes, Times [optional]
109
+
110
+ (To be added from GitHub README - training times on various hardware)
111
+
112
+
113
+
114
+ ## Evaluation
115
+
116
+ (To be added from GitHub README - evaluation protocols and results)
117
+
118
+ ### Testing Data, Factors & Metrics
119
+
120
+ (To be added from GitHub README)
121
+
122
+
123
+
124
+
125
+
126
+
127
+
128
+
129
+
130
+
131
+
132
+
133
+
134
+
135
+
136
+
137
+
138
+ ### Results
139
+
140
+ (To be added from GitHub README)
141
+
142
+ #### Summary
143
+
144
+ (To be added from GitHub README)
145
+
146
+
147
+
148
+
149
+
150
+
151
+
152
+ ## Environmental Impact
153
+
154
+ (To be added based on hardware usage reported in the GitHub README)
155
+
156
+
157
+
158
+
159
+
160
+
161
+
162
+
163
+
164
+ ## Technical Specifications [optional]
165
+
166
+ (To be added based on model architecture and training details in the GitHub README)
167
+
168
+
169
+
170
+
171
+
172
+
173
+
174
+
175
+
176
+
177
+
178
+
179
+
180
+
181
+
182
+ ## Citation [optional]
183
 
 
184
 
 
185
 
186
  **BibTeX:**
187
+
188
+ ```bibtex
189
  @misc{huang2025efficienttesttimescalingselfcalibration,
190
+ title={Efficient Test-Time Scaling via Self-Calibration},
191
  author={Chengsong Huang and Langlin Huang and Jixuan Leng and Jiacheng Liu and Jiaxin Huang},
192
  year={2025},
193
  eprint={2503.00031},
194
  archivePrefix={arXiv},
195
  primaryClass={cs.LG},
196
+ url={https://arxiv.org/abs/2503.00031},
197
  }
198
  ```
199
 
200
+ **APA:**
201
 
202
+ (To be added based on the citation information in the GitHub README)
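
Note on the added "Model Description" and "Direct Use" text: the new README says the model's confidence is used to adjust sampling dynamically (early exit, self-consistency, best-of-N) and that the user chooses a confidence threshold and number of samples, but it shows no example. The sketch below illustrates that general idea only. It is not the repository's API: `weighted_vote`, `early_stop_sampling`, and the `sample_fn` callback are hypothetical names, and the assumption that each sample arrives as an `(answer, confidence)` pair with confidence in [0, 1] is ours, not something the README states.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Assumed shape of one sample: (answer text, confidence in [0, 1]).
Sample = Tuple[str, float]


def weighted_vote(samples: List[Sample]) -> str:
    """Confidence-weighted self-consistency: sum confidence per answer, pick the max."""
    scores: Dict[str, float] = defaultdict(float)
    for answer, confidence in samples:
        scores[answer] += confidence
    return max(scores, key=scores.get)


def early_stop_sampling(
    sample_fn: Callable[[], Sample],
    threshold: float = 0.9,
    max_samples: int = 8,
) -> Tuple[str, int]:
    """Draw samples until one meets the confidence threshold.

    Returns the chosen answer and the number of samples actually drawn.
    If no sample reaches the threshold within max_samples draws, fall back
    to a confidence-weighted vote over everything collected.
    """
    samples: List[Sample] = []
    for _ in range(max_samples):
        answer, confidence = sample_fn()  # hypothetical: one model call
        samples.append((answer, confidence))
        if confidence >= threshold:
            return answer, len(samples)  # early exit saves remaining calls
    return weighted_vote(samples), len(samples)
```

In practice `sample_fn` would wrap a generation call plus whatever confidence readout the fine-tuned model provides; the GitHub README's Quickstart is the authoritative reference for both.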