|
--- |
|
base_model: |
|
- distilbert/distilbert-base-uncased |
|
datasets: |
|
- openai/gsm8k |
|
- ChilleD/SVAMP |
|
- deepmind/aqua_rat |
|
- ucinlp/drop |
|
- allenai/openbookqa |
|
- ChilleD/StrategyQA |
|
- lucasmccabe/logiqa |
|
- metaeval/reclor |
|
- hotpotqa/hotpot_qa |
|
- dgslibisey/MuSiQue |
|
- allenai/qasc |
|
- nguyen-brat/worldtree |
|
- qiaojin/PubMedQA |
|
language: |
|
- en |
|
library_name: transformers |
|
license: mit |
|
tags: |
|
- text-classification |
|
- sketch-of-thought |
|
- efficient-inference |
|
--- |
|
|
|
# SoT_DistilBERT: Paradigm Selection Model for Sketch-of-Thought |
|
|
|
[License: MIT](LICENSE)

[Python](https://www.python.org/downloads/)

[PyTorch](https://pytorch.org/)

[Code: SimonAytes/SoT](https://github.com/SimonAytes/SoT)
|
|
|
## What is Sketch-of-Thought? |
|
|
|
Sketch-of-Thought (SoT) is a prompting framework for efficient reasoning in language models. It combines cognitive-inspired reasoning paradigms with linguistic constraints to minimize output token usage while preserving reasoning accuracy.
|
|
|
Unlike conventional Chain of Thought (CoT) approaches that produce verbose reasoning chains, SoT implements three distinct reasoning paradigms: |
|
|
|
- **Conceptual Chaining**: Connects essential ideas in logical sequences through structured step links. Effective for commonsense reasoning, multi-hop inference, and fact-based recall tasks. |
|
|
|
- **Chunked Symbolism**: Organizes numerical and symbolic reasoning into structured steps with equations, variables, and arithmetic operations. Excels in mathematical problems and technical calculations. |
|
|
|
- **Expert Lexicons**: Leverages domain-specific shorthand, technical symbols, and jargon for precise and efficient communication. Suited for technical disciplines requiring maximum information density. |
|
|
|
|
|
## Loading the Model |
|
|
|
This repository contains the DistilBERT paradigm selection model for the Sketch-of-Thought (SoT) framework. You can load and use it directly with Hugging Face Transformers: |
|
|
|
```python |
|
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification |
|
import torch
|
|
|
# Load the model directly from Hugging Face |
|
model = DistilBertForSequenceClassification.from_pretrained("saytes/SoT_DistilBERT") |
|
tokenizer = DistilBertTokenizer.from_pretrained("saytes/SoT_DistilBERT") |
|
|
|
# Define label mapping |
|
label_mapping = { |
|
"chunked_symbolism": 0, |
|
"conceptual_chaining": 1, |
|
"expert_lexicons": 2 |
|
} |
|
|
|
# Function to classify questions |
|
def classify_question(question):
    # Tokenize and run a single forward pass (no gradients needed at inference)
    inputs = tokenizer(question, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    predicted_class = torch.argmax(outputs.logits, dim=1).item()

    # Reverse mapping to get the paradigm name
    label_mapping_reverse = {v: k for k, v in label_mapping.items()}
    return label_mapping_reverse[predicted_class]
|
|
|
# Example usage |
|
question = "Alice has 5 apples. She gives 3 apples to Bob. How many apples does Alice have?" |
|
paradigm = classify_question(question) |
|
print(f"Recommended paradigm: {paradigm}") # Output: "chunked_symbolism" |
|
``` |
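
If you also need the classifier's confidence over the three paradigms, you can apply a softmax to the logits. Below is a minimal sketch that reuses the `model`, `tokenizer`, and `label_mapping` defined above; the helper name `paradigm_probabilities` is illustrative and not part of this repository.

```python
import torch.nn.functional as F

def paradigm_probabilities(question):
    """Return a {paradigm_name: probability} dict for a question."""
    inputs = tokenizer(question, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = F.softmax(logits, dim=1).squeeze(0)
    return {name: probs[idx].item() for name, idx in label_mapping.items()}

print(paradigm_probabilities("Alice has 5 apples. She gives 3 apples to Bob. How many apples does Alice have?"))
```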
|
|
|
For easier integration, we also provide a complete Python package implementation. See the [GitHub repository](https://github.com/SimonAytes/SoT) or the "Complete Package" section below for details. |
|
|
|
## Model Description |
|
|
|
The SoT_DistilBERT model is a fine-tuned DistilBERT classifier trained to select the optimal reasoning paradigm for a given query based on the Sketch-of-Thought framework. |
|
|
|
### Training Data |
|
The model was trained on approximately 14,200 samples across various reasoning tasks, with each sample labeled using one of the three SoT paradigms. Labels were assigned using GPT-4o with a classification-specific prompt based on predefined heuristics. |
|
|
|
### Model Architecture |
|
- **Base model**: DistilBERT |
|
- **Training**: 5 epochs, batch size 64, learning rate 2e-5 |
|
- **Loss**: Cross-entropy |
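
For reference, these hyperparameters map directly onto a standard Hugging Face `Trainer` setup. The sketch below is an assumption about how such a fine-tune could be reproduced, not the original training script; the tiny inline dataset stands in for the ~14,200 labeled samples described above.

```python
from datasets import Dataset
from transformers import (
    DistilBertForSequenceClassification,
    DistilBertTokenizer,
    Trainer,
    TrainingArguments,
)

# Placeholder data; the real training set pairs questions with paradigm labels
# (0 = chunked_symbolism, 1 = conceptual_chaining, 2 = expert_lexicons).
train_data = Dataset.from_dict({
    "text": ["Alice has 5 apples. She gives 3 apples to Bob. How many apples does Alice have?"],
    "label": [0],
})

tokenizer = DistilBertTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert/distilbert-base-uncased", num_labels=3
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_data = train_data.map(tokenize, batched=True)

# Hyperparameters listed in this model card; cross-entropy is the default loss
# for DistilBertForSequenceClassification.
args = TrainingArguments(
    output_dir="sot_distilbert_finetune",
    num_train_epochs=5,
    per_device_train_batch_size=64,
    learning_rate=2e-5,
)

Trainer(model=model, args=args, train_dataset=train_data).train()
```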
|
|
|
## Complete Package |
|
|
|
For a more streamlined experience, we've developed the SoT Python package that handles paradigm selection, prompt management, and exemplar formatting: |
|
|
|
```python |
|
from sketch_of_thought import SoT |
|
|
|
# Initialize SoT |
|
sot = SoT() |
|
|
|
# Classify a question and get appropriate paradigm |
|
question = "Alice has 5 apples. She gives 3 apples to Bob. How many apples does Alice have?" |
|
paradigm = sot.classify_question(question) # Returns: 'chunked_symbolism' |
|
|
|
# Get initialized context with exemplars for the selected paradigm |
|
context = sot.get_initialized_context( |
|
paradigm=paradigm, |
|
question=question, |
|
format="llm", |
|
include_system_prompt=True |
|
) |
|
|
|
# Use with your LLM of choice |
|
``` |
|
|
|
## Example with Qwen2.5-7B |
|
|
|
Here's a complete example using Qwen2.5-7B-Instruct: |
|
|
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
from sketch_of_thought import SoT |
|
|
|
# Initialize SoT |
|
sot = SoT() |
|
|
|
# Load Qwen model |
|
model_name = "Qwen/Qwen2.5-7B-Instruct" |
|
model = AutoModelForCausalLM.from_pretrained( |
|
model_name, |
|
torch_dtype="auto", |
|
device_map="auto" |
|
) |
|
tokenizer = AutoTokenizer.from_pretrained(model_name) |
|
|
|
# Prepare the question |
|
prompt = "Alice has 5 apples. She gives 3 apples to Bob. How many apples does Alice have?" |
|
|
|
# Classify and get appropriate context |
|
paradigm = sot.classify_question(prompt) |
|
messages = sot.get_initialized_context( |
|
paradigm, |
|
prompt, |
|
format="llm", |
|
include_system_prompt=True |
|
) |
|
|
|
# Format for the model |
|
text = tokenizer.apply_chat_template( |
|
messages, |
|
tokenize=False, |
|
add_generation_prompt=True |
|
) |
|
model_inputs = tokenizer([text], return_tensors="pt").to(model.device) |
|
|
|
# Generate response |
|
generated_ids = model.generate( |
|
**model_inputs, |
|
max_new_tokens=512 |
|
) |
|
generated_ids = [ |
|
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids) |
|
] |
|
|
|
# Decode response |
|
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0] |
|
print(response) |
|
``` |
|
|
|
**Output:** |
|
|
|
``` |
|
<think> |
|
A = 5 |
|
A -= 3 |
|
A = 2 |
|
</think> |
|
|
|
\boxed{2} |
|
``` |
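
The final answer is wrapped in a `\boxed{...}` marker. If you need to extract it programmatically, a small regex sketch (not part of the SoT package) is enough:

```python
import re

def extract_boxed_answer(text):
    """Return the content of the last \\boxed{...} marker, or None if absent."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", text)
    return matches[-1] if matches else None

print(extract_boxed_answer(response))  # "2"
```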
|
|
|
## Supported Formats |
|
|
|
The SoT package supports multiple output formats: |
|
|
|
- `"llm"`: Standard chat format for text-only LLMs |
|
- `"vlm"`: Multimodal format for vision-language models |
|
- `"raw"`: Raw exemplars without formatting |
|
|
|
|
|
|
|
<details> |
|
<summary>What's the difference?</summary> |
|
|
|
### LLM Format |
|
|
|
Standard `messages` format for Large Language Models. |
|
|
|
```python |
|
[ |
|
{ |
|
"role": "system", |
|
"content": "SYSTEM_PROMPT_HERE" |
|
}, |
|
{ |
|
"role": "user", |
|
"content": "EXAMPLE_QUESTION_HERE" |
|
}, |
|
{ |
|
"role": "assistant", |
|
"content": "EXAMPLE_ANSWER_HERE" |
|
}, |
|
{ |
|
"role": "user", |
|
"content": "USER_QUESTION_HERE" |
|
} |
|
] |
|
``` |
|
|
|
### VLM Format |
|
|
|
Standard `messages` format for Large Vision-Language Models. |
|
|
|
```python |
|
[ |
|
{ |
|
"role": "system", |
|
"content": "SYSTEM_PROMPT_HERE" |
|
}, |
|
{ |
|
"role": "user", |
|
"content": [{"type": "text", "text": "EXAMPLE_QUESTION_HERE"}] |
|
}, |
|
{ |
|
"role": "assistant", |
|
"content": [{"type": "text", "text": "EXAMPLE_ANSWER_HERE"}] |
|
}, |
|
{ |
|
"role": "user", |
|
"content": [{"type": "text", "text": "USER_QUESTION_HERE"}] |
|
} |
|
] |
|
``` |
|
|
|
### Raw Format |
|
|
|
Raw exemplar data. Apply your own format! |
|
|
|
```python |
|
[ |
|
{ |
|
"question": "EXAMPLE_QUESTION_HERE", |
|
"answer": "EXAMPLE_ANSWER_HERE" |
|
}, |
|
{ |
|
"question": "EXAMPLE_QUESTION_HERE", |
|
"answer": "EXAMPLE_ANSWER_HERE" |
|
} |
|
] |
|
``` |
|
</details> |
|
|
|
## Multilingual Support |
|
|
|
SoT supports multiple languages. System prompts and exemplars are automatically loaded in the requested language. |
|
|
|
## Paradigm Selection Model |
|
|
|
SoT includes a pretrained DistilBERT model for automatic paradigm selection based on the question. The model is available on Hugging Face: [saytes/SoT_DistilBERT](https://huggingface.co/saytes/SoT_DistilBERT) |
|
|
|
## Datasets |
|
|
|
The SoT_DistilBERT model was evaluated on the following datasets: |
|
|
|
| Dataset | HF ID | Subset | Split | Evaluation Type | |
|
|---------|-------|--------|-------|----------------| |
|
| GSM8K | [gsm8k](https://huggingface.co/datasets/gsm8k) | main | test | numerical | |
|
| SVAMP | [ChilleD/SVAMP](https://huggingface.co/datasets/ChilleD/SVAMP) | - | test | numerical | |
|
| AQUA-RAT | [aqua_rat](https://huggingface.co/datasets/aqua_rat) | - | test | multiple_choice | |
|
| DROP | [drop](https://huggingface.co/datasets/drop) | - | validation | open | |
|
| OpenBookQA | [openbookqa](https://huggingface.co/datasets/openbookqa) | - | test | multiple_choice |
|
| StrategyQA | [ChilleD/StrategyQA](https://huggingface.co/datasets/ChilleD/StrategyQA) | - | test | yesno | |
|
| LogiQA | [lucasmccabe/logiqa](https://huggingface.co/datasets/lucasmccabe/logiqa) | default | test | multiple_choice | |
|
| ReClor | [metaeval/reclor](https://huggingface.co/datasets/metaeval/reclor) | - | validation | multiple_choice |
|
| HotpotQA | [hotpot_qa](https://huggingface.co/datasets/hotpot_qa) | distractor | validation | open |
|
| MuSiQue-Ans | [dgslibisey/MuSiQue](https://huggingface.co/datasets/dgslibisey/MuSiQue) | - | validation | open | |
|
| QASC | [allenai/qasc](https://huggingface.co/datasets/allenai/qasc) | - | validation | multiple_choice | |
|
| Worldtree | [nguyen-brat/worldtree](https://huggingface.co/datasets/nguyen-brat/worldtree) | - | train | multiple_choice | |
|
| PubMedQA | [qiaojin/PubMedQA](https://huggingface.co/datasets/qiaojin/PubMedQA) | pqa_labeled | train | yesno | |
|
| MedQA | [bigbio/med_qa](https://huggingface.co/datasets/bigbio/med_qa) | med_qa_en_source | validation | multiple_choice | |
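
Each of these can be loaded with the Hugging Face `datasets` library using the subset and split listed above. For example, the GSM8K test split used for numerical evaluation:

```python
from datasets import load_dataset

# Subset ("main") and split ("test") as listed in the table above.
gsm8k_test = load_dataset("openai/gsm8k", "main", split="test")
print(gsm8k_test[0]["question"])
```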
|
|
|
## Limitations |
|
|
|
- The model is trained to classify questions into one of three predefined paradigms and may not generalize to tasks outside the training distribution. |
|
- Performance may vary depending on the complexity and domain of the question. |
|
|
|
## Citation |
|
|
|
If you find our work helpful, please cite: |
|
|
|
``` |
|
@misc{aytes2025sot, |
|
title={Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching}, |
|
author={Simon A. Aytes and Jinheon Baek and Sung Ju Hwang}, |
|
year={2025}, |
|
eprint={2503.05179}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL}, |
|
url={https://hf.co/papers/2503.05179}, |
|
} |
|
``` |
|
|
|
## License |
|
|
|
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. |