 - en
 base_model:
 - distilbert/distilbert-base-uncased
+---
+# SoT_DistilBERT: Paradigm Selection Model for Sketch-of-Thought
+[![License](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
+[![Python](https://img.shields.io/badge/Python-3.10+-blue.svg)](https://www.python.org/downloads/)
+[![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-orange.svg)](https://pytorch.org/)
+[![GitHub](https://img.shields.io/badge/GitHub-Repository-green)](https://github.com/yourusername/sketch-of-thought)
+## Loading the Model
+This repository contains the DistilBERT paradigm selection model for the Sketch-of-Thought (SoT) framework. You can load and use it directly with Hugging Face Transformers:
+```python
+from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
+import torch
+# Load the model and tokenizer directly from Hugging Face
+model = DistilBertForSequenceClassification.from_pretrained("saytes/SoT_DistilBERT")
+tokenizer = DistilBertTokenizer.from_pretrained("saytes/SoT_DistilBERT")
+model.eval()  # inference mode: disables dropout
+# Map paradigm names to class indices, plus the reverse lookup
+label_mapping = {
+    "chunked_symbolism": 0,
+    "conceptual_chaining": 1,
+    "expert_lexicons": 2
+}
+label_mapping_reverse = {v: k for k, v in label_mapping.items()}
+# Function to classify questions
+def classify_question(question):
+    """Return the name of the paradigm predicted for a single question."""
+    inputs = tokenizer(question, return_tensors="pt", truncation=True, padding=True)
+    with torch.no_grad():  # gradients are not needed for inference
+        outputs = model(**inputs)
+    predicted_class = torch.argmax(outputs.logits, dim=1).item()
+    return label_mapping_reverse[predicted_class]
+# Example usage
+question = "Alice has 5 apples. She gives 3 apples to Bob. How many apples does Alice have?"
+paradigm = classify_question(question)
+print(f"Recommended paradigm: {paradigm}")  # Output: "chunked_symbolism"
+```
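The `classify_question` function above returns only the top class. Where a confidence estimate is useful, the logits can be converted to probabilities with a softmax. The sketch below uses plain Python and made-up logit values so that it runs without downloading the model. `paradigm_probabilities` is a hypothetical helper, not part of the SoT package, and the class order is assumed to match `label_mapping` above.

```python
import math

# Class index -> paradigm name, assumed to match label_mapping in the model card.
ID_TO_PARADIGM = {0: "chunked_symbolism", 1: "conceptual_chaining", 2: "expert_lexicons"}

def paradigm_probabilities(logits):
    """Convert raw classifier logits into a {paradigm: probability} dict."""
    # Subtract the max logit before exponentiating for numerical stability.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return {ID_TO_PARADIGM[i]: e / total for i, e in enumerate(exps)}

# Illustrative logits only; real values would come from outputs.logits[0].tolist().
probs = paradigm_probabilities([2.0, 0.5, -1.0])
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

A caller could fall back to a default paradigm when the top probability is low, although the model card does not describe any such threshold.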
+For easier integration, we also provide a complete Python package implementation. See the [GitHub repository](https://github.com/yourusername/sketch-of-thought) or the "Complete Package" section below for details.
+## Model Description
+The SoT_DistilBERT model is a fine-tuned DistilBERT classifier trained to select the optimal reasoning paradigm for a given query based on the Sketch-of-Thought framework.
+### Training Data
+The model was trained on approximately 14,200 samples across various reasoning tasks, with each sample labeled using one of the three SoT paradigms. Labels were assigned using GPT-4o with a classification-specific prompt based on predefined heuristics.
+### Model Architecture
+- **Base model**: DistilBERT (`distilbert/distilbert-base-uncased`)
+- **Training**: 5 epochs, batch size 64, learning rate 2e-5
+- **Loss**: Cross-entropy
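For a rough sense of scale, the hyperparameters above imply on the order of a thousand optimizer steps. The arithmetic below assumes that all of the roughly 14,200 samples are used for training, with no held-out split and no gradient accumulation. The model card states neither, so treat the result as an estimate.

```python
import math

num_samples = 14_200   # approximate dataset size from the model card
batch_size = 64
epochs = 5

# Assumption: every sample is used for training and the last partial batch is kept.
steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)
```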
+## What is Sketch-of-Thought?
+Sketch-of-Thought (SoT) is a prompting framework for efficient reasoning in language models. It combines cognitively inspired reasoning paradigms with linguistic constraints to minimize output token usage while preserving reasoning accuracy.
+Unlike conventional Chain of Thought (CoT) approaches that produce verbose reasoning chains, SoT implements three distinct reasoning paradigms:
+- **Conceptual Chaining**: Connects essential ideas in logical sequences through structured step links. Effective for commonsense reasoning, multi-hop inference, and fact-based recall tasks.
+- **Chunked Symbolism**: Organizes numerical and symbolic reasoning into structured steps with equations, variables, and arithmetic operations. Excels in mathematical problems and technical calculations.
+- **Expert Lexicons**: Leverages domain-specific shorthand, technical symbols, and jargon for precise and efficient communication. Suited for technical disciplines requiring maximum information density.
+## Complete Package
+For a more streamlined experience, we've developed the SoT Python package that handles paradigm selection, prompt management, and exemplar formatting:
+```python
+from sketch_of_thought import SoT
+# Initialize SoT
+sot = SoT()
+# Classify a question and get appropriate paradigm
+question = "Alice has 5 apples. She gives 3 apples to Bob. How many apples does Alice have?"
+paradigm = sot.classify_question(question)  # Returns: 'chunked_symbolism'
+# Get initialized context with exemplars for the selected paradigm
+context = sot.get_initialized_context(
+    paradigm=paradigm,
+    question=question,
+    format="llm",
+    include_system_prompt=True
+)
+# Use with your LLM of choice
+```
+## Example with Qwen2.5-7B
+Here's a complete example using Qwen2.5-7B-Instruct:
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from sketch_of_thought import SoT
+# Initialize SoT
+sot = SoT()
+# Load Qwen model
+model_name = "Qwen/Qwen2.5-7B-Instruct"
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype="auto",
+    device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+# Prepare the question
+prompt = "Alice has 5 apples. She gives 3 apples to Bob. How many apples does Alice have?"
+# Classify and get appropriate context
+paradigm = sot.classify_question(prompt)
+messages = sot.get_initialized_context(
+    paradigm,
+    prompt,
+    format="llm",
+    include_system_prompt=True
+)
+# Format for the model
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+# Generate response
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=512
+)
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+# Decode response
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+print(response)
+```
+**Output:**
+```
+<think>
+A = 5
+A -= 3
+A = 2
+</think>
+\boxed{2}
+```
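Responses in this style separate the reasoning sketch from the final answer, which makes them easy to post-process. The following is a minimal sketch, assuming the model emits one `<think>...</think>` block followed by a `\boxed{...}` answer, as in the output above. `parse_sot_response` is a hypothetical helper and not part of the SoT package. Its pattern does not handle nested braces inside `\boxed{}`.

```python
import re

def parse_sot_response(response):
    """Split a response into (reasoning sketch, final answer); either may be None."""
    think = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    boxed = re.search(r"\\boxed\{([^{}]*)\}", response)
    sketch = think.group(1).strip() if think else None
    answer = boxed.group(1).strip() if boxed else None
    return sketch, answer

# The example string reproduces the output shown above.
example = "<think>\nA = 5\nA -= 3\nA = 2\n</think>\n\\boxed{2}"
sketch, answer = parse_sot_response(example)
print(answer)
```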
+## Supported Formats
+The SoT package supports multiple output formats:
+- `"llm"`: Standard chat format for text-only LLMs
+- `"vlm"`: Multimodal format for vision-language models
+- `"raw"`: Raw exemplars without formatting
+<details>
+  <summary>What's the difference?</summary>
+  ### LLM Format
+  Standard `messages` format for Large Language Models.
+  ```python
+  [
+    {
+      "role": "system",
+      "content": "SYSTEM_PROMPT_HERE"
+    },
+    {
+      "role": "user",
+      "content": "EXAMPLE_QUESTION_HERE"
+    },
+    {
+      "role": "assistant",
+      "content": "EXAMPLE_ANSWER_HERE"
+    },
+    {
+      "role": "user",
+      "content": "USER_QUESTION_HERE"
+    }
+  ]
+  ```
+  ### VLM Format
+  Standard `messages` format for Large Vision-Language Models.
+  ```python
+  [
+    {
+      "role": "system",
+      "content": "SYSTEM_PROMPT_HERE"
+    },
+    {
+      "role": "user",
+      "content": [{"type": "text", "text": "EXAMPLE_QUESTION_HERE"}]
+    },
+    {
+      "role": "assistant",
+      "content": [{"type": "text", "text": "EXAMPLE_ANSWER_HERE"}]
+    },
+    {
+      "role": "user",
+      "content": [{"type": "text", "text": "USER_QUESTION_HERE"}]
+    }
+  ]
+  ```
+  ### Raw Format
+  Raw exemplar data. Apply your own format!
+  ```python
+  [
+    {
+      "question": "EXAMPLE_QUESTION_HERE",
+      "answer": "EXAMPLE_ANSWER_HERE"
+    },
+    {
+      "question": "EXAMPLE_QUESTION_HERE",
+      "answer": "EXAMPLE_ANSWER_HERE"
+    }
+  ]
+  ```
+</details>
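The three formats carry the same exemplars, so a chat-style list can be assembled from the raw one. The sketch below assumes the raw entries have exactly the `question` and `answer` keys shown above. `raw_to_messages` is a hypothetical helper written for illustration, not a function of the SoT package.

```python
def raw_to_messages(exemplars, system_prompt, user_question, multimodal=False):
    """Build a chat `messages` list from raw exemplars.

    With multimodal=True, user and assistant content is wrapped in the
    [{"type": "text", "text": ...}] form shown for the VLM format.
    """
    def wrap(text):
        return [{"type": "text", "text": text}] if multimodal else text

    messages = [{"role": "system", "content": system_prompt}]
    for ex in exemplars:
        messages.append({"role": "user", "content": wrap(ex["question"])})
        messages.append({"role": "assistant", "content": wrap(ex["answer"])})
    # The actual question always comes last, after the exemplar turns.
    messages.append({"role": "user", "content": wrap(user_question)})
    return messages

raw = [{"question": "EXAMPLE_QUESTION_HERE", "answer": "EXAMPLE_ANSWER_HERE"}]
llm_messages = raw_to_messages(raw, "SYSTEM_PROMPT_HERE", "USER_QUESTION_HERE")
vlm_messages = raw_to_messages(raw, "SYSTEM_PROMPT_HERE", "USER_QUESTION_HERE", multimodal=True)
```

Starting from the raw format is mainly useful when a model expects a prompt layout other than the two chat formats.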
+## Multilingual Support
+SoT supports multiple languages. System prompts and exemplars are automatically loaded in the requested language.
+## Limitations
+- The model is trained to classify questions into one of three predefined paradigms and may not generalize to tasks outside the training distribution.
+- Performance may vary depending on the complexity and domain of the question.
+## Citation
+If you find our work helpful, please cite:
+```
+@article{sot2025,
+  title={TITLE-HERE},
+  author={NAMES-HERE},
+  journal={arXiv preprint},
+  year={2025}
+}
+```
+## License
+This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.