Benchmarking LLM capability of external and internal error correction
Ken Tsui
kenhktsui
AI & ML interests
ML engineer, researcher
VLM, LLM benchmark
Opinions are my own
LongTalk
A Very Long Chain-of-Thought Dataset for Reasoning Model Post-Training
kenhktsui/longtalk-cot-v0.1
Viewer • Updated • 61.2k • 107 • 13
kenhktsui/qwen2.5-7b-instruct-thinking-sft-merged-gguf
8B • Updated • 15 • 1
kenhktsui/qwen2.5-7b-instruct-thinking-sft-merged
Text Generation • 8B • Updated • 5
kenhktsui/llama3.1-8b-instruct-thinking-sft-merged-gguf
8B • Updated • 21 • 1
CoT
VLM Data
FastText Model for Pretraining Data Curation
kenhktsui/llm-data-textbook-quality-fasttext-classifier-v2
Text Classification • Updated • 520 • 28
kenhktsui/fineweb-edu-fasttext-classifier
Text Classification • Updated • 22 • 4
kenhktsui/code-natural-language-fasttext-classifier
Text Classification • Updated • 1.1k • 1
kenhktsui/math-fasttext-classifier
Text Classification • Updated • 26 • 1
textbook-quality-classifier
kenhktsui/fineweb-edu-fasttext-classifier
Text Classification • Updated • 22 • 4
kenhktsui/llm-data-textbook-quality-fasttext-classifier-v2
Text Classification • Updated • 520 • 28
kenhktsui/llm-data-textbook-quality-classifier-v1
Text Classification • 0.3B • Updated • 6 • 9
kenhktsui/llm-data-textbook-quality-fasttext-classifier-v1
Text Classification • Updated • 5 • 4
nano-phi
Small Language Model Trained with Textbook Quality Data - How Far Can It Go?
Self Correction Bench
Benchmarking LLM capability of external and internal error correction