Ken Tsui

kenhktsui

https://kenhktsui.github.io/

AI & ML interests

ML engineer, researcher. VLM, LLM benchmark. Opinions are my own.

Recent Activity

new activity about 2 months ago

kenhktsui/scli5: Improve dataset card: Add task category, tags, language, detailed description, and sample usage

new activity about 2 months ago

kenhktsui/prm800k_sc: Improve dataset card: Add task category, tags, and expanded description

new activity about 2 months ago

kenhktsui/gsm8k_sc: Improve dataset card: Add task category, language, tags, and expand description

kenhktsui's collections (7)

Self Correction Bench

Benchmarking LLM capability of external and internal error correction

kenhktsui/scli5

Viewer • Updated Jul 6 • 286 • 103
kenhktsui/gsm8k_sc

Viewer • Updated Jul 6 • 1.31k • 102
kenhktsui/prm800k_sc

Viewer • Updated Jul 6 • 448 • 86
Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs

Paper • 2507.02778 • Published Jul 3 • 9

LongTalk

A Very Long Chain-of-Thought Dataset for Reasoning Model Post-Training

kenhktsui/longtalk-cot-v0.1

Viewer • Updated Dec 30, 2024 • 61.2k • 107 • 13
kenhktsui/qwen2.5-7b-instruct-thinking-sft-merged-gguf

8B • Updated Dec 30, 2024 • 15 • 1
kenhktsui/qwen2.5-7b-instruct-thinking-sft-merged

Text Generation • 8B • Updated Dec 30, 2024 • 5
kenhktsui/llama3.1-8b-instruct-thinking-sft-merged-gguf

8B • Updated Dec 30, 2024 • 21 • 1

CoT

kenhktsui/longtalk-cot-v0.1

Viewer • Updated Dec 30, 2024 • 61.2k • 107 • 13
open-thoughts/OpenThoughts-114k

Viewer • Updated 2 days ago • 228k • 30.5k • 750
ServiceNow-AI/R1-Distill-SFT

Viewer • Updated Feb 8 • 1.85M • 7.93k • 305
PowerInfer/QWQ-LONGCOT-500K

Viewer • Updated Dec 26, 2024 • 286k • 843 • 126

VLM Data

HuggingFaceM4/the_cauldron

Viewer • Updated May 6, 2024 • 1.88M • 102k • 491
lmms-lab/LLaVA-OneVision-Data

Viewer • Updated May 24 • 3.94M • 16.5k • 216
HuggingFaceM4/Docmatix

Viewer • Updated Aug 26, 2024 • 2.55M • 9.63k • 288
zwq2018/embodied_reasoner

Preview • Updated Apr 21 • 635 • 16

FastText Model for Pretraining Data Curation

kenhktsui/llm-data-textbook-quality-fasttext-classifier-v2

Text Classification • Updated Jun 26 • 520 • 28
kenhktsui/fineweb-edu-fasttext-classifier

Text Classification • Updated Jul 3 • 22 • 4
kenhktsui/code-natural-language-fasttext-classifier

Text Classification • Updated Jul 3 • 1.1k • 1
kenhktsui/math-fasttext-classifier

Text Classification • Updated Jul 3 • 26 • 1

textbook-quality-classifier

kenhktsui/fineweb-edu-fasttext-classifier

Text Classification • Updated Jul 3 • 22 • 4
kenhktsui/llm-data-textbook-quality-fasttext-classifier-v2

Text Classification • Updated Jun 26 • 520 • 28
kenhktsui/llm-data-textbook-quality-classifier-v1

Text Classification • 0.3B • Updated May 25, 2024 • 6 • 9
kenhktsui/llm-data-textbook-quality-fasttext-classifier-v1

Text Classification • Updated May 25, 2024 • 5 • 4

nano-phi

Small Language Model Trained with Textbook Quality Data - How Far Can It Go?

kenhktsui/nano-phi-115M-v0.1

Text Generation • 0.1B • Updated Apr 6, 2024 • 5 • 4
kenhktsui/nano-phi-115M-control-v0.1

Text Generation • 0.1B • Updated Feb 4, 2024 • 7 • 1
kenhktsui/nano-phi-192M-v0.1

Text Generation • 0.2B • Updated May 8, 2024 • 3 • 1
