Shubham Toshniwal

stoshniwal

https://shtoshni.github.io/

shtoshni

AI & ML interests

NLP, LLM

stoshniwal's activity

New activity in deepseek-ai/DeepSeek-R1-Distill-Qwen-32B 9 days ago

Tokenizer config is wrong

#10 opened 9 days ago by stoshniwal

liked a model about 2 months ago

Qwen/Qwen2.5-Math-7B-Instruct

Text Generation • Updated Sep 23, 2024 • 37.3k • 53

liked a model 2 months ago

Qwen/QwQ-32B-Preview

Text Generation • Updated 19 days ago • 189k • 1.6k

upvoted a paper 2 months ago

Star Attention: Efficient LLM Inference over Long Sequences

Paper • 2411.17116 • Published Nov 26, 2024 • 49

updated 4 models 2 months ago

updated a dataset 2 months ago

nvidia/OpenMathInstruct-2

Viewer • Updated Nov 25, 2024 • 22M • 5.52k • 150

upvoted a collection 2 months ago

Qwen2.5-Math

Collection

Math-specific model series based on Qwen2.5 • 11 items • Updated 17 days ago • 66

liked a model 3 months ago

nvidia/Cosmos-0.1-Tokenizer-DV4x8x8

Updated Nov 11, 2024 • 405 • 12

upvoted an article 3 months ago

Fixing Gradient Accumulation

Article • Oct 16, 2024 • 49

upvoted a collection 4 months ago

Llama-3.1-Nemotron-70B

Collection

SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated 14 days ago • 152

New activity in nvidia/OpenMathInstruct-2 4 months ago

Upload scaling_plot.jpg

#4 opened 4 months ago by shtoshni

Unable to load dataset

#3 opened 4 months ago by minyichen

Dataset Viewer issue: JobManagerCrashedError

#2 opened 4 months ago by stoshniwal

liked a model 4 months ago

nvidia/NVLM-D-72B

Image-Text-to-Text • Updated 16 days ago • 46.8k • 766

upvoted a collection 4 months ago

OpenMath-2

Collection

A collection of models and datasets introduced in "OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data" • 7 items • Updated 14 days ago • 13

upvoted 2 papers 4 months ago

OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data

Paper • 2410.01560 • Published Oct 2, 2024 • 4

Training Language Models on Synthetic Edit Sequences Improves Code Synthesis

Paper • 2410.02749 • Published Oct 3, 2024 • 12