EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test Paper • 2503.01840 • Published 10 days ago • 4
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model Paper • 2503.07703 • Published 3 days ago • 29
Identifying Sensitive Weights via Post-quantization Integral Paper • 2503.01901 • Published 14 days ago • 7
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting Paper • 2503.00784 • Published 12 days ago • 10
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs Paper • 2503.01743 • Published 10 days ago • 72
FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling Paper • 2502.14856 • Published 21 days ago • 7
Iterative Value Function Optimization for Guided Decoding Paper • 2503.02368 • Published 10 days ago • 14
PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization Paper • 2503.01328 • Published 11 days ago • 14
The Ultra-Scale Playbook 🌌 The ultimate guide to training LLMs on large GPU clusters
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference Paper • 2502.18137 • Published 17 days ago • 53
Rank1: Test-Time Compute for Reranking in Information Retrieval Paper • 2502.18418 • Published 16 days ago • 25
MoBA: Mixture of Block Attention for Long-Context LLMs Paper • 2502.13189 • Published 24 days ago • 14
LightThinker: Thinking Step-by-Step Compression Paper • 2502.15589 • Published 20 days ago • 26
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention Paper • 2502.14866 • Published 21 days ago • 12
Autellix: An Efficient Serving Engine for LLM Agents as General Programs Paper • 2502.13965 • Published 22 days ago • 18