Byung-Kwan Lee

BK-Lee

https://sites.google.com/view/byungkwanlee

AI & ML interests

Computer Vision, Machine Learning, Large Language and Vision Models, Efficient Modeling

Recent Activity

new activity 10 days ago

nvidia/Eagle2-9B:Deepspeed ZeRO3 Compatible Issue

upvoted a paper 11 days ago

Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models

liked a Space 12 days ago

opencompass/open_vlm_leaderboard

BK-Lee's activity

New activity in nvidia/Eagle2-9B 10 days ago

Deepspeed ZeRO3 Compatible Issue

#4 opened 10 days ago by

BK-Lee

upvoted a paper 11 days ago

Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models

Paper • 2501.13629 • Published 15 days ago • 42

liked a Space 12 days ago

595

Open VLM Leaderboard

VLMEvalKit Evaluation Results Collection

upvoted 4 papers 14 days ago

SRMT: Shared Memory for Multi-agent Lifelong Pathfinding

Paper • 2501.13200 • Published 16 days ago • 61

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Paper • 2501.12599 • Published 16 days ago • 86

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

Paper • 2501.13106 • Published 16 days ago • 79

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published 16 days ago • 302

upvoted a paper 16 days ago

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model

Paper • 2501.12368 • Published 17 days ago • 39

New activity in nvidia/Eagle2-9B 17 days ago

For training

#2 opened 17 days ago by

BK-Lee

liked a model 17 days ago

nvidia/Eagle2-9B

Image-Text-to-Text • Updated 10 days ago • 2.95k • 37

New activity in nvidia/Eagle2-9B 17 days ago

Version Crash for Qwen2 from Transformers

#1 opened 17 days ago by

BK-Lee

upvoted a collection 17 days ago

Eagle 2

Collection

Eagle 2 is a family of frontier vision-language models with vision-centric design. The model supports 4K HD input, long-context video, and grounding. • 9 items • Updated 15 days ago • 31

upvoted a paper 18 days ago

Evolving Deeper LLM Thinking

Paper • 2501.09891 • Published 21 days ago • 105

upvoted a paper 19 days ago

Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks

Paper • 2501.08326 • Published 24 days ago • 31

upvoted a collection 20 days ago

Multimodal LLM

Collection

158 items • Updated about 8 hours ago • 10

upvoted 2 papers 20 days ago

GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI

Paper • 2411.14522 • Published Nov 21, 2024 • 32

Learnings from Scaling Visual Tokenizers for Reconstruction and Generation

Paper • 2501.09755 • Published 22 days ago • 33

upvoted 3 papers 23 days ago

Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding

Paper • 2501.07888 • Published 24 days ago • 15

MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

Paper • 2406.11271 • Published Jun 17, 2024 • 21

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published 24 days ago • 273