Value Residual Learning For Alleviating Attention Concentration In Transformers • Paper • arXiv:2410.17897 • Published Oct 23, 2024
Flex Attention: A Programming Model for Generating Optimized Attention Kernels • Paper • arXiv:2412.05496 • Published Dec 7, 2024
Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents • Paper • arXiv:2504.00906 • Published 1 day ago
ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning • Paper • arXiv:2504.00254 • Published 2 days ago
Representation & Optimization • Collection • Understanding representation sheds light on optimization • 7 items • Updated 12 minutes ago
Approximate Nullspace Augmented Finetuning for Robust Vision Transformers • Paper • arXiv:2403.10476 • Published Mar 15, 2024
Layer by Layer: Uncovering Hidden Representations in Language Models • Paper • arXiv:2502.02013 • Published Feb 4
CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners • Paper • arXiv:2503.16356 • Published 14 days ago
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders • Paper • arXiv:2503.18878 • Published 10 days ago
Token-Efficient Long Video Understanding for Multimodal LLMs • Paper • arXiv:2503.04130 • Published 28 days ago
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation • Paper • arXiv:2503.16430 • Published 14 days ago
Image / Video Gen • Collection • Image Generation Using Diffusion-Based Methods: Tips and Techniques for Stable Diffusion • 36 items • Updated Mar 1
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution • Paper • arXiv:2502.18449 • Published Feb 25
Slamming: Training a Speech Language Model on One GPU in a Day • Paper • arXiv:2502.15814 • Published Feb 19
You Do Not Fully Utilize Transformer's Representation Capacity • Paper • arXiv:2502.09245 • Published Feb 13
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity • Paper • arXiv:2502.13063 • Published Feb 18