Abstract
Virtual Width Networks (VWN) enhance model efficiency by expanding representational width without increasing computational cost, accelerating optimization and improving loss reduction.
We introduce Virtual Width Networks (VWN), a framework that delivers the benefits of wider representations without incurring the quadratic cost of increasing the hidden size. VWN decouples representational width from backbone width, expanding the embedding space while keeping backbone compute nearly constant. In our large-scale experiment, an 8-times expansion accelerates optimization by over 2 times for next-token and 3 times for next-2-token prediction. The advantage amplifies over training as both the loss gap grows and the convergence-speedup ratio increases, showing that VWN is not only token-efficient but also increasingly effective with scale. Moreover, we identify an approximately log-linear scaling relation between virtual width and loss reduction, offering an initial empirical basis and motivation for exploring virtual-width scaling as a new dimension of large-model efficiency.
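The abstract's core idea, decoupling a wide embedding space from a narrower backbone, can be illustrated with a minimal numpy sketch. The exact wiring of VWN is not given in the abstract, so the projection scheme below (a learned down-projection into the backbone and an up-projection back into the wide, tied output space) is an assumption for illustration; all dimension names and the 8× factor merely mirror the text.

```python
import numpy as np

rng = np.random.default_rng(0)

d_backbone = 64            # hidden size of the transformer backbone (assumed value)
expansion = 8              # virtual-width expansion factor, as in the 8x experiment
d_virtual = expansion * d_backbone
vocab, seq = 1000, 16      # toy vocabulary and sequence length

# Wide embedding table: representational width is d_virtual, not d_backbone.
embed = rng.normal(0.0, 0.02, (vocab, d_virtual))

# Hypothetical down/up projections linking the wide space to the narrow backbone.
w_down = rng.normal(0.0, 0.02, (d_virtual, d_backbone))
w_up = rng.normal(0.0, 0.02, (d_backbone, d_virtual))

tokens = rng.integers(0, vocab, seq)
x_wide = embed[tokens]     # (seq, d_virtual): wide token representations
x = x_wide @ w_down        # (seq, d_backbone): backbone compute stays at d_backbone
# ... attention/FFN layers would operate on x here, at the original width ...
h_wide = x @ w_up          # (seq, d_virtual): back into the wide space
logits = h_wide @ embed.T  # (seq, vocab): output head tied to the wide embedding

print(x.shape, h_wide.shape, logits.shape)
```

Because the backbone only ever sees width `d_backbone`, its (roughly quadratic-in-width) cost is unchanged; only the cheap embedding-side projections scale with the virtual width.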
Community
VWN decouples representational width from backbone width to expand embedding space with near-constant backbone compute, achieving faster convergence and a log-linear relation between virtual width and loss reduction.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Muon: Training and Trade-offs with Latent Attention and MoE (2025)
- Spectral Scaling Laws in Language Models: How Effectively Do Feed-Forward Networks Use Their Latent Space? (2025)
- Mixture-of-Channels: Exploiting Sparse FFNs for Efficient LLMs Pre-Training and Inference (2025)
- On the expressivity of sparse maxout networks (2025)
- CAT: Curvature-Adaptive Transformers for Geometry-Aware Learning (2025)
- On residual network depth (2025)
- SpecAttn: Speculating Sparse Attention (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space.
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend
The ByteDance Seed team constantly plagiarizes our HyperZZW work. The link below lists the plagiarism evidence:
https://x.com/hyperzzw/status/1990520228238049362?s=46&t=BsqYoGA8vIHGcXwORlMk7w