arxiv:2511.11238

Virtual Width Networks

Published on Nov 14 · Submitted by taesiri on Nov 17
#3 Paper of the day

Abstract

Virtual Width Networks (VWN) improve model efficiency by expanding representational width at near-constant backbone compute, accelerating optimization and achieving greater loss reduction.

AI-generated summary

We introduce Virtual Width Networks (VWN), a framework that delivers the benefits of wider representations without incurring the quadratic cost of increasing the hidden size. VWN decouples representational width from backbone width, expanding the embedding space while keeping backbone compute nearly constant. In our large-scale experiment, an 8-times expansion accelerates optimization by over 2 times for next-token and 3 times for next-2-token prediction. The advantage amplifies over training as both the loss gap grows and the convergence-speedup ratio increases, showing that VWN is not only token-efficient but also increasingly effective with scale. Moreover, we identify an approximately log-linear scaling relation between virtual width and loss reduction, offering an initial empirical basis and motivation for exploring virtual-width scaling as a new dimension of large-model efficiency.
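The abstract describes the mechanism only at a high level. Below is a minimal, hypothetical PyTorch sketch of the general idea, assuming the virtual width is realized as a wide embedding space with cheap linear projections into and out of an unchanged backbone; the class and parameter names (VirtualWidthLM, expansion, etc.) are illustrative, and the paper's actual architecture may differ.

# Illustrative sketch only: one plausible way to widen the representational
# (embedding) space while keeping the backbone's hidden size, and thus its
# width-dependent compute, unchanged. Names and the down/up projections are
# assumptions, not code from the paper.
import torch
import torch.nn as nn

class VirtualWidthLM(nn.Module):
    def __init__(self, backbone: nn.Module, vocab_size: int,
                 backbone_dim: int, expansion: int = 8):
        super().__init__()
        virtual_dim = backbone_dim * expansion            # e.g. 8x wider embeddings
        self.embed = nn.Embedding(vocab_size, virtual_dim)
        self.down = nn.Linear(virtual_dim, backbone_dim, bias=False)  # cost linear in virtual width
        self.up = nn.Linear(backbone_dim, virtual_dim, bias=False)
        self.lm_head = nn.Linear(virtual_dim, vocab_size, bias=False)
        self.backbone = backbone                          # unchanged, narrow-width blocks

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        wide = self.embed(token_ids)          # [batch, seq, virtual_dim]
        narrow = self.down(wide)              # project into the original backbone width
        hidden = self.backbone(narrow)        # backbone compute stays nearly constant
        return self.lm_head(self.up(hidden))  # read out in the wide (virtual) space

# Example with a stand-in backbone mapping [batch, seq, 512] -> same shape:
backbone = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 512))
model = VirtualWidthLM(backbone, vocab_size=32000, backbone_dim=512, expansion=8)
logits = model(torch.randint(0, 32000, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 32000])

The point the sketch makes explicit is the cost structure: the embedding, projection, and output layers scale only linearly with the virtual width, while the width-quadratic backbone layers stay at the original backbone width.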

Community

Paper submitter

VWN decouples representational width from backbone width to expand embedding space with near-constant backbone compute, achieving faster convergence and a log-linear relation between virtual width and loss reduction.
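The log-linear relation mentioned here and in the abstract can be stated schematically as follows; the coefficients $a$ and $b$ are placeholders for empirically fitted constants and are not values from the paper.

$\Delta \mathcal{L}(k) \approx a + b \log k$

where $k$ is the virtual-width expansion factor (for example, $k = 8$ in the reported large-scale experiment) and $\Delta \mathcal{L}(k)$ is the loss reduction relative to the unexpanded baseline.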

The ByteDance Seed team constantly plagiarizes our HyperZZW work. The link below lists the plagiarism evidence:

https://x.com/hyperzzw/status/1990520228238049362?s=46&t=BsqYoGA8vIHGcXwORlMk7w

Models citing this paper: 0
Datasets citing this paper: 0
Spaces citing this paper: 0
Collections including this paper: 2