new

Get trending papers in your email inbox!

Subscribe

byAK and the research community

Mar 11

EMS: Adaptive Evict-then-Merge Strategy for Head-wise KV Cache Compression Based on Global-Local Importance

As large language models (LLMs) continue to advance, the demand for higher quality and faster processing of long contexts across various applications is growing. KV cache is widely adopted as it stores previously generated key and value tokens, effectively reducing redundant computations during inference. However, as memory overhead becomes a significant concern, efficient compression of KV cache has gained increasing attention. Most existing methods perform compression from two perspectives: identifying important tokens and designing compression strategies. However, these approaches often produce biased distributions of important tokens due to the influence of accumulated attention scores or positional encoding. Furthermore, they overlook the sparsity and redundancy across different heads, which leads to difficulties in preserving the most effective information at the head level. To this end, we propose EMS to overcome these limitations, while achieving better KV cache compression under extreme compression ratios. Specifically, we introduce a Global-Local score that combines accumulated attention scores from both global and local KV tokens to better identify the token importance. For the compression strategy, we design an adaptive and unified Evict-then-Merge framework that accounts for the sparsity and redundancy of KV tokens across different heads. Additionally, we implement the head-wise parallel compression through a zero-class mechanism to enhance efficiency. Extensive experiments demonstrate our SOTA performance even under extreme compression ratios. EMS consistently achieves the lowest perplexity, improves scores by over 1.28 points across four LLMs on LongBench under a 256 cache budget, and preserves 95% retrieval accuracy with a cache budget less than 2% of the context length in the Needle-in-a-Haystack task.

Overview of the SDSS-IV MaNGA Survey: Mapping Nearby Galaxies at Apache Point Observatory

We present an overview of a new integral field spectroscopic survey called MaNGA (Mapping Nearby Galaxies at Apache Point Observatory), one of three core programs in the fourth-generation Sloan Digital Sky Survey (SDSS-IV) that began on 2014 July 1. MaNGA will investigate the internal kinematic structure and composition of gas and stars in an unprecedented sample of 10,000 nearby galaxies. We summarize essential characteristics of the instrument and survey design in the context of MaNGA's key science goals and present prototype observations to demonstrate MaNGA's scientific potential. MaNGA employs dithered observations with 17 fiber-bundle integral field units that vary in diameter from 12" (19 fibers) to 32" (127 fibers). Two dual-channel spectrographs provide simultaneous wavelength coverage over 3600-10300 A at R~2000. With a typical integration time of 3 hr, MaNGA reaches a target r-band signal-to-noise ratio of 4-8 (per A, per 2" fiber) at 23 AB mag per sq. arcsec, which is typical for the outskirts of MaNGA galaxies. Targets are selected with stellar mass greater than 1e9 Msun using SDSS-I redshifts and i-band luminosity to achieve uniform radial coverage in terms of the effective radius, an approximately flat distribution in stellar mass, and a sample spanning a wide range of environments. Analysis of our prototype observations demonstrates MaNGA's ability to probe gas ionization, shed light on recent star formation and quenching, enable dynamical modeling, decompose constituent components, and map the composition of stellar populations. MaNGA's spatially resolved spectra will enable an unprecedented study of the astrophysics of nearby galaxies in the coming 6 yr.

Sloan Digital Sky Survey IV: Mapping the Milky Way, Nearby Galaxies, and the Distant Universe

We describe the Sloan Digital Sky Survey IV (SDSS-IV), a project encompassing three major spectroscopic programs. The Apache Point Observatory Galactic Evolution Experiment 2 (APOGEE-2) is observing hundreds of thousands of Milky Way stars at high resolution and high signal-to-noise ratio in the near-infrared. The Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey is obtaining spatially-resolved spectroscopy for thousands of nearby galaxies (median redshift of z = 0.03). The extended Baryon Oscillation Spectroscopic Survey (eBOSS) is mapping the galaxy, quasar, and neutral gas distributions between redshifts z = 0.6 and 3.5 to constrain cosmology using baryon acoustic oscillations, redshift space distortions, and the shape of the power spectrum. Within eBOSS, we are conducting two major subprograms: the SPectroscopic IDentification of eROSITA Sources (SPIDERS), investigating X-ray AGN and galaxies in X-ray clusters, and the Time Domain Spectroscopic Survey (TDSS), obtaining spectra of variable sources. All programs use the 2.5-meter Sloan Foundation Telescope at Apache Point Observatory; observations there began in Summer 2014. APOGEE-2 also operates a second near-infrared spectrograph at the 2.5-meter du Pont Telescope at Las Campanas Observatory, with observations beginning in early 2017. Observations at both facilities are scheduled to continue through 2020. In keeping with previous SDSS policy, SDSS-IV provides regularly scheduled public data releases; the first one, Data Release 13, was made available in July 2016.