Daily Papers

by AK and the research community

Jun 14

Submitted by

akhaliq

Depth Anything V2

·
7 authors

Submitted by

akhaliq

An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

·
6 authors

Submitted by

akhaliq

Transformers meet Neural Algorithmic Reasoners

·
8 authors

Submitted by

renll

Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

·
6 authors

4

Submitted by

akhaliq

OpenVLA: An Open-Source Vision-Language-Action Model

·
18 authors

Submitted by

QHL067

Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

·
6 authors

1

Submitted by

akhaliq

Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning

·
9 authors

Submitted by

akhaliq

DiTFastAttn: Attention Compression for Diffusion Transformer Models

·
9 authors

Submitted by

Fiaa

Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

·
8 authors

1

Submitted by

akhaliq

Interpreting the Weight Space of Customized Diffusion Models

·
7 authors

Submitted by

Fiaa

MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

·
21 authors

2

Submitted by

akhaliq

HelpSteer2: Open-source dataset for training top-performing reward models

·
9 authors

3

Submitted by

matthieufp

mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus

·
8 authors

4

Submitted by

akhaliq

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

·
16 authors

4

Submitted by

roman-bachmann

4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

·
9 authors

2

Submitted by

DrChiZhang

EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

·
7 authors

3

Submitted by

Yiyuan

Explore the Limits of Omni-modal Pretraining at Scale

·
4 authors

3

Submitted by

alexiglad

Cognitively Inspired Energy-Based World Models

·
6 authors

5

Submitted by

akhaliq

Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs

·
3 authors

2

Submitted by

Fiaa

Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?

·
5 authors

1

Submitted by

weixifeng

TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation

·
6 authors

1

Submitted by

hwjiang

Real3D: Scaling Up Large Reconstruction Models with Real-World Images

·
3 authors

1

Submitted by

jlko

Estimating the Hallucination Rate of Generative AI

·
8 authors

1

Submitted by

zaydzuhri

MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding

·
4 authors

2

Submitted by

justinxzhao

Language Model Council: Benchmarking Foundation Models on Highly Subjective Tasks by Consensus

·
3 authors

1

Submitted by

afaji

CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

·
75 authors

Submitted by

desaix

LRM-Zero: Training Large Reconstruction Models with Synthesized Data

·
10 authors

1

Submitted by

sumukhaithal6

Understanding Hallucinations in Diffusion Models through Mode Interpolation

·
4 authors

1

Submitted by

lcysyzxdxc

CMC-Bench: Towards a New Paradigm of Visual Signal Compression

·
10 authors

Submitted by

YfZ

Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation

·
8 authors

2