Collections
Collections including paper arxiv:2407.01449

- RLHF Workflow: From Reward Modeling to Online RLHF
  Paper • 2405.07863 • Published • 67
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 130
- Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
  Paper • 2405.15574 • Published • 53
- An Introduction to Vision-Language Modeling
  Paper • 2405.17247 • Published • 87

- Mixture-of-Agents Enhances Large Language Model Capabilities
  Paper • 2406.04692 • Published • 56
- CRAG -- Comprehensive RAG Benchmark
  Paper • 2406.04744 • Published • 45
- Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach
  Paper • 2406.04594 • Published • 6
- Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
  Paper • 2406.04271 • Published • 29

- Iterative Reasoning Preference Optimization
  Paper • 2404.19733 • Published • 48
- Better & Faster Large Language Models via Multi-token Prediction
  Paper • 2404.19737 • Published • 76
- ORPO: Monolithic Preference Optimization without Reference Model
  Paper • 2403.07691 • Published • 64
- KAN: Kolmogorov-Arnold Networks
  Paper • 2404.19756 • Published • 109

- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
  Paper • 2401.13313 • Published • 5
- BAAI/Bunny-v1_0-4B
  Text Generation • Updated • 146 • 9
- What matters when building vision-language models?
  Paper • 2405.02246 • Published • 102
- Jina CLIP: Your CLIP Model Is Also Your Text Retriever
  Paper • 2405.20204 • Published • 35

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 26
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 41
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 22

- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 47
- A Touch, Vision, and Language Dataset for Multimodal Alignment
  Paper • 2402.13232 • Published • 15
- Neural Network Diffusion
  Paper • 2402.13144 • Published • 95
- FlashTex: Fast Relightable Mesh Texturing with LightControlNet
  Paper • 2402.13251 • Published • 14

- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 146
- ReFT: Reasoning with Reinforced Fine-Tuning
  Paper • 2401.08967 • Published • 30
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 22
- TrustLLM: Trustworthiness in Large Language Models
  Paper • 2401.05561 • Published • 69

- Masked Autoencoders Are Scalable Vision Learners
  Paper • 2111.06377 • Published • 3
- Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling
  Paper • 2311.00430 • Published • 58
- distil-whisper/distil-large-v2
  Automatic Speech Recognition • Updated • 169k • 505
- Seven Failure Points When Engineering a Retrieval Augmented Generation System
  Paper • 2401.05856 • Published • 2