arXiv:2509.01610

Improving Large Vision and Language Models by Learning from a Panel of Peers

Published on Sep 1 · Submitted by taesiri on Sep 3
Abstract

A Panel-of-Peers learning framework enhances Large Vision and Language Models by simulating peer reviews, improving performance without extensive human-labeled datasets.

AI-generated summary

Traditional alignment methods for Large Vision and Language Models (LVLMs) primarily rely on human-curated preference data. Human-generated preference data is costly; machine-generated preference data is limited in quality; and self-supervised preference data often introduces hallucinations. To overcome these limitations, we propose a novel Panel-of-Peers learning framework inspired by collaborative learning among humans. This approach leverages a panel of LVLMs, each evaluating and learning from the panel's collective outputs through an iterative self-improvement process. By simulating a peer review system, our models generate, assess, and refine outputs in response to a curated set of prompts, mimicking a classroom learning environment. We demonstrate that this methodology enhances model performance without requiring extensive human-labeled datasets. Our experiments show significant improvements across multiple benchmarks, demonstrating the potential of peer evaluations as a scalable alternative to self-supervised alignment. Notably, we show that Panel-of-Peers increases the average score on fifteen benchmarks from 48% to 57%.
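The abstract describes a generate, peer-score, refine loop but gives no implementation details on this page. Below is a minimal Python sketch of one such round under stated assumptions: the `PanelModel` class and its `generate` and `score` interfaces are hypothetical stand-ins, not names from the paper, and the preference-pair bookkeeping is one plausible way the "iterative self-improvement" step could consume peer scores.

```python
# Minimal sketch of one Panel-of-Peers round, as described in the abstract.
# All names here (PanelModel, generate, score) are hypothetical stand-ins;
# the paper page does not specify any implementation details.

from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class PanelModel:
    name: str
    generate: Callable[[str], str]      # prompt -> candidate answer
    score: Callable[[str, str], float]  # (prompt, answer) -> peer score
    # Collected (prompt, preferred, rejected) triples for later tuning.
    preferences: List[Tuple[str, str, str]] = field(default_factory=list)


def panel_of_peers_round(panel: List[PanelModel], prompts: List[str]) -> None:
    """One iteration: every model answers every prompt, the other panel
    members score each answer, and every model records a
    (prompt, preferred, rejected) pair for a later preference-tuning step."""
    for prompt in prompts:
        candidates = [(m, m.generate(prompt)) for m in panel]
        scored = []
        for author, answer in candidates:
            # Peer review: the author does not score its own answer.
            peer_scores = [p.score(prompt, answer)
                           for p in panel if p is not author]
            scored.append((sum(peer_scores) / len(peer_scores), answer))
        scored.sort(key=lambda t: t[0], reverse=True)
        best, worst = scored[0][1], scored[-1][1]
        for m in panel:
            m.preferences.append((prompt, best, worst))
```

In the paper's framing, the collected preference pairs would drive the iterative self-improvement (for example, a DPO-style preference update), but the exact training objective and panel composition are not stated on this page.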

