peizesun (peizesun)

authored 11 papers 5 months ago

Perception Encoder: The best visual embeddings are not at the output of the network

Paper • 2504.13181 • Published Apr 17 • 35

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Paper • 2504.13180 • Published Apr 17 • 18

authored a paper 7 months ago

FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation

Paper • 2502.05179 • Published Feb 7 • 24

authored 2 papers 11 months ago

Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment

Paper • 2410.09347 • Published Oct 12, 2024 • 5

ControlAR: Controllable Image Generation with Autoregressive Models

Paper • 2410.02705 • Published Oct 3, 2024 • 11

authored a paper about 1 year ago

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Paper • 2406.06525 • Published Jun 10, 2024 • 72

authored 2 papers about 2 years ago

Semantic-SAM: Segment and Recognize Anything at Any Granularity

Paper • 2307.04767 • Published Jul 10, 2023 • 22

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

Paper • 2307.03601 • Published Jul 7, 2023 • 12

authored a paper over 2 years ago

Going Denser with Open-Vocabulary Part Segmentation

Paper • 2305.11173 • Published May 18, 2023 • 2

peizesun

AI & ML interests

Organizations

ByteTrack: Multi-Object Tracking by Associating Every Detection Box

DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion

DiffusionDet: Diffusion Model for Object Detection

IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model

Sparse R-CNN: End-to-End Object Detection with Learnable Proposals

Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM

Goku: Flow Based Video Generative Foundation Models

Language as Queries for Referring Video Object Segmentation

PixelFlow: Pixel-Space Generative Models with Flow

Perception Encoder: The best visual embeddings are not at the output of the network

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation

Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment

ControlAR: Controllable Image Generation with Autoregressive Models

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Semantic-SAM: Segment and Recognize Anything at Any Granularity

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

Going Denser with Open-Vocabulary Part Segmentation

peizesun's activity