arXiv:2509.01250

Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views

Published on Sep 1 · Submitted by aHapBean on Sep 3

Abstract

Point-PQAE is a two-view cross-reconstruction generative paradigm that enhances 3D self-supervised learning by introducing greater diversity and variance, outperforming single-view self-reconstruction methods on downstream point cloud tasks.

AI-generated summary

Point cloud learning, especially in a self-supervised way without manual labels, has gained growing attention in both the vision and learning communities due to its potential utility in a wide range of applications. Most existing generative approaches for point cloud self-supervised learning focus on recovering masked points from visible ones within a single view. Because a two-view pre-training paradigm inherently introduces greater diversity and variance, it may enable more challenging and informative pre-training. Inspired by this, we explore the potential of two-view learning in this domain. In this paper, we propose Point-PQAE, a cross-reconstruction generative paradigm that first generates two decoupled point clouds/views and then reconstructs one from the other. To achieve this goal, we develop the first crop mechanism for point cloud view generation and further propose a novel positional encoding to represent the 3D relative position between the two decoupled views. Cross-reconstruction significantly increases the difficulty of pre-training compared to self-reconstruction, which enables our method to surpass previous single-modal self-reconstruction methods in 3D self-supervised learning. Specifically, it outperforms the self-reconstruction baseline (Point-MAE) by 6.5%, 7.0%, and 6.7% on the three variants of ScanObjectNN under the Mlp-Linear evaluation protocol. The code is available at https://github.com/aHapBean/Point-PQAE.
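For readers skimming the page, the sketch below illustrates the two-view cross-reconstruction idea the abstract describes. It is not the authors' implementation (see the linked repository for that): the random-center crop strategy, the crop ratio, the Chamfer loss, and the encoder/decoder interfaces are all illustrative assumptions introduced here for clarity.

# Illustrative sketch only -- not the Point-PQAE reference code.
import torch

def crop_view(points: torch.Tensor, ratio: float = 0.6):
    """Generate a decoupled view: keep the `ratio` fraction of points
    nearest a randomly chosen center (an assumed, simple crop strategy)."""
    center = points[torch.randint(points.shape[0], (1,))]   # (1, 3)
    dists = torch.cdist(points, center).squeeze(-1)         # (N,)
    keep = dists.argsort()[: int(ratio * points.shape[0])]
    return points[keep], center.squeeze(0)

def chamfer_distance(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Symmetric Chamfer distance between (N, 3) and (M, 3) point sets."""
    d = torch.cdist(pred, target)                           # (N, M)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()

def cross_reconstruction_step(points, encoder, decoder):
    """One pre-training step: encode view A and reconstruct view B,
    conditioned on the 3D relative position between the two views."""
    view_a, center_a = crop_view(points)
    view_b, center_b = crop_view(points)
    rel_pos = center_b - center_a    # relative position, fed to the decoder
    latent = encoder(view_a)         # encode one decoupled view
    pred_b = decoder(latent, rel_pos)  # predict the other view from it
    return chamfer_distance(pred_b, view_b)

Unlike masked self-reconstruction, where the target is a hidden part of the same view, the decoder here never sees view B's points directly and must infer them from view A plus the relative position, which is what makes the pretext task harder.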

Community


Hi @aHapBean - Thanks for sharing this paper! Would be great to claim it with your HF account by clicking your name on the page. 🤗


Models citing this paper (0)

No models link this paper yet.

Cite arxiv.org/abs/2509.01250 in a model README.md to link it from this page.

Datasets citing this paper (0)

No datasets link this paper yet.

Cite arxiv.org/abs/2509.01250 in a dataset README.md to link it from this page.

Spaces citing this paper (0)

No Spaces link this paper yet.

Cite arxiv.org/abs/2509.01250 in a Space README.md to link it from this page.

Collections including this paper (1)