CLIPSym: Delving into Symmetry Detection with CLIP
Abstract
CLIPSym builds on the pre-trained CLIP vision-language model and enhances symmetry detection with a rotation-equivariant decoder and semantic-aware prompt grouping, outperforming existing methods on standard datasets.
Symmetry is one of the most fundamental geometric cues in computer vision, and detecting it has been an ongoing challenge. With the recent advances in vision-language models, e.g., CLIP, we investigate whether a pre-trained CLIP model can aid symmetry detection by leveraging the additional symmetry cues found in natural image descriptions. We propose CLIPSym, which leverages CLIP's image and language encoders together with a rotation-equivariant decoder, based on a hybrid of Transformer and G-Convolution, to detect rotation and reflection symmetries. To fully utilize CLIP's language encoder, we develop a novel prompting technique called Semantic-Aware Prompt Grouping (SAPG), which aggregates a diverse set of frequent object-based prompts to better integrate semantic cues into symmetry detection. Empirically, we show that CLIPSym outperforms the current state-of-the-art on three standard symmetry detection datasets (DENDI, SDRW, and LDRS). Finally, we conduct detailed ablations verifying the benefits of CLIP's pre-training, the proposed equivariant decoder, and the SAPG technique. The code is available at https://github.com/timyoung2333/CLIPSym.
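To make the two main ideas concrete, below is a minimal, hypothetical PyTorch sketch of (a) a C4 (90°) rotation-equivariant lifting convolution of the kind a G-Convolution decoder builds on, and (b) an SAPG-style aggregation of object-based prompt embeddings. All names here (`C4LiftingConv`, `EquivariantDecoderHead`, `sapg_embedding`) are illustrative assumptions, not the authors' actual API, and the text encoder is assumed to follow an open_clip-style interface; consult the linked repository for the real implementation.

```python
# Hypothetical sketch of the components described in the abstract; names
# and interfaces are assumptions, NOT the authors' actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class C4LiftingConv(nn.Module):
    """Lifts a feature map to the C4 group by convolving with four
    rotated copies of one learned kernel; the weight sharing is what
    yields equivariance to 90-degree rotations."""

    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.02)

    def forward(self, x):                      # x: (B, C, H, W)
        outs = []
        for r in range(4):                     # rotate the kernel, not the input
            w = torch.rot90(self.weight, r, dims=(2, 3))
            outs.append(F.conv2d(x, w, padding=self.weight.shape[-1] // 2))
        return torch.stack(outs, dim=1)        # (B, 4, out_ch, H, W)


class EquivariantDecoderHead(nn.Module):
    """Toy decoder head: averaging over the group axis makes the symmetry
    heatmap invariant to 90-degree rotations of the input features."""

    def __init__(self, in_ch, hidden=64):
        super().__init__()
        self.lift = C4LiftingConv(in_ch, hidden)
        self.out = nn.Conv2d(hidden, 1, kernel_size=1)

    def forward(self, feats):
        g = self.lift(feats).mean(dim=1)       # pool the 4 rotated responses
        return self.out(F.relu(g))             # (B, 1, H, W) symmetry logits


def sapg_embedding(clip_model, tokenizer, object_names, device="cpu"):
    """SAPG-style aggregation as we read it from the abstract: encode a
    set of frequent object-based prompts with CLIP's text encoder and
    average them into one semantic conditioning vector."""
    prompts = [f"a photo of a symmetric {o}" for o in object_names]
    tokens = tokenizer(prompts).to(device)
    with torch.no_grad():
        emb = clip_model.encode_text(tokens).float()
    emb = F.normalize(emb, dim=-1)
    return emb.mean(dim=0)                     # (D,) grouped prompt embedding
```

Averaging the four rotated-kernel responses is the simplest way to obtain rotation invariance from a lifted representation, which matches the abstract's motivation for an equivariant decoder; the actual model uses a Transformer/G-Convolution hybrid rather than this toy head.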
Community
The work introduces a novel framework for detecting reflection and rotation symmetries in images by leveraging pre-trained vision-language models (VLMs), specifically CLIP. The core hypothesis is that the representations CLIP learns from pre-training on vast image-text pairs encode symmetry cues that can benefit this task.
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- FIX-CLIP: Dual-Branch Hierarchical Contrastive Learning via Synthetic Captions for Better Understanding of Long Text (2025)
- Axis-level Symmetry Detection with Group-Equivariant Representation (2025)
- Generalized Decoupled Learning for Enhancing Open-Vocabulary Dense Perception (2025)
- CLIP-IN: Enhancing Fine-Grained Visual Understanding in CLIP via Instruction Editing Data and Long Captions (2025)
- ANPrompt: Anti-noise Prompt Tuning for Vision-Language Models (2025)
- Towards Fine-Grained Adaptation of CLIP via a Self-Trained Alignment Score (2025)
- CapRecover: A Cross-Modality Feature Inversion Attack Framework on Vision Language Models (2025)