hy1111 committed · commit 346b959 (verified) · parent 3243826

Update README.md

Files changed (1): README.md (+2 −2)
```diff
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # CLIP-RS: Vision-Language Pre-training with Data Purification for Remote Sensing
 
-![CLIP-RS Logo](CLIP-RS.png)
+![CLIP-RS Logo](figure/CLIP-RS.png)
 
 
 CLIP-RS is a pre-trained model based on CLIP (Contrastive Language-Image Pre-training) tailored for remote sensing applications. This model is trained on a 10M large-scale remote sensing image-text dataset, providing powerful perception capabilities for tasks related to remote sensing images.
@@ -27,7 +27,7 @@ The training data is sourced from two types of datasets:
 ### 2. Data Filtering
 To refine the coarse dataset, we propose a data filtering strategy using the CLIP-based model, $\text{CLIP}_{\text{Sem}}$. This model is pre-trained on high-quality captions to ensure that only semantically accurate image-text pairs are retained. The similarity scores (SS) between each image-text pair are calculated, and captions with low similarity are discarded.
 
-![Data Purification Process](newversion.png)
+![Data Purification Process](figure/newversion.png)
 *Figure 1: Data Refinement Process of the CLIP-RS Dataset. Left: Workflow for filtering and refining low-quality captions. Right: Examples of low-quality captions and their refined versions.*
 
 ### 3. Data Refinement
```
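The filtering step the README describes — computing a similarity score (SS) for each image-text pair and discarding low-similarity captions — can be sketched as below. This is a minimal illustration only: the embeddings, the `filter_pairs` helper, and the threshold value are assumptions for the example, not the actual $\text{CLIP}_{\text{Sem}}$ pipeline or its real cutoff.

```python
import math

def cosine_similarity(a, b):
    # Similarity score (SS) between an image embedding and a text embedding.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def filter_pairs(image_embs, text_embs, threshold):
    # Hypothetical filtering helper: keep indices of pairs whose SS meets
    # the threshold; low-similarity captions are discarded.
    return [
        i
        for i, (img, txt) in enumerate(zip(image_embs, text_embs))
        if cosine_similarity(img, txt) >= threshold
    ]

# Toy embeddings: pair 0 is well aligned, pair 1 is mismatched.
image_embs = [[1.0, 0.0], [0.0, 1.0]]
text_embs = [[0.9, 0.1], [1.0, 0.0]]
print(filter_pairs(image_embs, text_embs, threshold=0.5))  # → [0]
```

In practice the embeddings would come from the pre-trained semantic model's image and text encoders, and the threshold would be tuned on held-out high-quality captions rather than fixed ahead of time.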