hy1111 committed · commit 346b959 (verified) · parent 3243826

Update README.md

Files changed (1): README.md (+2 −2)
```diff
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # CLIP-RS: Vision-Language Pre-training with Data Purification for Remote Sensing
 
-![CLIP-RS Logo](CLIP-RS.png)
+![CLIP-RS Logo](figure/CLIP-RS.png)
 
 
 CLIP-RS is a pre-trained model based on CLIP (Contrastive Language-Image Pre-training) tailored for remote sensing applications. This model is trained on a 10M large-scale remote sensing image-text dataset, providing powerful perception capabilities for tasks related to remote sensing images.
@@ -27,7 +27,7 @@ The training data is sourced from two types of datasets:
 ### 2. Data Filtering
 To refine the coarse dataset, we propose a data filtering strategy using the CLIP-based model, $\text{CLIP}_{\text{Sem}}$. This model is pre-trained on high-quality captions to ensure that only semantically accurate image-text pairs are retained. The similarity scores (SS) between each image-text pair are calculated, and captions with low similarity are discarded.
 
-![Data Purification Process](newversion.png)
+![Data Purification Process](figure/newversion.png)
 *Figure 1: Data Refinement Process of the CLIP-RS Dataset. Left: Workflow for filtering and refining low-quality captions. Right: Examples of low-quality captions and their refined versions.*
 
 ### 3. Data Refinement
```
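The filtering step the README describes — computing a similarity score (SS) for each image-text pair and discarding low-similarity captions — can be sketched as below. This is a minimal illustration only: the embeddings, the `filter_pairs` helper, and the threshold value are assumptions for the example, not the actual $\text{CLIP}_{\text{Sem}}$ pipeline or its real cutoff.

```python
import math

def cosine_similarity(a, b):
    # Similarity score (SS) between an image embedding and a text embedding.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def filter_pairs(image_embs, text_embs, threshold):
    # Hypothetical filtering helper: keep indices of pairs whose SS meets
    # the threshold; low-similarity captions are discarded.
    return [
        i
        for i, (img, txt) in enumerate(zip(image_embs, text_embs))
        if cosine_similarity(img, txt) >= threshold
    ]

# Toy embeddings: pair 0 is well aligned, pair 1 is mismatched.
image_embs = [[1.0, 0.0], [0.0, 1.0]]
text_embs = [[0.9, 0.1], [1.0, 0.0]]
print(filter_pairs(image_embs, text_embs, threshold=0.5))  # → [0]
```

In practice the embeddings would come from the pre-trained semantic model's image and text encoders, and the threshold would be tuned on held-out high-quality captions rather than fixed ahead of time.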