Update README image urls #2
opened by altndrr

README.md CHANGED
@@ -20,9 +20,9 @@ Recent advances in large vision-language models have revolutionized the image cl
 
 <div align="center">
 
-| <img src="https://
-
-
+| <img src="https://alessandroconti.me/papers/assets/2306.00917/images/task_left.webp"> | <img src="https://alessandroconti.me/papers/assets/2306.00917/images/task_right.webp"> |
+| :-----------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------: |
+| Vision Language Model (VLM)-based classification | Vocabulary-free Image Classification |
 
 </div>
 
@@ -30,7 +30,7 @@ In this work, we first empirically verify that representing this semantic space
 
 <div align="center">
 
-| <img src="https://
+| <img src="https://alessandroconti.me/papers/assets/2306.00917/images/method.webp"> |
 | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
 | Overview of CaSED. Given an input image, CaSED retrieves the most relevant captions from an external database filtering them to extract candidate categories. We classify image-to-text and text-to-text, using the retrieved captions centroid as the textual counterpart of the input image. |
 
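For context on what the updated figures depict, below is a minimal, self-contained sketch of the retrieve-filter-classify flow summarized in the method caption above. It is a toy illustration only: the encoder, caption database, candidate filter, and function names such as `retrieve` and `extract_candidates` are assumptions made for this sketch (plain NumPy on toy strings), not the Space's actual CaSED implementation.

```python
# Toy sketch of the flow the figure caption describes:
#   1) retrieve the captions closest to the input image,
#   2) extract candidate category names from them,
#   3) score candidates against the image and the centroid of the retrieved captions.
# Every component below is an illustrative placeholder, not the real CaSED code.
import numpy as np

EMBED_DIM = 16

def embed(text):
    """Stand-in for a VLM encoder: a deterministic pseudo-embedding of a string."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=EMBED_DIM)
    return v / np.linalg.norm(v)

# Placeholder external caption database (a real setup indexes millions of captions).
CAPTION_DB = [
    "a tabby cat sleeping on a sofa",
    "a golden retriever playing fetch in a park",
    "a cat chasing a laser pointer across the floor",
]

def retrieve(query_emb, k=2):
    """Return the k database captions most similar to the query embedding."""
    sims = [(float(query_emb @ embed(c)), c) for c in CAPTION_DB]
    return [c for _, c in sorted(sims, reverse=True)[:k]]

def extract_candidates(captions):
    """Crude candidate-name filter; stands in for a proper text-parsing pipeline."""
    keep = {"cat", "dog", "sofa", "retriever", "park"}
    return {w for c in captions for w in c.split() if w in keep}

def classify(image_description):
    image_emb = embed(image_description)                       # image embedding (faked as text here)
    captions = retrieve(image_emb)                             # retrieval from the external database
    centroid = np.mean([embed(c) for c in captions], axis=0)   # textual counterpart of the image
    candidates = extract_candidates(captions)                  # candidate categories from captions
    scores = {
        name: float(image_emb @ embed(name)) + float(centroid @ embed(name))
        for name in candidates                                 # image-to-text + text-to-text scores
    }
    return max(scores, key=scores.get) if scores else None

print(classify("a photo of a cat"))
```

The caption centroid acts as a textual proxy for the input image, which is why the caption mentions scoring candidates both image-to-text and text-to-text.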