Update README image urls #2
opened by altndrr

README.md CHANGED
@@ -20,9 +20,9 @@ Recent advances in large vision-language models have revolutionized the image cl
 
 <div align="center">
 
-| <img src="https://
-
-
+| <img src="https://alessandroconti.me/papers/assets/2306.00917/images/task_left.webp"> | <img src="https://alessandroconti.me/papers/assets/2306.00917/images/task_right.webp"> |
+| :-----------------------------------------------------------------------------------: | :------------------------------------------------------------------------------------: |
+| Vision Language Model (VLM)-based classification | Vocabulary-free Image Classification |
 
 </div>
 
@@ -30,7 +30,7 @@ In this work, we first empirically verify that representing this semantic space
 
 <div align="center">
 
-| <img src="https://
+| <img src="https://alessandroconti.me/papers/assets/2306.00917/images/method.webp"> |
 | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: |
 | Overview of CaSED. Given an input image, CaSED retrieves the most relevant captions from an external database filtering them to extract candidate categories. We classify image-to-text and text-to-text, using the retrieved captions centroid as the textual counterpart of the input image. |
 
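For context on what the updated figures depict, below is a minimal, self-contained sketch of the retrieve-filter-classify flow summarized in the method caption above. It is a toy illustration only: the encoder, caption database, candidate filter, and function names such as `retrieve` and `extract_candidates` are assumptions made for this sketch (plain NumPy on toy strings), not the Space's actual CaSED implementation.

```python
# Toy sketch of the flow the figure caption describes:
#   1) retrieve the captions closest to the input image,
#   2) extract candidate category names from them,
#   3) score candidates against the image and the centroid of the retrieved captions.
# Every component below is an illustrative placeholder, not the real CaSED code.
import numpy as np

EMBED_DIM = 16

def embed(text):
    """Stand-in for a VLM encoder: a deterministic pseudo-embedding of a string."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=EMBED_DIM)
    return v / np.linalg.norm(v)

# Placeholder external caption database (a real setup indexes millions of captions).
CAPTION_DB = [
    "a tabby cat sleeping on a sofa",
    "a golden retriever playing fetch in a park",
    "a cat chasing a laser pointer across the floor",
]

def retrieve(query_emb, k=2):
    """Return the k database captions most similar to the query embedding."""
    sims = [(float(query_emb @ embed(c)), c) for c in CAPTION_DB]
    return [c for _, c in sorted(sims, reverse=True)[:k]]

def extract_candidates(captions):
    """Crude candidate-name filter; stands in for a proper text-parsing pipeline."""
    keep = {"cat", "dog", "sofa", "retriever", "park"}
    return {w for c in captions for w in c.split() if w in keep}

def classify(image_description):
    image_emb = embed(image_description)                       # image embedding (faked as text here)
    captions = retrieve(image_emb)                             # retrieval from the external database
    centroid = np.mean([embed(c) for c in captions], axis=0)   # textual counterpart of the image
    candidates = extract_candidates(captions)                  # candidate categories from captions
    scores = {
        name: float(image_emb @ embed(name)) + float(centroid @ embed(name))
        for name in candidates                                 # image-to-text + text-to-text scores
    }
    return max(scores, key=scores.get) if scores else None

print(classify("a photo of a cat"))
```

The caption centroid acts as a textual proxy for the input image, which is why the caption mentions scoring candidates both image-to-text and text-to-text.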