Image-to-Text
Chinese
English
Commit 77f390e (verified) by OpenFace-CQUPT · 1 Parent(s): 1fd8182

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -6,7 +6,7 @@ This repository is the official implementation of [FaceCaption-15M]().
 
 ![image-20240318101027127](https://img.yutangli.net/img/202403181010116.png)
 
-**(a). Same color represents shared parameters. “12x” stands for 12-layer transformer modules. (b), (c) and (d) FLIP-based model are applied to the tasks of text-image retrieval, facial attributes prediction and sketch less facial image retrieval, respectively.**
+**Fig.1:(a). Same color represents shared parameters. “12x” stands for 12-layer transformer modules. (b), (c) and (d) FLIP-based model are applied to the tasks of text-image retrieval, facial attributes prediction and sketch less facial image retrieval, respectively.**
 
 ## Training
 
@@ -49,6 +49,8 @@ Download the FaceCaption-15M dataset from [here](https://huggingface.co/dataset
 
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/663f06e01cd68975883a353e/snd-9JBKJnRuZpm0Wp38f.png)
 
+**Fig.2:Demonstration of our FLIP-based model on the SLFIR task. Both methods can retrieve the target face photo from the top-5 list using a partial sketch. Our proposed FLIP-based model can achieve this using fewer strokes than the baseline. The number at the bottom denotes the rank of the paired (true match) photos at every stage.**
+
 ## Contacts
 
 