Update README.md
README.md
CHANGED
@@ -32,6 +32,19 @@ This is enhanced version of AnySomniumXL v3
 * Better stylizing on untrained token
 
 # Our Dataset Process Curation
+# Our Dataset Process Curation
+<p align="center">
+<img src="Curation.png" width=70% height=70%>
+</p>
+
+Image source: [Source1](https://danbooru.donmai.us/posts/3143351) [Source2](https://danbooru.donmai.us/posts/3272710) [Source3](https://danbooru.donmai.us/posts/3320417)
+
+Our dataset is scored using the pretrained CLIP+MLP aesthetic scoring model from https://github.com/christophschuhmann/improved-aesthetic-predictor, and we adjusted our script to detect any text or watermark using OCR via pytesseract.
+
+This scoring method uses a scale from -1 to 100. We take a minimum score threshold of around 17 or 20 and a maximum of 65-75 to preserve the 2D style of the dataset. Any image containing text receives a score of -1, so any image with a score below 17 or above 65 is deleted.
+
+The dataset curation process runs on an Nvidia T4 16GB machine and takes about 7 days to curate 1,000,000 images.
+
 Our dataset is scored using the pretrained CLIP+MLP aesthetic scoring model from https://github.com/christophschuhmann/improved-aesthetic-predictor, and we adjusted our script to detect any text or watermark using OCR via pytesseract.
 
 This scoring method uses a scale from -1 to 100. We take a minimum score threshold of around 17 or 20 and a maximum of 65-75 to preserve the 2D style of the dataset. Any image containing text receives a score of -1, so any image with a score below 17 or above 65 is deleted.
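
The filtering rule described in the README text can be illustrated with a minimal Python sketch. It is not the authors' script: the names `final_score`, `keep_image`, and the sample data are hypothetical, and the aesthetic score and OCR text are passed in as plain values so the example runs without the CLIP+MLP model or pytesseract installed. In a real pipeline the OCR string might come from something like `pytesseract.image_to_string`, and the README does not say how the predictor's output is mapped onto the -1 to 100 scale, so that step is left out.

```python
MIN_SCORE = 17.0   # lower bound quoted in the README
MAX_SCORE = 65.0   # upper bound quoted in the README
TEXT_SCORE = -1.0  # score the README assigns to images containing text


def final_score(aesthetic_score: float, ocr_text: str) -> float:
    """Return -1 if OCR found any non-whitespace text, else the aesthetic score."""
    if ocr_text.strip():
        return TEXT_SCORE
    return aesthetic_score


def keep_image(score: float, lo: float = MIN_SCORE, hi: float = MAX_SCORE) -> bool:
    """Keep an image only if its score lies inside [lo, hi]."""
    return lo <= score <= hi


# Hypothetical (aesthetic score, OCR text) pairs, for illustration only.
samples = {
    "clean.png": (42.0, ""),
    "watermarked.png": (55.0, "artist-name"),
    "too_low.png": (12.5, ""),
    "too_high.png": (80.0, ""),
}

kept = [
    name
    for name, (score, text) in samples.items()
    if keep_image(final_score(score, text))
]
print(kept)  # ['clean.png']
```

Because text-bearing images are forced to -1, they always fall below the minimum threshold, so a single range check removes both watermarked images and images outside the 17 to 65 band.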