Thao Nguyen

thaottn

https://thaonguyen19.github.io/

AI & ML interests

None yet

Recent Activity

updated a dataset 5 days ago

facebook/recycling_the_web

published a dataset 6 days ago

facebook/recycling_the_web

authored a paper 2 months ago

Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models

Organizations

updated a dataset 5 days ago

facebook/recycling_the_web

Viewer • Updated 5 days ago • 60.3M • 1.09k • 35

published a dataset 6 days ago

facebook/recycling_the_web

Viewer • Updated 5 days ago • 60.3M • 1.09k • 35

authored a paper 2 months ago

Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models

Paper • 2506.04689 • Published Jun 5

updated a model 7 months ago

thaottn/datacomp-medium_basic_DFN_filtered_f0.2_translated_captions_AND_original_captions

Zero-Shot Image Classification • Updated Feb 12 • 1

published a model 7 months ago

thaottn/datacomp-medium_basic_DFN_filtered_f0.2_translated_captions_AND_original_captions

Zero-Shot Image Classification • Updated Feb 12 • 1

updated a model 7 months ago

thaottn/datacomp-medium_basic_DFN_filtered_f0.2_translated_captions

Zero-Shot Image Classification • Updated Feb 12 • 3

published a model 7 months ago

thaottn/datacomp-medium_basic_DFN_filtered_f0.2_translated_captions

Zero-Shot Image Classification • Updated Feb 12 • 3

authored 4 papers about 1 year ago

DataComp: In search of the next generation of multimodal datasets

Paper • 2304.14108 • Published Apr 27, 2023 • 2

Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP

Paper • 2208.05516 • Published Aug 10, 2022

DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17, 2024 • 55

Better Alignment with Instruction Back-and-Forth Translation

Paper • 2408.04614 • Published Aug 8, 2024 • 16

updated 4 models about 1 year ago

updated 5 models over 1 year ago

thaottn/OpenCLIP-resnet50-RedCaps12M

Zero-Shot Image Classification • Updated Jan 4, 2024 • 1 • 1

thaottn/OpenCLIP-resnet50-YFCC15M

Zero-Shot Image Classification • Updated Jan 4, 2024 • 9

thaottn/OpenCLIP-resnet50-LAION15M

Zero-Shot Image Classification • Updated Jan 4, 2024 • 7

thaottn/OpenCLIP-resnet50-Shutterstock15M

Zero-Shot Image Classification • Updated Jan 4, 2024 • 1 • 1

thaottn/OpenCLIP-resnet50-CC12M

Zero-Shot Image Classification • Updated Jan 4, 2024 • 271
