Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models Paper • 2506.04689 • Published Jun 5
thaottn/datacomp-medium_basic_DFN_filtered_f0.2_translated_captions_AND_original_captions Zero-Shot Image Classification • Updated Feb 12 • 1
thaottn/datacomp-medium_basic_DFN_filtered_f0.2_translated_captions Zero-Shot Image Classification • Updated Feb 12 • 3
DataComp: In search of the next generation of multimodal datasets Paper • 2304.14108 • Published Apr 27, 2023 • 2
Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP Paper • 2208.05516 • Published Aug 10, 2022
DataComp-LM: In search of the next generation of training sets for language models Paper • 2406.11794 • Published Jun 17, 2024 • 55
Better Alignment with Instruction Back-and-Forth Translation Paper • 2408.04614 • Published Aug 8, 2024 • 16
thaottn/datacomp-medium_basic_DFN_filtered_f0.4_raw_captions Zero-Shot Image Classification • Updated Aug 2, 2024
thaottn/datacomp-medium_basic_DFN_filtered_f0.3_raw_captions Zero-Shot Image Classification • Updated Aug 2, 2024
thaottn/datacomp-medium_basic_DFN_filtered_f0.2_raw_captions Zero-Shot Image Classification • Updated Aug 2, 2024 • 1
thaottn/datacomp-medium_basic_DFN_filtered_f0.1_raw_captions Zero-Shot Image Classification • Updated Aug 2, 2024 • 2
thaottn/OpenCLIP-resnet50-Shutterstock15M Zero-Shot Image Classification • Updated Jan 4, 2024 • 1 • 1