Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face Paper • 2302.14534 • Published Feb 28, 2023
Zero-Shot Listwise Document Reranking with a Large Language Model Paper • 2305.02156 • Published May 3, 2023
Evaluating Embedding APIs for Information Retrieval Paper • 2305.06300 • Published May 10, 2023
GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration Paper • 2306.01481 • Published Jun 2, 2023
What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations Paper • 2311.18812 • Published Nov 30, 2023
NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation Paper • 2312.11361 • Published Dec 18, 2023
Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval Paper • 2108.08787 • Published Aug 19, 2021
HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution Paper • 2307.16883 • Published Jul 31, 2023
Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages Paper • 2210.09984 • Published Oct 18, 2022
FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture Paper • 2406.11030 • Published Jun 16, 2024
Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models Paper • 2310.07712 • Published Oct 11, 2023
MMTEB: Massive Multilingual Text Embedding Benchmark Paper • 2502.13595 • Published 23 days ago
Post: The folks at Foursquare released a dataset of 104.5 million places of interest (foursquare/fsq-os-places) and here's all of them on a plot.
Post: The Lichess database of games, puzzles, and engine evaluations is now on the Hub: https://huggingface.co/Lichess
Billions of chess data points to download, query, and stream, and we're excited to see what you'll build with it!
- Lichess/positions-datasets-66f50837db5cd3287d60d489
- Lichess/games-datasets-66f508df78f4b43e1bb2d353
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published Apr 29, 2024
StarCoder 2 and The Stack v2: The Next Generation Paper • 2402.19173 • Published Feb 29, 2024
Post: TIL: EleutherAI/pile is on Wikipedia: https://en.wikipedia.org/wiki/The_Pile_(dataset)