BERGEN: A Benchmarking Library for Retrieval-Augmented Generation • Paper • 2407.01102 • Published Jul 1, 2024
CodeBPE: Investigating Subtokenization Options for Large Language Model Pretraining on Source Code • Paper • 2308.00683 • Published Aug 1, 2023