---
license: mit
---

# IndicIRSuite: Multilingual Dataset and Neural Information Models for Indian Languages

Paper link: https://arxiv.org/abs/2312.09508

Dataset link: https://huggingface.co/datasets/saifulhaq9/indicmarco

Model link: https://huggingface.co/saifulhaq9/indiccolbert

## Contributors & Acknowledgements

Key Contributors and Team Members: Saiful Haq, Ashutosh Sharma, Pushpak Bhattacharyya

## Kindly cite our paper if you are using our datasets or models:

    @article{haq2023indicirsuite,
      title={IndicIRSuite: Multilingual Dataset and Neural Information Models for Indian Languages},
      author={Haq, Saiful and Sharma, Ashutosh and Bhattacharyya, Pushpak},
      journal={arXiv preprint arXiv:2312.09508},
      year={2023}
    }

## About

This repository contains multilingual ColBERT models for 11 Indian languages.

## Language Code to Language Mapping

asm_Beng: Assamese Language

ben_Beng: Bengali Language

guj_Gujr: Gujarati Language

hin_Deva: Hindi Language

kan_Knda: Kannada Language

mal_Mlym: Malayalam Language

mar_Deva: Marathi Language

ory_Orya: Oriya Language

pan_Guru: Punjabi Language

tam_Taml: Tamil Language

tel_Telu: Telugu Language
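
Each code above pairs an ISO 639-3 language code with an ISO 15924 script code (the same convention used by FLORES-200), so `hin_Deva` means Hindi written in Devanagari. When iterating over the 11 languages in your own scripts, the mapping can be kept as a dictionary. The sketch below is illustrative only: `LANGUAGE_CODES` and `split_code` are hypothetical names, not part of the released dataset or models.

```python
# Hypothetical helper (not part of IndicIRSuite): the code-to-language
# mapping from this model card, kept as a plain dictionary.
LANGUAGE_CODES = {
    "asm_Beng": "Assamese",
    "ben_Beng": "Bengali",
    "guj_Gujr": "Gujarati",
    "hin_Deva": "Hindi",
    "kan_Knda": "Kannada",
    "mal_Mlym": "Malayalam",
    "mar_Deva": "Marathi",
    "ory_Orya": "Oriya",
    "pan_Guru": "Punjabi",
    "tam_Taml": "Tamil",
    "tel_Telu": "Telugu",
}


def split_code(code: str) -> tuple[str, str]:
    """Split a code like 'hin_Deva' into (ISO 639-3 language, ISO 15924 script)."""
    if code not in LANGUAGE_CODES:
        raise KeyError(f"unknown language code: {code}")
    lang, script = code.split("_")
    return lang, script


if __name__ == "__main__":
    # Print every supported language with its language and script parts.
    for code, name in LANGUAGE_CODES.items():
        lang, script = split_code(code)
        print(f"{code}: {name} (language={lang}, script={script})")
```

The script suffix alone does not identify a language: Hindi and Marathi share Devanagari (`Deva`), and Assamese and Bengali share the Bengali script (`Beng`), so the 11 languages use only 9 distinct scripts.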