---
license: mit
---


# IndicIRSuite: Multilingual Dataset and Neural Information Models for Indian Languages

Paper link: https://arxiv.org/abs/2312.09508

Dataset link: https://huggingface.co/datasets/saifulhaq9/indicmarco

Model link: https://huggingface.co/saifulhaq9/indiccolbert

## Contributors & Acknowledgements

Key Contributors and Team Members: Saiful Haq, Ashutosh Sharma, Pushpak Bhattacharyya

## Kindly cite our paper if you are using our datasets or models:

@article{haq2023indicirsuite,
  title={IndicIRSuite: Multilingual Dataset and Neural Information Models for Indian Languages},
  author={Haq, Saiful and Sharma, Ashutosh and Bhattacharyya, Pushpak},
  journal={arXiv preprint arXiv:2312.09508},
  year={2023}
}

## About

This repository contains multilingual ColBERT models for 11 Indian languages.

## Language Code to Language Mapping

asm_Beng: Assamese Language

ben_Beng: Bengali Language

guj_Gujr: Gujarati Language

hin_Deva: Hindi Language

kan_Knda: Kannada Language

mal_Mlym: Malayalam Language

mar_Deva: Marathi Language

ory_Orya: Oriya Language

pan_Guru: Punjabi Language

tam_Taml: Tamil Language

tel_Telu: Telugu Language
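
The mapping above can be kept as a small lookup table in code. The following is a minimal sketch, not part of the released models: the names `LANGUAGE_CODES`, `split_code`, and `language_name` are hypothetical, and the sketch assumes each code is a three-letter language code and a four-letter script code joined by an underscore, as the entries above suggest.

```python
# Language codes copied from the list above.
# The dictionary and helper names are hypothetical, chosen for illustration.
LANGUAGE_CODES = {
    "asm_Beng": "Assamese",
    "ben_Beng": "Bengali",
    "guj_Gujr": "Gujarati",
    "hin_Deva": "Hindi",
    "kan_Knda": "Kannada",
    "mal_Mlym": "Malayalam",
    "mar_Deva": "Marathi",
    "ory_Orya": "Oriya",
    "pan_Guru": "Punjabi",
    "tam_Taml": "Tamil",
    "tel_Telu": "Telugu",
}


def split_code(code):
    """Split a code such as 'hin_Deva' into its (language, script) parts."""
    language, script = code.split("_", 1)
    return language, script


def language_name(code):
    """Return the language name for a code, or raise ValueError if unknown."""
    try:
        return LANGUAGE_CODES[code]
    except KeyError:
        raise ValueError(f"Unsupported language code: {code!r}") from None


if __name__ == "__main__":
    for code in LANGUAGE_CODES:
        language, script = split_code(code)
        print(f"{code}: {language_name(code)} (language={language}, script={script})")
```

Raising an error for unknown codes, rather than returning a default, makes it harder to silently request a language that is not in the list.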