aloobun committed on
Commit 5c664a3 · verified · 1 Parent(s): c43b202

Update README.md

Files changed (1)
  1. README.md +13 -1
README.md CHANGED
@@ -75,4 +75,16 @@ Script ensures:
  - New tokens are correctly integrated.
  - Token mappings, etc.
 
- I feel there is some unnecessary bloat in the script, such as token validation and redundant test methods. I'm still working on improvements and will post an update as soon as I make progress.
+ I feel there is some unnecessary bloat in the script, such as token validation and redundant test methods. I'm still working on improvements and will post an update as soon as I make progress.
+
+ Here's a comparison of subword **fertility** scores (the average number of tokens produced per word; lower means more compact tokenization) between [sarvam-1](https://huggingface.co/sarvamai/sarvam-1) and this tokenizer.
+
+ | Language | sarvam-1 | IN-Llama-3-Tokenizer |
+ |----------|----------|----------------------|
+ | Bengali | 1.7 | 3.52 |
+ | Gujarati | 2.784313 | 3.588235 |
+ | Hindi | 1.583333 | 2.933333 |
+ | Kannada | 2.571428 | 3.976190 |
+ | Malayalam | 3.487804 | 4.365853 |
+ | Tamil | 2.767441 | 3.860465 |
+ | Telugu | 2.372093 | 3.511627 |
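The fertility metric compared in the table can be sketched as a simple ratio: total tokens divided by total words over an evaluation corpus. This is a minimal Python sketch under stated assumptions, not the method used to produce the numbers above. The README does not say which corpus or word-splitting rule was used, so this sketch assumes whitespace-delimited words and is not guaranteed to reproduce the table. `subword_fertility` and `toy_tokenize` are hypothetical names, and `toy_tokenize` is a stand-in, not the sarvam-1 or IN-Llama-3-Tokenizer tokenizer.

```python
from typing import Callable, Iterable, List


def subword_fertility(texts: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Return total tokens / total words over all texts.

    Assumption: a "word" is a whitespace-delimited chunk (str.split()).
    """
    total_words = 0
    total_tokens = 0
    for text in texts:
        total_words += len(text.split())
        # Count the tokens the tokenizer produces for the whole text.
        total_tokens += len(tokenize(text))
    if total_words == 0:
        raise ValueError("no words to score")
    return total_tokens / total_words


def toy_tokenize(text: str) -> List[str]:
    # Hypothetical stand-in tokenizer: cuts every word into 3-character pieces.
    pieces: List[str] = []
    for word in text.split():
        pieces.extend(word[i:i + 3] for i in range(0, len(word), 3))
    return pieces


if __name__ == "__main__":
    corpus = ["tokenizer fertility", "abc abcd"]
    # 9 tokens (3 + 3 + 1 + 2) over 4 words
    print(subword_fertility(corpus, toy_tokenize))  # prints 2.25
```

To score a real tokenizer, you could pass something like the bound `tokenize` method of a Hugging Face `transformers` tokenizer loaded with `AutoTokenizer.from_pretrained("sarvamai/sarvam-1")` as the `tokenize` argument. Make sure special tokens such as BOS are not counted, and use the same corpus and word-splitting rule for both tokenizers so that the scores are comparable.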