droussis committed (verified)
Commit 05f4afc · 1 parent: 4e4f7dc

Update README.md


Add clarification on the improvement of Krikri over Llama

Files changed (1):
  1. README.md +1 -1
README.md CHANGED

@@ -103,7 +103,7 @@ Our evaluation suite includes:
 * An existing benchmark for question answering in Greek ([Belebele](https://arxiv.org/abs/2308.16884))
 * A novel benchmark created by the ILSP team for medical question answering based on the medical exams of [DOATAP](https://www.doatap.gr) ([Medical MCQA](https://huggingface.co/datasets/ilsp/medical_mcqa_greek)).
 
-We can see that our training enhances performance across all Greek test sets by a **+10.8%** average improvement. The results for the Greek test sets are shown in the following table:
+We can see that our continual pretraining methodology enhances performance across all Greek test sets by a **+10.8%** average improvement over the base model. The results for the Greek test sets are shown in the following table:
 
 | | Medical MCQA EL (15-shot) | Belebele EL (5-shot) | HellaSwag EL (10-shot) | ARC-Challenge EL (25-shot) | TruthfulQA MC2 EL (0-shot) | MMLU EL (5-shot) | Average |
 |----------------|----------------|-------------|--------------|------------------|-------------------|---------|---------|