Update README.md
Add clarification on the improvement of Krikri over Llama
README.md
CHANGED
@@ -103,7 +103,7 @@ Our evaluation suite includes:
 * An existing benchmark for question answering in Greek ([Belebele](https://arxiv.org/abs/2308.16884))
 * A novel benchmark created by the ILSP team for medical question answering based on the medical exams of [DOATAP](https://www.doatap.gr) ([Medical MCQA](https://huggingface.co/datasets/ilsp/medical_mcqa_greek)).
 
-We can see that our
+We can see that our continual pretraining methodology enhances performance across all Greek test sets, with a **+10.8%** average improvement over the base model. The results for the Greek test sets are shown in the following table:
 
 | | Medical MCQA EL (15-shot) | Belebele EL (5-shot) | HellaSwag EL (10-shot) | ARC-Challenge EL (25-shot) | TruthfulQA MC2 EL (0-shot) | MMLU EL (5-shot) | Average |
 |----------------|----------------|-------------|--------------|------------------|-------------------|---------|---------|
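
For reference, here is a minimal sketch of how the Medical MCQA benchmark could be loaded and formatted for a few-shot run like the 15-shot setting in the table above. It assumes the Hugging Face `datasets` library; the field names (`question`, `choices`, `answer`) and the answer encoding are hypothetical and should be checked against the dataset card at the link above.

```python
# Minimal sketch of a 15-shot prompt for the Medical MCQA benchmark.
# NOTE: the field names ("question", "choices", "answer") are assumptions;
# verify them against https://huggingface.co/datasets/ilsp/medical_mcqa_greek
from datasets import load_dataset

ds = load_dataset("ilsp/medical_mcqa_greek")  # repo id taken from the README link

def format_example(ex, with_answer=True):
    """Render one multiple-choice question with lettered options."""
    options = "\n".join(f"{chr(65 + i)}. {c}" for i, c in enumerate(ex["choices"]))
    prompt = f"{ex['question']}\n{options}\nAnswer:"
    # Assumes `answer` is an integer index into `choices`.
    return prompt + f" {chr(65 + ex['answer'])}" if with_answer else prompt

def build_15_shot_prompt(shots, target):
    """Prepend 15 solved examples to the target question (15-shot setting)."""
    demos = "\n\n".join(format_example(ex) for ex in shots[:15])
    return demos + "\n\n" + format_example(target, with_answer=False)
```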