Update README.md
Add clarification on the improvement of Krikri over Llama
README.md
CHANGED
@@ -103,7 +103,7 @@ Our evaluation suite includes:
 * An existing benchmark for question answering in Greek ([Belebele](https://arxiv.org/abs/2308.16884))
 * A novel benchmark created by the ILSP team for medical question answering based on the medical exams of [DOATAP](https://www.doatap.gr) ([Medical MCQA](https://huggingface.co/datasets/ilsp/medical_mcqa_greek)).
 
-We can see that our
+We can see that our continual pretraining methodology enhances performance across all Greek test sets, with a **+10.8%** average improvement over the base model. The results for the Greek test sets are shown in the following table:
 
 | | Medical MCQA EL (15-shot) | Belebele EL (5-shot) | HellaSwag EL (10-shot) | ARC-Challenge EL (25-shot) | TruthfulQA MC2 EL (0-shot) | MMLU EL (5-shot) | Average |
 |----------------|----------------|-------------|--------------|------------------|-------------------|---------|---------|
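
For reference, here is a minimal sketch of how the Medical MCQA benchmark could be loaded and formatted for a few-shot run like the 15-shot setting in the table above. It assumes the Hugging Face `datasets` library; the field names (`question`, `choices`, `answer`) and the answer encoding are hypothetical and should be checked against the dataset card at the link above.

```python
# Minimal sketch of a 15-shot prompt for the Medical MCQA benchmark.
# NOTE: the field names ("question", "choices", "answer") are assumptions;
# verify them against https://huggingface.co/datasets/ilsp/medical_mcqa_greek
from datasets import load_dataset

ds = load_dataset("ilsp/medical_mcqa_greek")  # repo id taken from the README link

def format_example(ex, with_answer=True):
    """Render one multiple-choice question with lettered options."""
    options = "\n".join(f"{chr(65 + i)}. {c}" for i, c in enumerate(ex["choices"]))
    prompt = f"{ex['question']}\n{options}\nAnswer:"
    # Assumes `answer` is an integer index into `choices`.
    return prompt + f" {chr(65 + ex['answer'])}" if with_answer else prompt

def build_15_shot_prompt(shots, target):
    """Prepend 15 solved examples to the target question (15-shot setting)."""
    demos = "\n\n".join(format_example(ex) for ex in shots[:15])
    return demos + "\n\n" + format_example(target, with_answer=False)
```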