avemio/German-RAG-NEMO-12B-SFT-HESSIAN-AI

@@ -133,16 +133,17 @@ Four evaluation metrics were employed across all subsets: language quality, over
 -   **Overall score:** This metric combined the results from the previous three metrics, offering a comprehensive evaluation of the model's capabilities across all subsets.
-| Metric                                    | [Vanila-Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) | [GRAG-NEMO-SFT](https://huggingface.co/avemio/GRAG-NEMO-12B-SFT-HESSIAN-AI) | [GRAG-NEMO-ORPO](https://huggingface.co/avemio/GRAG-NEMO-12B-ORPO-HESSIAN-AI) | [GRAG-NEMO-MERGED]() | GPT-3.5-TURBO |
 |------------------------------------------|---------------------------------------------------------------------------------|--------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|-----------------------------|----------------|
-| **Average_language_quality**             | 85.88                                                                          | 89.61                                                                          | 89.1                                                                                          |                             |                |
-| **extraction_recall_weighted_overall_score**       | 35.2                                                                           | 52.3                                                                           | 48.8                                                                                          |                             |                |
-| **qa_multiple_references_weighted_overall_score** | 65.3                                                                           | 71.0                                                                           | 74.0                                                                                          |                             |                |
-| **qa_without_time_difference_weighted_overall_score** | 71.5                                                                           | 85.6                                                                           | 85.6                                                                                          |                             |                |
-| **qa_with_time_difference_weighted_overall_score** | 65.3                                                                           | 87.9                                                                           | 85.4                                                                                          |                             |                |
-| **reasoning_weighted_overall_score**              | 69.4                                                                           | 71.5                                                                           | 73.4                                                                                          |                             |                |
-| **relevant_context_weighted_overall_score**       | 71.3                                                                           | 69.1                                                                           | 65.5                                                                                          |                             |                |
-| **summarizations_weighted_overall_score**         | 73.8                                                                           | 81.6                                                                           | 80.3                                                                                          |                             |                |
 ## Model Details

 -   **Overall score:** This metric combined the results from the previous three metrics, offering a comprehensive evaluation of the model's capabilities across all subsets.
+| Metric                                    | [Vanila-Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) | **[GRAG-NEMO-SFT](https://huggingface.co/avemio/GRAG-NEMO-12B-SFT-HESSIAN-AI)** | [GRAG-NEMO-ORPO](https://huggingface.co/avemio/GRAG-NEMO-12B-ORPO-HESSIAN-AI) | [GRAG-NEMO-MERGED]() | GPT-3.5-TURBO |
 |------------------------------------------|---------------------------------------------------------------------------------|--------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|-----------------------------|----------------|
+| Average Language Quality             | 85.88                                                                          | **89.61**                                                                     | 89.1                                                                                          |                             |                |
+| **OVERALL SCORES (weighted):**       |                                                                            |                                                                            |                                                                                           |                             |               |
+| extraction_recall       | 35.2                                                                           | **52.3**                                                                           | 48.8                                                                                          |                             |                |
+| qa_multiple_references | 65.3                                                                           | **71.0**                                                                           | 74.0                                                                                          |                             |                |
+| qa_without_time_difference | 71.5                                                                           | **85.6**                                                                           | 85.6                                                                                          |                             |                |
+| qa_with_time_difference | 65.3                                                                           | **87.9**                                                                           | 85.4                                                                                          |                             |                |
+| reasoning              | 69.4                                                                           | **71.5**                                                                           | 73.4                                                                                          |                             |                |
+| relevant_context       | 71.3                                                                           | **69.1**                                                                           | 65.5                                                                                          |                             |                |
+| summarizations         | 73.8                                                                           | **81.6**                                                                           | 80.3                                                                                          |                             |                |
 ## Model Details