DISLab/SummLlama3-8B

Hwanjun committed on Oct 22, 2024

Commit 56be7f8 · verified · 1 parent: 69a182c

Update README.md

Files changed (1)

README.md +12 -1

README.md CHANGED

@@ -18,7 +18,18 @@ The feedback encompasses a wide range of input documents, from short to lengthy
 - Four non-dialogue domains: News, Lifestyle, Report, Medical
 - Three dialogue domains: Daily Life, Interview, Meeting
-Surprisingly, it outperforms the nearly 10x larger Llama3-70B-Instruct while offering much faster inference speed.
+Surprisingly, it outperforms the nearly 10x larger **Llama3-70B-Instruct** and also **GPT-4o** while offering much faster inference speed.
+These are the automated evaluation results:
+| **Config.**        | **Faithfulness** | **Completeness** | **Conciseness** | **Average** |
+|--------------------|------------|-----------|-----------|----------|
+| Llama3-8B-Instruct          | 0.864      | 0.583     | 0.450     | 0.632    |
+| Llama3-70B-Instruct        | 0.931      | 0.596     | 0.487     | 0.671    |
+| GPT-4o        | 0.940      | 0.657     | 0.437     | 0.678    |
+| SummLlama3-8B  | 0.931  | 0.614 | 0.659 | 0.735 |
+| SummLlama3-70B  | 0.950  | 0.632 | 0.754 | 0.779 |
 These are the human evaluation results:
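
Reviewer note on the automated evaluation table added in this hunk: the README does not say how the **Average** column is derived. The sketch below assumes it is the unweighted mean of Faithfulness, Completeness, and Conciseness, and checks that assumption against the five rows in the diff. The names `scores`, `reported_average`, and `mean_score` are hypothetical and exist only for this check.

```python
# Scores copied from the automated evaluation table in the diff:
# (faithfulness, completeness, conciseness) per model.
scores = {
    "Llama3-8B-Instruct": (0.864, 0.583, 0.450),
    "Llama3-70B-Instruct": (0.931, 0.596, 0.487),
    "GPT-4o": (0.940, 0.657, 0.437),
    "SummLlama3-8B": (0.931, 0.614, 0.659),
    "SummLlama3-70B": (0.950, 0.632, 0.754),
}

# The "Average" column as it appears in the table.
reported_average = {
    "Llama3-8B-Instruct": 0.632,
    "Llama3-70B-Instruct": 0.671,
    "GPT-4o": 0.678,
    "SummLlama3-8B": 0.735,
    "SummLlama3-70B": 0.779,
}


def mean_score(triple):
    """Unweighted mean of the three metrics (an assumption, not stated in the README)."""
    return sum(triple) / len(triple)


for model, triple in scores.items():
    computed = mean_score(triple)
    # Each reported value should match the computed mean to within rounding.
    assert abs(computed - reported_average[model]) < 1e-3, model
    print(f"{model}: computed {computed:.3f}, reported {reported_average[model]:.3f}")
```

Under that assumption all five rows agree with the reported averages to three decimal places. The rows also show that the SummLlama3 lead in Average comes mainly from Conciseness; GPT-4o has the higher Faithfulness and Completeness scores than SummLlama3-8B.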