Hwanjun committed on
Commit 56be7f8 · verified · 1 parent: 69a182c

Update README.md

Files changed (1)
  1. README.md +12 -1
README.md CHANGED
@@ -18,7 +18,18 @@ The feedback encompasses a wide range of input documents, from short to lengthy
 - Four non-dialogue domains: News, Lifestyle, Report, Medical
 - Three dialogue domains: Daily Life, Interview, Meeting
 
-Surprisingly, it outperforms the nearly 10x larger Llama3-70B-Instruct while offering much faster inference speed.
+Surprisingly, it outperforms the nearly 10x larger **Llama3-70B-Instruct** and also **GPT-4o** while offering much faster inference speed.
+
+These are the automated evaluation results:
+
+| **Config.** | **Faithfulness** | **Completeness** | **Conciseness** | **Average** |
+|--------------------|------------|-----------|-----------|----------|
+| Llama3-8B-Instruct | 0.864 | 0.583 | 0.450 | 0.632 |
+| Llama3-70B-Instruct | 0.931 | 0.596 | 0.487 | 0.671 |
+| GPT-4o | 0.940 | 0.657 | 0.437 | 0.678 |
+| SummLlama3-8B | 0.931 | 0.614 | 0.659 | 0.735 |
+| SummLlama3-70B | 0.950 | 0.632 | 0.754 | 0.779 |
+
 
 These are the human evaluation results:
 
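This commit adds an automated-evaluation table whose Average column appears to be the plain, unweighted mean of Faithfulness, Completeness, and Conciseness. The README does not state this, so treat it as an assumption. The short Python sketch below recomputes each average from the three metric scores and compares it with the reported value; the names `SCORES` and `mean3` are hypothetical and exist only for this check.

```python
import math

# Scores copied from the automated-evaluation table in this commit:
# (faithfulness, completeness, conciseness, reported average)
SCORES = {
    "Llama3-8B-Instruct": (0.864, 0.583, 0.450, 0.632),
    "Llama3-70B-Instruct": (0.931, 0.596, 0.487, 0.671),
    "GPT-4o": (0.940, 0.657, 0.437, 0.678),
    "SummLlama3-8B": (0.931, 0.614, 0.659, 0.735),
    "SummLlama3-70B": (0.950, 0.632, 0.754, 0.779),
}


def mean3(faithfulness: float, completeness: float, conciseness: float) -> float:
    """Assumed definition of Average: unweighted mean, rounded to 3 decimals."""
    return round((faithfulness + completeness + conciseness) / 3, 3)


# Compare each recomputed average with the value reported in the table.
for model, (f, c, s, reported) in SCORES.items():
    recomputed = mean3(f, c, s)
    status = "ok" if math.isclose(recomputed, reported) else "MISMATCH"
    print(f"{model:<20} reported={reported:.3f} recomputed={recomputed:.3f} {status}")
```

Under this assumption, every recomputed value matches the reported one to three decimals. The same numbers also show where SummLlama3-8B's lead over GPT-4o comes from: it wins on Average (0.735 vs. 0.678) mainly through Conciseness (0.659 vs. 0.437), while GPT-4o still scores higher on Faithfulness (0.940 vs. 0.931) and Completeness (0.657 vs. 0.614).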