chenwuml committed
Commit 1f99523 · verified · 1 Parent(s): ba4aa71

Update README.md

Files changed (1):
  1. README.md +2 -2
README.md CHANGED
@@ -17,7 +17,7 @@ MegaBeam-Mistral-7B-300k is a fine-tuned [Mistral-7B-Instruct-v0.2](https://hugg
 
 **[InfiniteBench: Extending Long Context Evaluation Beyond 100K Tokens](https://github.com/OpenBMB/InfiniteBench)**
 
-InfiniteBench is a cutting-edge benchmark tailored for evaluating the capabilities of language models to process, understand, and reason over super long contexts (100k+ tokens). We therefore evaluated MegaBeam-Mistral-7B-300k, [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), [Llama-3-8B-Instruct-262k](https://huggingface.co/gradientai/Llama-3-8B-Instruct-262k), and [Llama3-70B-1M](https://huggingface.co/gradientai/Llama-3-70B-Instruct-Gradient-1048k) on InfiniteBench. The InfiniteBench authors also evaluated SOTA proprietary and open-source LLMs on InfiniteBench. We thus combined both results in the table below.
+_InfiniteBench is a cutting-edge benchmark tailored for evaluating the capabilities of language models to process, understand, and reason over super long contexts (100k+ tokens)_. We therefore evaluated MegaBeam-Mistral-7B-300k, [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), [Llama-3-8B-Instruct-262k](https://huggingface.co/gradientai/Llama-3-8B-Instruct-262k), and [Llama3-70B-1M](https://huggingface.co/gradientai/Llama-3-70B-Instruct-Gradient-1048k) on InfiniteBench. The InfiniteBench authors also evaluated SOTA proprietary and open-source LLMs on InfiniteBench. We thus combined both results in the table below.
 
 | Task Name | MegaBeam-Mistral-7B-300k | Mistral-7B-Instruct-v0.2 | Llama-3-8B-Instruct-262k | Llama3-70B-1M | GPT-4-1106-preview | YaRN-Mistral-7B | Kimi-Chat | Claude 2 | Yi-6B-200K | Yi-34B-200K | Chatglm3-6B-128K |
 | ---------------- | ---------------- | ---------------- | ---------------- | ---------------- | ------ | --------------- | --------- | -------- | -----------| -----------| -----------|
@@ -35,7 +35,7 @@ InfiniteBench is a cutting-edge benchmark tailored for evaluating the capabiliti
 | Math.Find | 24.28% | 26.28% | 15.40% | 30% | 60.00% | 17.14% | 12.57% | 32.29% | < 5% |25.71% |7.71% |
 | **Average** | 30.70% | 15.08% | 28.10% | 31.13% | 46.08% | 20.41% | 34.93% | 37.21% | 22.78% |25.41% |17.59% |
 
-The 12 tasks evaluated in the InfiniteBench are summarized below:
+The 12 evaluation tasks are summarized below (as per [InfiniteBench](https://github.com/OpenBMB/InfiniteBench)):
 | Task Name | Context | # Examples | Avg Input Tokens | Avg Output Tokens | Description |
 | -------------------- | ------------- | ---------- | ---------------- | ----------------- | ------------------------------------------------------------------------------------------- |
 | En.Sum | Fake Book | 103 | 171.5k | 1.1k | Summarization of a fake book created with core entity substitution. |
 
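For context on how an evaluation like the one described above can be driven, the sketch below shows one way to query MegaBeam-Mistral-7B-300k on a 100k+ token document with the Hugging Face `transformers` generation API. This is a minimal, hypothetical example, not part of this commit; the repo id `aws-prototyping/MegaBeam-Mistral-7B-300k`, the bf16 precision, and the input file `book.txt` are all assumptions.

```python
# Minimal sketch (assumptions noted inline): summarizing a long document,
# in the spirit of the En.Sum task, with a long-context chat model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aws-prototyping/MegaBeam-Mistral-7B-300k"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision; a 100k+ token context is memory-hungry
    device_map="auto",
)

# Read a long document (hypothetical file) and wrap it in the model's chat
# template, matching the Mistral-7B-Instruct-v0.2 format it was fine-tuned from.
with open("book.txt") as f:
    long_document = f.read()

messages = [
    {"role": "user", "content": f"{long_document}\n\nSummarize the book above in one paragraph."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) keeps the output deterministic, which is the usual choice for benchmark-style runs.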