Update README.md
README.md CHANGED
@@ -17,7 +17,7 @@ MegaBeam-Mistral-7B-300k is a fine-tuned [Mistral-7B-Instruct-v0.2](https://hugg
 
 **[InfiniteBench: Extending Long Context Evaluation Beyond 100K Tokens](https://github.com/OpenBMB/InfiniteBench)**
 
-InfiniteBench is a cutting-edge benchmark tailored for evaluating the capabiliti...
+_InfiniteBench is a cutting-edge benchmark tailored for evaluating the capabilities of language models to process, understand, and reason over super long contexts (100k+ tokens)_. We therefore evaluated MegaBeam-Mistral-7B-300k, [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), [Llama-3-8B-Instruct-262k](https://huggingface.co/gradientai/Llama-3-8B-Instruct-262k), and [Llama3-70B-1M](https://huggingface.co/gradientai/Llama-3-70B-Instruct-Gradient-1048k) on InfiniteBench. The InfiniteBench authors also evaluated SOTA proprietary and open-source LLMs on InfiniteBench. We thus combined both results in the table below.
 
 | Task Name | MegaBeam-Mistral-7B-300k | Mistral-7B-Instruct-v0.2 | Llama-3-8B-Instruct-262k | Llama3-70B-1M | GPT-4-1106-preview | YaRN-Mistral-7B | Kimi-Chat | Claude 2 | Yi-6B-200K | Yi-34B-200K | Chatglm3-6B-128K |
 | ---------------- | ---------------- | ---------------- | ---------------- | ---------------- | ------ | --------------- | --------- | -------- | ----------- | ----------- | ----------- |
@@ -35,7 +35,7 @@ InfiniteBench is a cutting-edge benchmark tailored for evaluating the capabiliti
 | Math.Find | 24.28% | 26.28% | 15.40% | 30% | 60.00% | 17.14% | 12.57% | 32.29% | < 5% | 25.71% | 7.71% |
 | **Average** | 30.70% | 15.08% | 28.10% | 31.13% | 46.08% | 20.41% | 34.93% | 37.21% | 22.78% | 25.41% | 17.59% |
 
-The 12 tasks
+The 12 evaluation tasks are summarized below (as per [InfiniteBench](https://github.com/OpenBMB/InfiniteBench)):
 | Task Name | Context | # Examples | Avg Input Tokens | Avg Output Tokens | Description |
 | -------------------- | ------------- | ---------- | ---------------- | ----------------- | ------------------------------------------------------------------------------------------- |
 | En.Sum | Fake Book | 103 | 171.5k | 1.1k | Summarization of a fake book created with core entity substitution. |
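For readers who want to reproduce a slice of these numbers, below is a minimal sketch of scoring one long-context task with Hugging Face `transformers`. It is an illustration, not the official InfiniteBench harness: the Hugging Face repo id, the `retrieve_passkey.jsonl` task dump, and the exact-match scoring rule are all assumptions for the example.

```python
# Minimal sketch: score one InfiniteBench-style task with greedy decoding.
# Assumptions: the HF repo id below, a local JSONL file with
# {"prompt": ..., "answer": ...} records, and exact-match scoring.
import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "aws-prototyping/MegaBeam-Mistral-7B-300k"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

def score_task(examples):
    """Exact-match accuracy over (prompt, answer) pairs."""
    correct = 0
    for ex in examples:
        # InfiniteBench inputs average 100k+ tokens, so this needs
        # hardware (and a serving stack) sized for long contexts.
        inputs = tokenizer(ex["prompt"], return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
        completion = tokenizer.decode(
            out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        correct += ex["answer"].strip() in completion
    return correct / len(examples)

with open("retrieve_passkey.jsonl") as f:  # hypothetical task dump
    examples = [json.loads(line) for line in f]
print(f"Retrieve.PassKey accuracy: {score_task(examples):.2%}")
```

Greedy decoding (`do_sample=False`) keeps the score deterministic across runs; for tasks where exact match is too strict (e.g. En.Sum), substitute the task's official metric from the InfiniteBench repository.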