chenwuml committed
Commit 1f99523 · verified · 1 Parent(s): ba4aa71

Update README.md

Files changed (1):
  1. README.md +2 -2
README.md CHANGED
@@ -17,7 +17,7 @@ MegaBeam-Mistral-7B-300k is a fine-tuned [Mistral-7B-Instruct-v0.2](https://hugg
 
 **[InfiniteBench: Extending Long Context Evaluation Beyond 100K Tokens](https://github.com/OpenBMB/InfiniteBench)**
 
-InfiniteBench is a cutting-edge benchmark tailored for evaluating the capabilities of language models to process, understand, and reason over super long contexts (100k+ tokens). We therefore evaluated MegaBeam-Mistral-7B-300k, [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), [Llama-3-8B-Instruct-262k](https://huggingface.co/gradientai/Llama-3-8B-Instruct-262k), and [Llama3-70B-1M](https://huggingface.co/gradientai/Llama-3-70B-Instruct-Gradient-1048k) on InfiniteBench. The InfiniteBench authors also evaluated SOTA proprietary and open-source LLMs on InfiniteBench. We thus combined both results in the table below.
+_InfiniteBench is a cutting-edge benchmark tailored for evaluating the capabilities of language models to process, understand, and reason over super long contexts (100k+ tokens)_. We therefore evaluated MegaBeam-Mistral-7B-300k, [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), [Llama-3-8B-Instruct-262k](https://huggingface.co/gradientai/Llama-3-8B-Instruct-262k), and [Llama3-70B-1M](https://huggingface.co/gradientai/Llama-3-70B-Instruct-Gradient-1048k) on InfiniteBench. The InfiniteBench authors also evaluated SOTA proprietary and open-source LLMs on InfiniteBench. We thus combined both results in the table below.
 
 | Task Name | MegaBeam-Mistral-7B-300k | Mistral-7B-Instruct-v0.2 | Llama-3-8B-Instruct-262k | Llama3-70B-1M | GPT-4-1106-preview | YaRN-Mistral-7B | Kimi-Chat | Claude 2 | Yi-6B-200K | Yi-34B-200K | Chatglm3-6B-128K |
 | ---------------- | ---------------- | ---------------- | ---------------- | ---------------- | ------ | --------------- | --------- | -------- | -----------| -----------| -----------|
@@ -35,7 +35,7 @@ InfiniteBench is a cutting-edge benchmark tailored for evaluating the capabiliti
 | Math.Find | 24.28% | 26.28% | 15.40% | 30% | 60.00% | 17.14% | 12.57% | 32.29% | < 5% |25.71% |7.71% |
 | **Average** | 30.70% | 15.08% | 28.10% | 31.13% | 46.08% | 20.41% | 34.93% | 37.21% | 22.78% |25.41% |17.59% |
 
-The 12 tasks evaluated in the InfiniteBench are summarized below:
+The 12 evaluation tasks are summarized below (as per [InfiniteBench](https://github.com/OpenBMB/InfiniteBench)):
 | Task Name | Context | # Examples | Avg Input Tokens | Avg Output Tokens | Description |
 | -------------------- | ------------- | ---------- | ---------------- | ----------------- | ------------------------------------------------------------------------------------------- |
 | En.Sum | Fake Book | 103 | 171.5k | 1.1k | Summarization of a fake book created with core entity substitution. |
 
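For context on how an evaluation like the one described above can be driven, the sketch below shows one way to query MegaBeam-Mistral-7B-300k on a 100k+ token document with the Hugging Face `transformers` generation API. This is a minimal, hypothetical example, not part of this commit; the repo id `aws-prototyping/MegaBeam-Mistral-7B-300k`, the bf16 precision, and the input file `book.txt` are all assumptions.

```python
# Minimal sketch (assumptions noted inline): summarizing a long document,
# in the spirit of the En.Sum task, with a long-context chat model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aws-prototyping/MegaBeam-Mistral-7B-300k"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision; a 100k+ token context is memory-hungry
    device_map="auto",
)

# Read a long document (hypothetical file) and wrap it in the model's chat
# template, matching the Mistral-7B-Instruct-v0.2 format it was fine-tuned from.
with open("book.txt") as f:
    long_document = f.read()

messages = [
    {"role": "user", "content": f"{long_document}\n\nSummarize the book above in one paragraph."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) keeps the output deterministic, which is the usual choice for benchmark-style runs.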