Update README.md
datasets:
- arcee-ai/sec-data-mini
---
## Quick Summary
This model is a pruned variant of `mistralai/Mistral-7B-Instruct-v0.2`, produced with the layer-pruning technique described in the paper "The Unreasonable Ineffectiveness of the Deeper Layers." Using the `PruneMe` and `MergeKit` tooling, redundant deeper layers were identified and removed without compromising the model's ability to generate coherent text. The model is maintained by Arcee-ai and demonstrates a practical trade-off between performance and resource usage in large language models (LLMs).
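
The `PruneMe` repository automates the redundancy analysis behind this pruning. As a rough sketch of the underlying idea (an assumed workflow, not `PruneMe`'s actual code), the snippet below scores each contiguous block of decoder layers by the cosine similarity between the block's input and output hidden states; the block that changes its input the least is the pruning candidate. The block size and calibration text are illustrative assumptions.

```python
# Sketch of the layer-similarity analysis behind layer pruning (assumed
# workflow, not PruneMe's actual code). Scores each contiguous block of
# n_prune decoder layers by how little it changes the hidden states.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "mistralai/Mistral-7B-Instruct-v0.2"
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
model.eval()

n_prune = 8  # how many consecutive layers to consider removing (assumption)
text = "The quick brown fox jumps over the lazy dog."  # use a real calibration set in practice
inputs = tok(text, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[i] is the input to layer i, so hidden_states[i + n_prune]
# is the output of the block of layers [i, i + n_prune).
hs = out.hidden_states
scores = []
for start in range(len(hs) - n_prune):
    a, b = hs[start].flatten(), hs[start + n_prune].flatten()
    scores.append(torch.nn.functional.cosine_similarity(a, b, dim=0).item())

best = max(range(len(scores)), key=scores.__getitem__)
print(f"Most redundant block: layers {best}..{best + n_prune - 1} "
      f"(cosine similarity {scores[best]:.4f})")
```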
### Model Description
Developed by Arcee-ai, this model is a selectively layer-pruned version of `mistralai/Mistral-7B-Instruct-v0.2`. Pruning decisions follow the findings of "The Unreasonable Ineffectiveness of the Deeper Layers": `PruneMe` measures how little each block of deeper layers changes its input, and `MergeKit` removes the most redundant block, yielding a leaner model that still generates high-quality text.
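
Once the redundant block is identified, the removal itself can be expressed as a MergeKit passthrough merge that keeps only the surviving layer ranges. The config below is a minimal sketch: the ranges shown (dropping 8 of Mistral's 32 decoder layers) are illustrative assumptions, not this model's exact recipe.

```yaml
# Minimal MergeKit passthrough config for layer pruning (illustrative;
# the layer ranges are assumptions, not this model's actual recipe).
slices:
  - sources:
      - model: mistralai/Mistral-7B-Instruct-v0.2
        layer_range: [0, 20]   # keep the first 20 layers
  - sources:
      - model: mistralai/Mistral-7B-Instruct-v0.2
        layer_range: [28, 32]  # skip layers 20-27, keep the final 4
merge_method: passthrough
dtype: bfloat16
```

Running `mergekit-yaml` on such a config (e.g. `mergekit-yaml prune.yaml ./pruned-model`) writes out the pruned checkpoint.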
### Model Sources
- **Pruning:** [PruneMe GitHub (unofficial)](https://github.com/arcee-ai/PruneMe)
- **Paper:** ["The Unreasonable Ineffectiveness of the Deeper Layers"](https://arxiv.org/pdf/2403.17887.pdf)
- **Merging Repository:** [MergeKit GitHub](https://github.com/arcee-ai/mergekit)
## Uses
The pruned model targets the same range of NLP tasks as the base model and aims to retain its ability to generate coherent text despite the reduced size. It demonstrates that layer pruning can preserve a model's essential capabilities while offering a template for cutting computational resource requirements.
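
The checkpoint loads like any other `transformers` causal LM. In the sketch below, the repository id is a placeholder (the model's actual Hub id is not stated in this card); substitute the real one before running.

```python
# Minimal inference sketch; "arcee-ai/<this-model>" is a placeholder id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "arcee-ai/<this-model>"  # placeholder: replace with this model's Hub id
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the risk factors in an SEC 10-K filing."}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(tok.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```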
### Downstream Use
The pruned model is a solid foundation for task-specific fine-tuning and a natural candidate for continued pre-training. Because it applies the pruning recipe from "The Unreasonable Ineffectiveness of the Deeper Layers" via the `PruneMe` and `MergeKit` repositories, it shows how computational requirements can be reduced substantially without detrimental effects on performance.
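
For downstream fine-tuning, a parameter-efficient setup such as LoRA is a natural fit for the pruned checkpoint. The sketch below shows the general shape with `peft`; the model id is again a placeholder and the hyperparameters are illustrative assumptions, not a recommended recipe.

```python
# LoRA fine-tuning sketch with peft; model id and hyperparameters are
# illustrative assumptions, not a recommended recipe.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("arcee-ai/<this-model>")  # placeholder id
lora = LoraConfig(
    r=16,                                 # adapter rank (assumption)
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections in Mistral-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```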
|