Moreover, due to its unique hybrid SSM architecture, Zamba2-7B-Instruct achieves extremely low inference latency and rapid generation with a significantly smaller memory footprint than comparable transformer-based models.

<center>
<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/nHM8bX0y8SWa4zwMSbBi7.png" width="500" alt="Zamba architecture">
</center>

<center>
<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/qXG8aip6h77LHKjhWfjD5.png" width="500" alt="Zamba architecture">
</center>

Time to First Token (TTFT) | Output Generation
:-------------------------:|:-------------------------:
 | 

The same advantage holds for memory overhead:

<center>
<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/Pgp6o7PFXZVLCY9HQiWYC.png" width="400" alt="Zamba inference and memory cost">
</center>
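The memory comparison above can be illustrated with back-of-envelope arithmetic: a transformer's KV cache grows linearly with sequence length, while an SSM carries a fixed-size recurrent state that does not grow during generation. All dimensions below (layer count, KV heads, head and state sizes) are illustrative assumptions, not Zamba2-7B's actual configuration.

```python
# Rough sketch of generation-time memory growth: transformer KV cache vs.
# SSM recurrent state. All dimensions are illustrative assumptions, not
# Zamba2-7B's real hyperparameters.

def transformer_kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8,
                               head_dim=128, bytes_per_elem=2):
    # Keys + values stored for every layer and every generated token.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len

def ssm_state_bytes(n_layers=32, d_model=4096, d_state=16, bytes_per_elem=2):
    # A fixed recurrent state per layer, independent of sequence length.
    return n_layers * d_model * d_state * bytes_per_elem

for seq_len in (1024, 16384):
    kv_mib = transformer_kv_cache_bytes(seq_len) / 2**20
    ssm_mib = ssm_state_bytes() / 2**20
    print(f"{seq_len:>6} tokens: KV cache {kv_mib:8.1f} MiB, SSM state {ssm_mib:.1f} MiB")
```

Under these toy numbers the KV cache scales linearly with context (128 MiB at 1k tokens, 2 GiB at 16k) while the SSM state stays constant, which is the mechanism behind the flat memory curve in the plot above.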

Zamba2-7B-Instruct's high performance and strong instruction-following and reasoning capabilities for its size make it an ideal generalist small model for a wide range of applications.

In Needle-In-A-Haystack tests, we observe that Zamba2-7B-Instruct finds the needle with an extremely high success rate up to and slightly beyond 16k context, with performance falling off sharply at about 18k context. In future versions we aim to extend this context length significantly.

<center>
<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/uP1sbXqF1mZAp1R6NYdCF.png" width="500" alt="Zamba long context performance">
</center>
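For readers unfamiliar with the setup, a Needle-In-A-Haystack prompt buries a single fact ("the needle") at a chosen depth inside long filler text and asks the model to retrieve it. The sketch below shows one way such a prompt can be constructed; the filler sentence, needle, question, and word-based length proxy are all illustrative assumptions, not the exact harness used for these results.

```python
# Minimal sketch of Needle-In-A-Haystack prompt construction. The filler,
# needle, and question are hypothetical examples; real harnesses measure
# length in tokens rather than words.

def build_haystack_prompt(needle: str, filler: str,
                          target_words: int, depth: float) -> str:
    """Repeat `filler` until ~target_words words, insert `needle` at the
    given relative depth (0.0 = start, 1.0 = end), then append a question."""
    words = []
    while len(words) < target_words:
        words.extend(filler.split())
    words = words[:target_words]
    insert_at = int(len(words) * depth)
    words[insert_at:insert_at] = needle.split()
    question = "What is the special magic number mentioned in the text?"
    return " ".join(words) + "\n\n" + question

prompt = build_haystack_prompt(
    needle="The special magic number is 42.",
    filler="The grass is green. The sky is blue. The sun is yellow.",
    target_words=2000,
    depth=0.5,
)
```

Sweeping `depth` from 0.0 to 1.0 and the prompt length across context sizes, then scoring whether the model's answer contains the needle, yields the kind of retrieval grid shown in the plot above.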