Moreover, due to its unique hybrid SSM architecture, Zamba2-7B-Instruct achieves extremely low inference latency and rapid generation with a significantly smaller memory footprint than comparable transformer-based models.

<center>
<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/nHM8bX0y8SWa4zwMSbBi7.png" width="500" alt="Zamba architecture">
</center>

<center>
<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/qXG8aip6h77LHKjhWfjD5.png" width="500" alt="Zamba architecture">
</center>

Time to First Token (TTFT) | Output Generation
:-------------------------:|:-------------------------:
 | 

The same advantage holds for memory overhead:

<center>
<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/Pgp6o7PFXZVLCY9HQiWYC.png" width="400" alt="Zamba inference and memory cost">
</center>
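The memory comparison above can be illustrated with back-of-envelope arithmetic: a transformer's KV cache grows linearly with sequence length, while an SSM carries a fixed-size recurrent state that does not grow during generation. All dimensions below (layer count, KV heads, head and state sizes) are illustrative assumptions, not Zamba2-7B's actual configuration.

```python
# Rough sketch of generation-time memory growth: transformer KV cache vs.
# SSM recurrent state. All dimensions are illustrative assumptions, not
# Zamba2-7B's real hyperparameters.

def transformer_kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8,
                               head_dim=128, bytes_per_elem=2):
    # Keys + values stored for every layer and every generated token.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len

def ssm_state_bytes(n_layers=32, d_model=4096, d_state=16, bytes_per_elem=2):
    # A fixed recurrent state per layer, independent of sequence length.
    return n_layers * d_model * d_state * bytes_per_elem

for seq_len in (1024, 16384):
    kv_mib = transformer_kv_cache_bytes(seq_len) / 2**20
    ssm_mib = ssm_state_bytes() / 2**20
    print(f"{seq_len:>6} tokens: KV cache {kv_mib:8.1f} MiB, SSM state {ssm_mib:.1f} MiB")
```

Under these toy numbers the KV cache scales linearly with context (128 MiB at 1k tokens, 2 GiB at 16k) while the SSM state stays constant, which is the mechanism behind the flat memory curve in the plot above.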

Zamba2-7B-Instruct's high performance and strong instruction-following and reasoning capabilities for its size make it an ideal generalist small model for a wide range of applications.

In Needle-In-A-Haystack tests, we observe that Zamba2-7B-Instruct finds the needle with an extremely high success rate up to and slightly beyond 16k context, with performance falling off sharply at about 18k context. In future versions we aim to extend this context length significantly.

<center>
<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/uP1sbXqF1mZAp1R6NYdCF.png" width="500" alt="Zamba long context performance">
</center>
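For readers unfamiliar with the setup, a Needle-In-A-Haystack prompt buries a single fact ("the needle") at a chosen depth inside long filler text and asks the model to retrieve it. The sketch below shows one way such a prompt can be constructed; the filler sentence, needle, question, and word-based length proxy are all illustrative assumptions, not the exact harness used for these results.

```python
# Minimal sketch of Needle-In-A-Haystack prompt construction. The filler,
# needle, and question are hypothetical examples; real harnesses measure
# length in tokens rather than words.

def build_haystack_prompt(needle: str, filler: str,
                          target_words: int, depth: float) -> str:
    """Repeat `filler` until ~target_words words, insert `needle` at the
    given relative depth (0.0 = start, 1.0 = end), then append a question."""
    words = []
    while len(words) < target_words:
        words.extend(filler.split())
    words = words[:target_words]
    insert_at = int(len(words) * depth)
    words[insert_at:insert_at] = needle.split()
    question = "What is the special magic number mentioned in the text?"
    return " ".join(words) + "\n\n" + question

prompt = build_haystack_prompt(
    needle="The special magic number is 42.",
    filler="The grass is green. The sky is blue. The sun is yellow.",
    target_words=2000,
    depth=0.5,
)
```

Sweeping `depth` from 0.0 to 1.0 and the prompt length across context sizes, then scoring whether the model's answer contains the needle, yields the kind of retrieval grid shown in the plot above.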