BerenMillidge committed
Commit 107269d · verified · 1 Parent(s): 5fd6c54

Update README.md

Files changed (1): README.md (+11 -5)
README.md CHANGED
@@ -62,17 +62,24 @@ TODO
 Moreover, due to its unique hybrid SSM architecture, Zamba2-7B-Instruct achieves extremely low inference latency and rapid generation with a significantly smaller memory footprint than comparable transformer-based models.
 
 <center>
-<img src="https://cdn-uploads.huggingface.co/production/uploads/65bc13717c6ad1994b6619e9/WKTcYkhDgJCHyze4TDpLa.png" width="700" alt="Zamba performance">
+<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/nHM8bX0y8SWa4zwMSbBi7.png" width="500" alt="Zamba architecture">
+</center>
+
+<center>
+<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/qXG8aip6h77LHKjhWfjD5.png" width="500" alt="Zamba architecture">
 </center>
 
 
+
 Time to First Token (TTFT) | Output Generation
 :-------------------------:|:-------------------------:
-![](https://cdn-uploads.huggingface.co/production/uploads/65bc13717c6ad1994b6619e9/BmE8X6tDNVw5OJcbZt8sZ.png) | ![](https://cdn-uploads.huggingface.co/production/uploads/65bc13717c6ad1994b6619e9/wECc9cItK1FW1MOMGSLrp.png)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/OQ0lSMe9ltfgfHGOhyIIp.png) | ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/7LwzRc7X5DG0HX28icJjJ.png)
+
 
+And memory overhead:
 
 <center>
-<img src="https://cdn-uploads.huggingface.co/production/uploads/65bc13717c6ad1994b6619e9/nhoss41xlzfEBZzcQXI6z.png" width="700" alt="Zamba inference and memory cost">
+<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/Pgp6o7PFXZVLCY9HQiWYC.png" width="400" alt="Zamba inference and memory cost">
 </center>
 
 Zamba2-7B-Instruct's high performance, strong instruction-following, and reasoning capabilities for its size make it an ideal generalist small model for a wide range of applications.
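
The latency, throughput, and memory claims in the hunk above are easy to sanity-check locally. Below is a minimal sketch, assuming a recent `transformers` release with Zamba2 support and a single CUDA GPU; the prompt, token counts, and timing harness are illustrative and not the benchmark setup behind the plots.

```python
# Minimal sketch: measure TTFT, decode throughput, and peak memory.
# The harness here is illustrative, not the one used to produce the plots.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/Zamba2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", torch_dtype=torch.bfloat16
)

prompt = "Explain state-space models in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

def timed_generate(max_new_tokens: int) -> float:
    """Run greedy generation and return wall-clock seconds."""
    torch.cuda.synchronize()
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    torch.cuda.synchronize()
    return time.perf_counter() - start

timed_generate(1)  # warm-up: trigger CUDA init and allocation before timing

ttft = timed_generate(1)                 # prefill + first decoded token
total = timed_generate(128)              # prefill + 128 decode steps
decode_tps = (128 - 1) / (total - ttft)  # rough decode-only tokens/sec

print(f"TTFT: {ttft * 1000:.1f} ms")
print(f"Decode throughput: {decode_tps:.1f} tok/s")
print(f"Peak memory: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```

Because the Mamba2 blocks carry a fixed-size state rather than a per-token KV cache, the peak-memory figure should grow far more slowly with prompt length than it would for a pure transformer, which is the effect the memory-overhead plot illustrates.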
@@ -91,10 +98,9 @@ Our Zamba2-7B instruct features an experimental long-context mode which extends
 
 In Needle-In-A-Haystack tests, we observe that Zamba2-7B-Instruct can find the needle with an extremely high success rate up to and slightly beyond 16k context, with performance falling off sharply at about 18k context. In future versions we aim to extend this context length significantly.
 
-![image/png]()
 
 <center>
-<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/uP1sbXqF1mZAp1R6NYdCF.png" width="300" alt="Zamba long context performance">
+<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/uP1sbXqF1mZAp1R6NYdCF.png" width="500" alt="Zamba long context performance">
 </center>
 
 
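
For reference, a needle-in-a-haystack probe along the lines described in the second hunk can be sketched as below; the needle sentence, filler text, depths, and pass criterion are illustrative assumptions, not the exact evaluation behind the plot.

```python
# Minimal needle-in-a-haystack sketch: hide a "needle" fact at a chosen
# depth inside filler text of a target token length, then ask for it back.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/Zamba2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", torch_dtype=torch.bfloat16
)

needle = "The secret code is 7421."
filler = "The grass is green. The sky is blue. The sun is bright. "
filler_tokens = len(tokenizer(filler).input_ids)

def niah_pass(context_tokens: int, depth: float) -> bool:
    """Insert the needle at `depth` (0..1) into ~context_tokens of filler
    and check whether greedy decoding returns the secret code."""
    haystack = filler * (context_tokens // filler_tokens + 1)
    cut = int(len(haystack) * depth)
    prompt = (
        haystack[:cut] + needle + " " + haystack[cut:]
        + "\nWhat is the secret code? Answer with the number only."
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=10, do_sample=False)
    answer = tokenizer.decode(
        out[0, inputs.input_ids.shape[1]:], skip_special_tokens=True
    )
    return "7421" in answer

# Sweep lengths around the reported 16k sweet spot and ~18k falloff.
for n_tokens in (4096, 8192, 16384, 18432):
    print(n_tokens, [niah_pass(n_tokens, d) for d in (0.1, 0.5, 0.9)])
```

A fuller harness would wrap the prompt in the model's chat template and average over many needle depths per context length, but even this sketch should reproduce the qualitative 16k-to-18k falloff described above.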