Marlin kernel in vLLM - new checkpoint?
#10 opened 7 months ago by zoltan-fedor
Based on llama-2? (1)
#9 opened 8 months ago by rdewolff
[AUTOMATED] Model Memory Requirements
#8 opened 8 months ago by muellerzr
How to setup the generation_config properly?
#7 opened 9 months ago by KIlian42
The inference API is too slow. (1)
#6 opened 9 months ago by YernazarBis
How did you create AWQ-quantized weights? (4)
#5 opened 10 months ago by nightdude
encountered error when loading model (7)
#4 opened 10 months ago by zhouzr