#10 "Use ybelkada/Mixtral-8x7B-Instruct-v0.1-AWQ with VLLM instead" · 1 comment · opened 11 months ago by blobpenguin
#9 "Inference taking too much time" · 3 comments · opened about 1 year ago by tariksetia
#8 "Update README.md" · opened about 1 year ago by skoita
#7 "RuntimeError: probability tensor contains either `inf`, `nan` or element < 0" · 2 comments · opened about 1 year ago by aaganaie
#6 "TGI - response is an empty string" · 2 comments · opened about 1 year ago by p-christ
#5 "OC is not a multiple of cta_N = 64" · 2 comments · opened about 1 year ago by lazyDataScientist
#4 "Not supporting with TGI" · 1 comment · opened about 1 year ago by abhishek3jangid
#3 "always getting 0 in output" · 15 comments · opened about 1 year ago by xubuild
#2 "OOM under vLLM even with 80GB GPU" · 5 comments · opened about 1 year ago by mike-ravkine
#1 "Not supported for TGI > 1.3?" · 20 comments · opened about 1 year ago by paulcx