fit in 24GB?

#1
by apol - opened

Any plans to release a 2b or 1b version that is functional and can fit into a 4090?
Congrats on the release and bravo!

A 4090 has 24GB of VRAM. Assuming you want to offload the whole model to the GPU, you can fit an IQ2_XS GGUF or a 2.24bpw EXL2 quant of this model into your 4090 at Llama 3's native 8K context.
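
For a rough sanity check, here's a back-of-the-envelope sketch (not an exact measurement; it assumes a Llama 3 70B-class model with 80 layers, 8 KV heads via GQA, and head dimension 128, and it ignores activation and runtime overhead):

```python
# Back-of-the-envelope VRAM estimate: quantized weights + fp16 KV cache.
# Figures are rough; real usage varies by runtime, quant format, and overhead.

def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1024**3

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 ctx_len: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache in GiB (keys + values across all layers)."""
    return 2 * layers * kv_heads * head_dim * ctx_len * bytes_per_elem / 1024**3

# Assumed Llama 3 70B-class shape: 80 layers, 8 KV heads, head_dim 128, 8K context
w = weights_gib(70, 2.24)            # ~18.3 GiB of weights at 2.24 bpw
kv = kv_cache_gib(80, 8, 128, 8192)  # ~2.5 GiB of KV cache at fp16
print(f"~{w:.1f} GiB weights + ~{kv:.1f} GiB KV cache = ~{w + kv:.1f} GiB")
```

That lands around 21 GiB, which is roughly why a ~2.2-2.3bpw quant is about the floor for fully offloading a 70B-class model onto a single 24GB card.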

If you only want to run full-precision models, you can easily fit a Llama 3 8B model into your VRAM. No need to go all the way down to 2B.
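
(Rough arithmetic: 8 billion parameters at 16 bits each is about 8e9 × 2 bytes ≈ 15 GiB of weights, and the 8K KV cache adds only around 1 GiB thanks to GQA, so an fp16 8B model still leaves several GiB of headroom on a 24GB card.)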

thanks!

apol changed discussion status to closed