
Chat template

#5 by GiuWalker - opened

Hi,
I am very new to this and I just downloaded Moistral to use with GPT4All, but it says I need a chat template. Is there one already made? Any tips on where to find one that might fit would be very welcome!

Don't use GPT4All, it's very outdated.

If you have a GTX-series or AMD GPU, or want to do inference on CPU or with offloading (the model doesn't fully fit in your GPU's VRAM), use one of the following (see the sketch after the list):

  • koboldcpp (easiest)
  • text-generation-webui
  • LM Studio (not open source)
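
If you want to sanity-check the GGUF from a script, here's a minimal sketch using the llama-cpp-python package (koboldcpp and text-generation-webui wrap the same llama.cpp backend). The filename, layer count, and Alpaca-style prompt are assumptions; check the model card for the exact prompt format the model expects.

```python
# Minimal sketch: load a GGUF with partial GPU offload via llama-cpp-python.
# The filename, n_gpu_layers value and Alpaca-style prompt are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="Moistral-11B-v3-Q4_K_M.gguf",  # hypothetical filename
    n_gpu_layers=20,   # offload only as many layers as fit in VRAM; -1 = all
    n_ctx=8192,        # context window
)

prompt = (
    "### Instruction:\n"
    "Write a short scene set in a rainy city.\n\n"
    "### Response:\n"
)

out = llm(prompt, max_tokens=256, stop=["### Instruction:"])
print(out["choices"][0]["text"])
```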

If you have an RTX-series GPU and the model will fully fit in VRAM, use exl2 quants (ExLlamaV2) instead of GGUF, with

  • tabbyAPI
  • text-generation-webui

to get faster prompt processing and parallelism. A minimal loading sketch follows below.
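
For reference, a rough sketch of loading an exl2 quant directly with the exllamav2 Python library (tabbyAPI and text-generation-webui drive the same backend). The model directory and sampler settings are placeholders, not taken from this thread.

```python
# Minimal sketch: load an exl2 quant with exllamav2 and generate a reply.
# The model directory name is hypothetical.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "Moistral-11B-v3-exl2-4.0bpw"  # hypothetical path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocate KV cache as layers load
model.load_autosplit(cache)               # split layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple(
    "### Instruction:\nSay hello.\n\n### Response:\n", settings, 200))
```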
