The model has a tendency to hallucinate for a long time

#1
by Cleanable - opened

Hello,
I am using the fp16 version under koboldcpp.
Sometimes, after answering a question, it starts printing code that has nothing to do with the conversation, and it keeps doing so for a very long time.
What could be wrong, please?
Thank you

DevQuasar org
edited Oct 2

I have tested this model intensively. The quantized versions are made with llama.cpp; I'm not sure whether the inference code could cause something like that. I'd try it with llama.cpp, and maybe look at the Q8 and Q6 versions too.
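In case it helps, here is a minimal sketch of how you could reproduce the issue outside koboldcpp using the llama-cpp-python bindings. The GGUF filename, context size, and prompt below are just placeholders, not the actual repo files or settings:

```python
# Minimal repro sketch with llama-cpp-python (pip install llama-cpp-python).
# The model path below is a placeholder -- point it at the Q8_0 or Q6_K file you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="model-Q8_0.gguf",  # placeholder path to the quantized GGUF
    n_ctx=4096,                    # context window; adjust to the model's limit
)

out = llm(
    "Q: What is the capital of France?\nA:",
    max_tokens=256,                # cap generation so a runaway completion stops
    stop=["\nQ:"],                 # stop sequence so the turn ends cleanly
)
print(out["choices"][0]["text"])
```

If the same prompt rambles into unrelated code here as well, the issue is likely with the model or the prompt/stop settings rather than with koboldcpp.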
