The model has a tendency to hallucinate for a long time
#1 opened by Cleanable
Hello,
I am using the fp16 model under koboldcpp.
Sometimes, after answering a question, it starts printing code that has nothing to do with the conversation, and it keeps going for a very long time.
What could be wrong, please?
Thank you
I have tested this model extensively. The quantized versions were made with llama.cpp; I'm not sure the inference code could cause something like that. I'd try it with llama.cpp, and maybe look at the Q8 and Q6 versions too.
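One way to try the suggestion above is to run the same model directly with llama.cpp's CLI and see whether the runaway output still happens there. This is only a sketch: the model file names below are placeholders, and the exact binary name (`llama-cli`) and sampling defaults depend on your llama.cpp build.

```shell
# Run the fp16 model directly with llama.cpp to check if the
# runaway generation reproduces outside koboldcpp.
# ./model-fp16.gguf is a placeholder path.
./llama-cli -m ./model-fp16.gguf \
  -p "Hello, how are you?" \
  -n 256 \
  --repeat-penalty 1.1

# Compare against a quantized version of the same model
# (e.g. a Q8_0 file, placeholder path again).
./llama-cli -m ./model-Q8_0.gguf \
  -p "Hello, how are you?" \
  -n 256 \
  --repeat-penalty 1.1
```

Capping generation with `-n` and using a modest `--repeat-penalty` keeps a misbehaving run short; if the fp16 model also produces unrelated code under llama.cpp, the problem is more likely the model or prompt format than koboldcpp itself.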