Run the model, it return: safe for any inputs

#1
by TomTang - opened

(base) C:\Windows\system32>ollama run hf.co/mradermacher/oh-dcft-v3.1-claude-3-5-sonnet-20241022-GGUF:Q8_0

Hi
safe

Can anyone help me with this error?

Hmm, seems to work for me with llama.cpp. I don't know enough about ollama, but it looks more a problem with either ollama or your set-up, and not something specific to this model, which suggests you should ask in an ollama forum.

> What could cause ollama to only generate "safe" with an llm for any input? 
That's an interesting question about LLaMA, but I aim to be helpful while respecting copyright. I can discuss this topic in general terms - it appears you're asking about potential causes for LLaMA outputs being limited or filtered. I'd encourage you to share more specific details about what you're observing or what you'd like to learn more about.
mradermacher changed discussion status to closed

Hi there,
Thanks for your rapid response. I have tried to figure this issue out these few days. I still don't know why this happens, however, I manage to get around it through SillyTavern. The interesting facts is, in either the Powershell or ChatBox, the model returns "Safe" (just one word); in SillyTavern, the model returns complete answers. I am relatively happy with the answer it generates. Great job. Thanks again.

That must have some fascinating explanation somewhere :)

Hi there,

I hope this message finds you well. After further investigation, I am writing to provide feedback regarding this model and to seek guidance on an issue I have encountered with its safety evaluation mechanism, which is currently preventing the model from generating conversational outputs as expected.

The models involved are:
hf.co/mradermacher/oh-dcft-v3.1-claude-3-5-haiku-20241022-GGUF:Q6_K e02d7b447a57 6.6 GB
hf.co/mradermacher/oh-dcft-v3.1-claude-3-5-sonnet-20241022-GGUF:Q5_K_M bd4302848bab 5.7 GB

Issue Description:
While testing the model via Ollama, I noticed that regardless of the input prompt, the model only returns either “safe” or “unsafe.” After further investigation, I discovered that the model’s Modelfile includes a safety evaluation template designed to check the last user message against a predefined list of unsafe content categories. Additionally, the model parameter for temperature is set to 0, which appears to reinforce a deterministic safety classification rather than allowing for natural language generation.

Modelfile Template Details:

Below is the specific content of the Modelfile template that I retrieved:

makefile
Copy
{{- $lastIndex := (len (slice .Messages 1)) }}
{{- $last := index .Messages $lastIndex }}
{{- $role := ""}}
{{- if eq $last.Role "user" }}
{{- $role = "User" }}
{{- else }}
{{- $role = "Agent" }}
{{- end }}<|start_header_id|>user<|end_header_id|>

Task: Check if there is unsafe content in '{{ $role }}' messages in conversations according our safety policy with the below categories.

S1: Violent Crimes. S2: Non-Violent Crimes. S3: Sex Crimes. S4: Child Exploitation. S5: Defamation. S6: Specialized Advice. S7: Privacy. S8: Intellectual Property. S9: Indiscriminate Weapons. S10: Hate. S11: Self-Harm. S12: Sexual Content. S13: Elections. S14: Code Interpreter Abuse.

{{ range .Messages }}
{{- if eq .Role "user" }}User: {{ .Content }}

{{ else }}Agent: {{ .Content }}

{{ end }}
{{- end }}

Provide your safety assessment for ONLY THE LAST User message in the above conversation:
- First line must read 'safe' or 'unsafe'.
- If unsafe, a second line must include a comma-separated list of violated categories.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

/show info
Model
architecture llama
parameters 8.0B
context length 131072
embedding length 4096
quantization unknown

Parameters
temperature 0

As you can see, the template instructs the model to evaluate the last user message for unsafe content based on a list of 14 categories, and then return only “safe” or “unsafe” (with additional details if unsafe). This clearly indicates that the model is currently configured solely as a safety classifier rather than a conversational agent.

Request for Guidance:
Given the above, I have the following questions:

  1. Is there an alternative version of the model that does not incorporate this safety evaluation template, which would allow for full conversational outputs?
  2. Alternatively, could you please advise on how to disable or modify the safety evaluation mechanism (e.g., by altering the Modelfile) so that the model can generate regular dialogue responses?
  3. As I mentioned in the last message, I managed to get around it through SillyTavern. The interesting fact is, in either the Powershell or ChatBox, the model returns "Safe" (just one word); in SillyTavern, the model returns complete answers. Out of curiosity, how did this happen? It seems that SillyTavern did something to the model that I don't understand—how come?

I appreciate the work that has gone into developing this model and understand that safety is a critical aspect. However, for my intended use case, a conversational output is essential, and I would be grateful for any guidance or recommendations you could provide.

Thank you very much for your time and assistance. I look forward to your response.

Best regards,

Wow, I had nod idea!

Since llama.cpp doesn't enforce any templates (And in fact, when trying it out, didn't use this by default), I would assume this must be some kind of setting in ollama, or, indeed, the client can provide a different template.

It would be certainly possible to modify the gguf file, but I am not aware of a tool to do so (the one in llama.cpp doesn't seem to be able top change strings in dramatic way, but might be worth a try).

SillyTavern

SillyTavern probably provides it's own system prompt, which is likely all that is required.

Hi mradermacher,

I believe the key is at the modelfile. Not sure if this modelfile comes with the model itself or the Ollama. It is likely to come with the model itself, as this message of the modelfile is generated by:

/show modelfile

Modelfile generated by "ollama show"

To build a new Modelfile based on this, replace FROM with:

FROM hf.co/mradermacher/oh-dcft-v3.1-claude-3-5-haiku-20241022-GGUF:Q6_K

FROM E:\ollama\models\blobs\sha256-f9d06ca2e6e4e0890ab0a60b1a26c60e66cceb564bed0dd60b2fbee14a37ce27
TEMPLATE """{{- $lastIndex := (len (slice .Messages 1)) }}
{{- $last := index .Messages $lastIndex }}
{{- $role := ""}}
{{- if eq $last.Role "user" }}
{{- $role = "User" }}
{{- else }}
{{- $role = "Agent" }}
{{- end }}<|start_header_id|>user<|end_header_id|>

Task: Check if there is unsafe content in '{{ $role }}' messages in conversations according our safety policy with the below categories.

S1: Violent Crimes. S2: Non-Violent Crimes. S3: Sex Crimes. S4: Child Exploitation. S5: Defamation. S6: Specialized Advice. S7: Privacy. S8: Intellectual Property. S9: Indiscriminate Weapons. S10: Hate. S11: Self-Harm. S12: Sexual Content. S13: Elections. S14: Code Interpreter Abuse.

{{ range .Messages }}
{{- if eq .Role "user" }}User: {{ .Content }}

{{ else }}Agent: {{ .Content }}

{{ end }}
{{- end }}

Provide your safety assessment for ONLY THE LAST User message in the above conversation:

  • First line must read 'safe' or 'unsafe'.
  • If unsafe, a second line must include a comma-separated list of violated categories.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
    """
    PARAMETER temperature 0

/show parameters
Model defined parameters:
temperature 0

This output implies that the modelfile is generated by the model. Hope this make sense.
Obviously, SillyTavern did some great system prompts to write over the modelfile I find in the model, therefore through SillyTavern, this model works.

Regards,

the model file is, as i understand it, simply a config file for ollama. it does allow overwriting some of the parameters in the gguf, which are basically just defaults.

For the record, the chat template in these gguf is not the source for the ollamas prompt (which explains why llama.cpp does not output Safe). I also can't imagine ollama would supply this on its own, so I think it must come from somewhere else (such as a config file on your system or similar), and therefore I don't think other people will run into this issue.

The chat template from the model is below:

{{ '<|begin_of_text|>' }}{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% endif %}{% if system_message is defined %}{{ '<|start_header_id|>system<|end_header_id|> ' + system_message + '<|eot_id|>' }}{% endif %}{% for message in loop_messages %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ '<|start_header_id|>user<|end_header_id|> ' + content + '<|eot_id|><|start_header_id|>assistant<|end_header_id|> ' }}{% elif message['role'] == 'assistant' %}{{ content + '<|eot_id|>' }}{% endif %}{% endfor %}

Hi mradermacher,
After create a new model with a regular chat template, this model works fine now. Thanks for your time.

ollama create sonnet-20241022-GGUF:Q5_K_M -f E:\ollama\models\custom-modelfile.txt

FROM E:\ollama\models\blobs\sha256-f9d06ca2e6e4e0890ab0a60b1a26c60e66cceb564bed0dd60b2fbee14a37ce27

TEMPLATE """
{{ if eq (index .Messages 0).Role "system" }}
{{ "<|start_header_id|>system<|end_header_id|>" }} {{ (index .Messages 0).Content }} {{ "<|eot_id|>" }}
{{ end }}
{{ range .Messages }}
{{ if eq .Role "user" }}
{{ "<|start_header_id|>user<|end_header_id|>" }} {{ .Content }} {{ "<|eot_id|>" }}
{{ "<|start_header_id|>assistant<|end_header_id|>" }}
{{ else if eq .Role "assistant" }}
{{ .Content }} {{ "<|eot_id|>" }}
{{ end }}
{{ end }}
"""

PARAMETER temperature 0.7
PARAMETER num_predict 2048
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|end_header_id|>"

Well, it's a bit worrying that you apparently get some kind of "safe" template from somewhere else (and interesting template btw.). But the important point is that you got the model to work, so all the power to you :)

Your need to confirm your account before you can post a new comment.

Sign up or log in to comment