katanemo/Arch-Guard

Text Classification


parachas committed on May 6

Commit b4118ac · verified · 1 Parent(s): 8169a9e

Update README.md

Files changed (1)

README.md +1 -1


@@ -25,7 +25,7 @@ The Katanemo Arch-Guard collection is a collection state-of-the-art (SOTA) LLMs
 Definition: jailbreaking attempts are malicious prompts designed to alternate the intended behavior of the foundation LLM model of the application. They often violate the safety and security policies of the model.
 Arch Guard is a classifier model fine-tuned based on the open source model [Prompt-Guard-86M](https://huggingface.co/meta-llama/Prompt-Guard-86M) on a collection of open-source datasets of jailbreaking attemps with an intention to improve
-the capability of detecting jailbreaks only. This model is used in [Arch, the AI-native proxy server for agents](https://github.com/katanemo/archgw)
+the capability of detecting jailbreaks only. This model is used in [Arch](https://github.com/katanemo/archgw) - the AI-native proxy server for agents
 In summary, the Katanemo Arch-Guard collection demonstrates:
 - **State-of-the-art performance** in jailbreaking attempts detection