Update README.md
README.md
CHANGED
@@ -25,7 +25,7 @@ The Katanemo Arch-Guard collection is a collection state-of-the-art (SOTA) LLMs
 Definition: jailbreaking attempts are malicious prompts designed to alternate the intended behavior of the foundation LLM model of the application. They often violate the safety and security policies of the model.
 
 Arch Guard is a classifier model fine-tuned based on the open source model [Prompt-Guard-86M](https://huggingface.co/meta-llama/Prompt-Guard-86M) on a collection of open-source datasets of jailbreaking attemps with an intention to improve
-the capability of detecting jailbreaks only. This model is used in [Arch
+the capability of detecting jailbreaks only. This model is used in [Arch](https://github.com/katanemo/archgw) - the AI-native proxy server for agents
 
 In summary, the Katanemo Arch-Guard collection demonstrates:
 - **State-of-the-art performance** in jailbreaking attempts detection