Update README.md
README.md
CHANGED
@@ -21,11 +21,11 @@ datasets:
 # katanemo/Arch-Guard-gpu
 
 ## Overview
-The Katanemo Arch-Guard collection is a collection state-of-the-art (SOTA) LLMs specifically designed for **jailbreaking detection** tasks.
+The Katanemo Arch-Guard collection is a collection state-of-the-art (SOTA) LLMs specifically designed for **jailbreaking detection** tasks.
 Definition: jailbreaking attempts are malicious prompts designed to alternate the intended behavior of the foundation LLM model of the application. They often violate the safety and security policies of the model.
 
 Arch Guard is a classifier model fine-tuned based on the open source model [Prompt-Guard-86M](https://huggingface.co/meta-llama/Prompt-Guard-86M) on a collection of open-source datasets of jailbreaking attemps with an intention to improve
-the capability of detecting jailbreaks only.
+the capability of detecting jailbreaks only. This model is used in [Arch, the AI-native proxy server for agents](https://github.com/katanemo/archgw)
 
 In summary, the Katanemo Arch-Guard collection demonstrates:
 - **State-of-the-art performance** in jailbreaking attempts detection