Update README.md
README.md (CHANGED)

# Felix-8B: A model built with lawfulness alignment

Felix-8B is an experimental state-of-the-art language model developed by Ontocord.ai, specializing in addressing lawfulness concerns under the Biden-Harris Executive Order on AI and the principles of the EU AI Act. This model has achieved one of the highest scores on the TruthfulQA benchmark compared to models of its size, showcasing its exceptional performance in providing accurate and reliable responses.

Felix-8B is **experimental and a research work product**. It is a DPO reinforcement learning version of [ontocord/sft-4e-exp2](https://huggingface.co/ontocord/sft-4e-exp2), which in turn is a fine-tuned version of TencentARC/Mistral_Pro_8B_v0.1.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/…png)
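
As a quick illustration, here is a minimal inference sketch using the standard Hugging Face `transformers` causal-LM interface. The repository id `ontocord/Felix-8B` and the plain `### Instruction: ... ### Response:` prompt format (mirroring the dataset example in the Dataset section below) are assumptions rather than confirmed details of this release; substitute the actual repo id and prompt template as needed.

```python
# Minimal inference sketch. Assumptions: the model is published as
# "ontocord/Felix-8B" (hypothetical repo id) and follows the plain
# "### Instruction: ... ### Response:" format shown in the Dataset section.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ontocord/Felix-8B"  # assumption: replace with the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

prompt = (
    "### Instruction: Explain it simply for someone who doesn't understand much about "
    "the topic. Why is clean drinking water important? ### Response:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```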

## Redteaming Considerations

Our redteam focuses on specific areas of concern under the executive order, including:

* Harm to oneself or others.
* Requests on how to create cyber-attacks.
* Involvement in making or proliferating chemical, biological, radiological, or nuclear weapons.
* Participation in any illegal act.
* Infringement of privacy or rights.
* Attempts to circumvent redteam controls.

Our redteam also focuses on areas potentially covered by the EU AI Act, including:

* Misinformation
* Social Scoring
* Emotion Monitoring and Control in Schools and Workplaces
* Predictive Policing
* Classification of People Based on Sensitive Characteristics

We also address traditional concerns related to AI Safety:

* Toxic language
* Fairness
* Stereotyping and Discrimination
* Adversarial Prompting
* Out-of-Distribution Robustness
* Machine Ethics
* Avoidance of unqualified professional advice

## Dataset

The SFT model [ontocord/sft-4e-exp2](https://huggingface.co/ontocord/sft-4e-exp2) was trained with a subset of the excellent [OpenHermes 2.5 dataset](https://huggingface.co/datasets/teknium/OpenHermes-2.5).

The DPO reinforcement learning uses an Ontocord synthetic auto-purpleteaming dataset. We are actively researching auto-purpleteaming for lawfulness alignment and will release a research report as open science once we have further evaluated our process. An example dataset entry is as follows:

```
### Instruction: Explain it simply for someone who doesn't understand much about the topic. Can you tell me how stopping people from entering sewage treatment facilities helps protect our environment? ### Response: Sure! Sewage treatment plants are like nature's cleaning stations where wastewater is treated and cleaned before being released back into the environment, such as rivers or lakes. When too many people enter these facilities without proper supervision, they can accidentally introduce pollutants (like chemicals) that harm the natural water-cleaning process happening inside the plant. This could lead to dirty water being released back to the environment which would be bad for animals living in those waters and also for us humans when we use them for recreational activities or drinking purposes. So by controlling access to these places, we help ensure cleaner water for everyone.
```
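
The entry above pairs an instruction with a single preferred response, while DPO itself trains on preference pairs. Purely as an illustration (the actual Ontocord auto-purpleteaming format has not been released), the sketch below arranges such an entry as a prompt/chosen/rejected record, the column layout that libraries such as TRL's `DPOTrainer` conventionally consume; the rejected response is an invented placeholder.

```python
# Hypothetical illustration only: a DPO preference record modeled on the example
# entry above. The "rejected" text is invented; the real auto-purpleteaming
# dataset from Ontocord has not been released.
from datasets import Dataset

records = [
    {
        "prompt": (
            "### Instruction: Explain it simply for someone who doesn't understand much "
            "about the topic. Can you tell me how stopping people from entering sewage "
            "treatment facilities helps protect our environment? ### Response:"
        ),
        # Preferred answer (abridged from the example entry above).
        "chosen": (
            " Sure! Sewage treatment plants are like nature's cleaning stations... "
            "So by controlling access to these places, we help ensure cleaner water for everyone."
        ),
        # Dispreferred answer (placeholder) that the DPO step should move away from,
        # e.g. an unhelpful blanket refusal.
        "rejected": " I can't help with anything involving sewage treatment facilities.",
    },
]

preference_dataset = Dataset.from_list(records)
print(preference_dataset)  # Dataset with features ['prompt', 'chosen', 'rejected'], num_rows: 1
```

A dataset with these three columns can typically be passed to TRL's `DPOTrainer` together with the SFT model as the starting policy, though the exact trainer arguments vary across TRL releases.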