---
license: mit
datasets:
- mlabonne/orpo-dpo-mix-40k
---
This is an uncensored version of Phi-3-mini-4k-instruct.
It was abliterated following the guide here: https://huggingface.co/blog/mlabonne/abliteration
It was then fine-tuned on mlabonne/orpo-dpo-mix-40k.
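
A minimal usage sketch with 🤗 Transformers (assuming a recent `transformers` release with Phi-3 support; the prompt and generation settings are illustrative, not tuned):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cowWhySo/Phi-3-mini-4k-instruct-Friendly"

# Phi-3 ships custom modeling code, hence trust_remote_code=True (matching the training config below).
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Explain abliteration in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```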
[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
See axolotl config
axolotl version: `0.4.0`
```yaml
base_model: cowWhySo/Phi-3-mini-4k-instruct-Friendly
trust_remote_code: true
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
chat_template: phi_3
load_in_8bit: false
load_in_4bit: true
strict: false
save_safetensors: true
rl: dpo
datasets:
  - path: mlabonne/orpo-dpo-mix-40k
    split: train
    type: chatml.intel
dataset_prepared_path:
val_set_size: 0.0
output_dir: ./out
sequence_len: 4096
sample_packing: false
pad_to_sequence_len: false
adapter: qlora
lora_model_dir:
lora_r: 64
lora_alpha: 32
lora_dropout: 0.1
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project: axolotl
wandb_entity:
wandb_watch:
wandb_name: phi3-mini-4k-instruct-Friendly
wandb_log_model:
gradient_accumulation_steps: 8
micro_batch_size: 4
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: linear
learning_rate: 5e-6
train_on_inputs: false
group_by_length: false
bf16: auto
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: True
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 150
evals_per_epoch: 0
eval_table_size:
eval_table_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed: deepspeed_configs/zero3.json
weight_decay: 0.01
max_grad_norm: 1.0
resize_token_embeddings_to_32x: true
```
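
Since the config trains a QLoRA adapter (`adapter: qlora`) into `output_dir: ./out`, the adapter can be merged back into the base model after training. A minimal sketch using `peft`; the `./out` adapter path and the `./merged` output directory are assumptions taken from the config above, not a published artifact:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "cowWhySo/Phi-3-mini-4k-instruct-Friendly"  # base_model from the config
adapter_dir = "./out"                                 # output_dir from the config

# Load the base model in full precision so the LoRA weights can be folded in.
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)

# Attach the trained DPO/QLoRA adapter and merge it into the base weights.
merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()

merged.save_pretrained("./merged")
tokenizer.save_pretrained("./merged")
```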
## Quants
GGUF: https://huggingface.co/cowWhySo/Phi-3-mini-4k-instruct-Friendly-gguf
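
A hedged example of running the GGUF quant with `llama-cpp-python`; the quant filename below is a guess for illustration, so check the GGUF repo for the files that actually exist:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Filename is hypothetical; substitute one of the quants actually listed in the repo.
gguf_path = hf_hub_download(
    repo_id="cowWhySo/Phi-3-mini-4k-instruct-Friendly-gguf",
    filename="Phi-3-mini-4k-instruct-Friendly-Q4_K_M.gguf",
)

llm = Llama(model_path=gguf_path, n_ctx=4096)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```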
## Benchmarks
| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|--------------------------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
|[Phi-3-mini-4k-instruct-Friendly](https://huggingface.co/cowWhySo/Phi-3-mini-4k-instruct-Friendly)| 41| 67.56| 46.36| 39.3| 48.56|
### AGIEval
| Task |Version| Metric |Value| |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat | 0|acc |22.05|± | 2.61|
| | |acc_norm|22.05|± | 2.61|
|agieval_logiqa_en | 0|acc |41.01|± | 1.93|
| | |acc_norm|41.32|± | 1.93|
|agieval_lsat_ar | 0|acc |22.17|± | 2.75|
| | |acc_norm|22.17|± | 2.75|
|agieval_lsat_lr | 0|acc |45.69|± | 2.21|
| | |acc_norm|45.88|± | 2.21|
|agieval_lsat_rc | 0|acc |59.48|± | 3.00|
| | |acc_norm|56.51|± | 3.03|
|agieval_sat_en | 0|acc |75.24|± | 3.01|
| | |acc_norm|70.39|± | 3.19|
|agieval_sat_en_without_passage| 0|acc |39.81|± | 3.42|
| | |acc_norm|37.86|± | 3.39|
|agieval_sat_math | 0|acc |33.64|± | 3.19|
| | |acc_norm|31.82|± | 3.15|
Average: 41.0%
### GPT4All
| Task |Version| Metric |Value| |Stderr|
|-------------|------:|--------|----:|---|-----:|
|arc_challenge| 0|acc |49.74|± | 1.46|
| | |acc_norm|50.43|± | 1.46|
|arc_easy | 0|acc |76.68|± | 0.87|
| | |acc_norm|73.23|± | 0.91|
|boolq | 1|acc |79.27|± | 0.71|
|hellaswag | 0|acc |57.91|± | 0.49|
| | |acc_norm|77.13|± | 0.42|
|openbookqa | 0|acc |35.00|± | 2.14|
| | |acc_norm|43.80|± | 2.22|
|piqa | 0|acc |77.86|± | 0.97|
| | |acc_norm|79.54|± | 0.94|
|winogrande | 0|acc |69.53|± | 1.29|
Average: 67.56%
### TruthfulQA
| Task |Version|Metric|Value| |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc| 1|mc1 |31.21|± | 1.62|
| | |mc2 |46.36|± | 1.55|
Average: 46.36%
### Bigbench
| Task |Version| Metric |Value| |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement | 0|multiple_choice_grade|54.74|± | 3.62|
|bigbench_date_understanding | 0|multiple_choice_grade|66.67|± | 2.46|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|29.46|± | 2.84|
|bigbench_geometric_shapes | 0|multiple_choice_grade|11.98|± | 1.72|
| | |exact_str_match | 0.00|± | 0.00|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|28.00|± | 2.01|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|17.14|± | 1.43|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|45.67|± | 2.88|
|bigbench_movie_recommendation | 0|multiple_choice_grade|24.40|± | 1.92|
|bigbench_navigate | 0|multiple_choice_grade|53.70|± | 1.58|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|68.10|± | 1.04|
|bigbench_ruin_names | 0|multiple_choice_grade|31.03|± | 2.19|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|15.93|± | 1.16|
|bigbench_snarks | 0|multiple_choice_grade|77.35|± | 3.12|
|bigbench_sports_understanding | 0|multiple_choice_grade|52.64|± | 1.59|
|bigbench_temporal_sequences | 0|multiple_choice_grade|51.50|± | 1.58|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|19.52|± | 1.12|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|13.89|± | 0.83|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|45.67|± | 2.88|
Average: 39.3%
Average score: 48.56%
## Training Summary
```json
{
  "train/loss": 0.299,
  "train/grad_norm": 0.9337566701340533,
  "train/learning_rate": 0,
  "train/rewards/chosen": 0.08704188466072083,
  "train/rewards/rejected": -2.835820436477661,
  "train/rewards/accuracies": 0.84375,
  "train/rewards/margins": 2.9228620529174805,
  "train/logps/rejected": -509.9840393066406,
  "train/logps/chosen": -560.8234252929688,
  "train/logits/rejected": 1.6356163024902344,
  "train/logits/chosen": 1.7323706150054932,
  "train/epoch": 1.002169197396963,
  "train/global_step": 231,
  "_timestamp": 1717711643.3345022,
  "_runtime": 22808.557655334473,
  "_step": 231,
  "train_runtime": 22809.152,
  "train_samples_per_second": 1.944,
  "train_steps_per_second": 0.01,
  "total_flos": 0,
  "train_loss": 0.44557410065745895,
  "_wandb": {
    "runtime": 22810
  }
}
```
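
For reference, the logged DPO reward margin is the chosen reward minus the rejected reward (axolotl's DPO path builds on TRL's `DPOTrainer`, which logs these metrics); a quick check against the values above:

```python
# Values copied from the training summary above.
chosen_reward = 0.08704188466072083
rejected_reward = -2.835820436477661

margin = chosen_reward - rejected_reward
print(margin)  # ~2.9229, approximately matching train/rewards/margins
```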