
Islamspecialist
The rationalist Muslim scholar in a Mistral 7B LLM. Make sure to include "Finished." as a stop token, or generation will go on forever (I've spoken with Heralax about this). This model is a fine-tuned version of a Mistral model that was continued-pretrained on roughly 2.8 million tokens of writing by reputable, rationalist Muslim scholars critically analyzing the religion of Islam.
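For a concrete picture of that stop-token requirement, here is a minimal inference sketch. It is not the author's exact setup: the repo id is a placeholder, and `stop_strings` requires a recent transformers (4.39+) plus passing the tokenizer into `generate()`.

```python
# Minimal inference sketch: halt generation on the literal string "Finished.".
# The repo id below is a placeholder; substitute the actual Hub id of this model.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "islamspecialist"  # placeholder, e.g. "<user>/islamspecialist"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="bfloat16", device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the Mu'tazilite position on free will."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# stop_strings only takes effect if the tokenizer is also passed to generate()
output = model.generate(
    input_ids,
    max_new_tokens=512,
    stop_strings=["Finished."],
    tokenizer=tokenizer,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```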
See axolotl config
axolotl version: 0.12.1
base_model: c4tdr0ut/ms
tokenizer_type: AutoTokenizer
model_type: AutoModelForCausalLM
load_in_8bit: false
load_in_4bit: false
strict: false
# Dataset Optimization
datasets:
- path: axolotl_rag_conversations_inputs.jsonl
  type: input_output
- path: pretraining_subset_2147229.jsonl
  type: completion
- path: generic_sft_completion/Augmentoolkit-Augmentoolkit-LMsys-800k-Thoughts_200000.jsonl
  type: completion
- path: generic_sft_completion/Augmentoolkit-Augmentoolkit-Generic-Grabbag-Thoughts_400000.jsonl
  type: completion
- path: generic_sft_completion/Augmentoolkit-Openthoughts-100mil-DifferentFormat_800000.jsonl
  type: completion
- path: generic_sft_completion/Augmentoolkit-Augmentoolkit-Pippa-Thoughts_200000.jsonl
  type: completion
- path: generic_sft_completion/Augmentoolkit-Augmentoolkit-Bluemoon-1mil-thoughts_200000.jsonl
  type: completion
- path: generic_sft_completion/Augmentoolkit-Augmentoolkit-Capybara-2point5mil-Thoughts_200000.jsonl
  type: completion
- path: factual_sft_completion/combined_all_1.jsonl
  type: completion
- path: factual_sft_completion/combined_all_2.jsonl
  type: completion
- path: factual_sft_completion/combined_all_0.jsonl
  type: completion
- path: factual_sft_completion/combined_all_3.jsonl
  type: completion
dataset_prepared_path: last_finetune_prepared
output_dir: ./finetune-model-output
seed: 1337
# Training Parameters
sequence_len: 5000
sample_packing: true
pad_to_sequence_len: true
shuffle_merged_datasets: true
gradient_accumulation_steps: 75
micro_batch_size: 2
eval_batch_size: 2
num_epochs: 4
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 1.2e-05
max_grad_norm: 1.0
noisy_embedding_alpha: 3
weight_decay: 0.01
# Regularization
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
gradient_checkpointing: true
logging_steps: 1
xformers_attention: false
flash_attention: true
chat_template: chatml
auto_resume_from_checkpoints: false
warmup_ratio: 0.05
val_set_size: 0.08
eval_sample_packing: false
save_total_limit: 3
# Step-based evaluation/saving for early stopping
eval_steps: 100
save_steps: 200
early_stopping_patience: 3
early_stopping_threshold: 0.015
# Token Handling
special_tokens:
  pad_token: <unk>
# Liger Configuration
use_liger_kernel: true
plugins:
- axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
liger_fused_linear_cross_entropy: true
# Monitoring
wandb_project: test-project
wandb_entity: ''
wandb_watch: gradients
wandb_run_id: ''
wandb_log_model: ''
# Hub Configuration
hub_model_id: islamspecialist
hub_strategy: all_checkpoints
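Note the `chat_template: chatml` line: prompts at inference time should be rendered in ChatML. A small sketch of what that looks like, assuming the saved tokenizer carries the ChatML template the config asked for (repo id again a placeholder):

```python
# Render a ChatML-formatted prompt, as configured by `chat_template: chatml`.
# Assumes the saved tokenizer carries the ChatML template; otherwise the same
# layout can be written by hand with <|im_start|>/<|im_end|> markers.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("islamspecialist")  # placeholder repo id

messages = [
    {"role": "system", "content": "You are a rationalist scholar of Islam."},
    {"role": "user", "content": "What did al-Farabi say about reconciling revelation and reason?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# Expected shape:
# <|im_start|>system
# You are a rationalist scholar of Islam.<|im_end|>
# <|im_start|>user
# What did al-Farabi say about reconciling revelation and reason?<|im_end|>
# <|im_start|>assistant
```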
Training procedure
Trained on 1× NVIDIA B200 for one hour on DeepInfra.
The run covered roughly 2.5M tokens of pretraining data plus the SFT datasets.
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1.2e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 1337
- gradient_accumulation_steps: 75
- total_train_batch_size: 150
- optimizer: paged_adamw_8bit with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 2
- training_steps: 24
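The totals above follow directly from the config values; a quick sanity check of the arithmetic (the single-GPU count comes from the training procedure section):

```python
import math

# Effective batch size = micro batch size x gradient accumulation steps x GPU count
micro_batch_size = 2
gradient_accumulation_steps = 75
num_gpus = 1  # one B200, per the training procedure above

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
print(total_train_batch_size)  # 150 sequences per optimizer step

# Warmup steps = ceil(warmup_ratio * training_steps), matching the reported value
print(math.ceil(0.05 * 24))  # 2
```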
Training results
| Training Loss | Epoch | Step | Validation Loss | Mem Active (GiB) | Mem Allocated (GiB) | Mem Reserved (GiB) |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 1.3610 | 16.59 | 16.59 | 17.9 |
Framework versions
- Transformers 4.55.0
- PyTorch 2.7.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
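To reproduce results against the same stack, here is a quick version check against the pins above (a convenience sketch, not an official requirements file):

```python
# Compare installed library versions against those listed on this card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.55.0",
    "torch": "2.7.0+cu128",  # CUDA build suffix may differ on your machine
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    ok = installed[name] == want
    print(f"{name}: expected {want}, installed {installed[name]}" + ("" if ok else "  <-- mismatch"))
```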