
Islamspecialist
The rationalist Muslim scholar in a Mistral 7B LLM. Make sure to include "Finished." as a stop token, or generation will go on forever (I've spoken with Heralax about this). This model is a fine-tuned version of a Mistral model that was continued-pretrained on roughly 2.8 million tokens of writing by reputable, rationalist Muslim scholars critically analyzing the religion of Islam.
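For a concrete picture of that stop-token requirement, here is a minimal inference sketch. It is not the author's exact setup: the repo id is a placeholder, and `stop_strings` requires a recent transformers (4.39+) plus passing the tokenizer into `generate()`.

```python
# Minimal inference sketch: halt generation on the literal string "Finished.".
# The repo id below is a placeholder; substitute the actual Hub id of this model.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "islamspecialist"  # placeholder, e.g. "<user>/islamspecialist"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="bfloat16", device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the Mu'tazilite position on free will."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# stop_strings only takes effect if the tokenizer is also passed to generate()
output = model.generate(
    input_ids,
    max_new_tokens=512,
    stop_strings=["Finished."],
    tokenizer=tokenizer,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```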
See axolotl config
axolotl version: 0.12.1
base_model: c4tdr0ut/ms
tokenizer_type: AutoTokenizer
model_type: AutoModelForCausalLM
load_in_8bit: false
load_in_4bit: false
strict: false
# Dataset Optimization
datasets:
- path: axolotl_rag_conversations_inputs.jsonl
  type: input_output
- path: pretraining_subset_2147229.jsonl
  type: completion
- path: generic_sft_completion/Augmentoolkit-Augmentoolkit-LMsys-800k-Thoughts_200000.jsonl
  type: completion
- path: generic_sft_completion/Augmentoolkit-Augmentoolkit-Generic-Grabbag-Thoughts_400000.jsonl
  type: completion
- path: generic_sft_completion/Augmentoolkit-Openthoughts-100mil-DifferentFormat_800000.jsonl
  type: completion
- path: generic_sft_completion/Augmentoolkit-Augmentoolkit-Pippa-Thoughts_200000.jsonl
  type: completion
- path: generic_sft_completion/Augmentoolkit-Augmentoolkit-Bluemoon-1mil-thoughts_200000.jsonl
  type: completion
- path: generic_sft_completion/Augmentoolkit-Augmentoolkit-Capybara-2point5mil-Thoughts_200000.jsonl
  type: completion
- path: factual_sft_completion/combined_all_1.jsonl
  type: completion
- path: factual_sft_completion/combined_all_2.jsonl
  type: completion
- path: factual_sft_completion/combined_all_0.jsonl
  type: completion
- path: factual_sft_completion/combined_all_3.jsonl
  type: completion
dataset_prepared_path: last_finetune_prepared
output_dir: ./finetune-model-output
seed: 1337
# Training Parameters
sequence_len: 5000
sample_packing: true
pad_to_sequence_len: true
shuffle_merged_datasets: true
gradient_accumulation_steps: 75
micro_batch_size: 2
eval_batch_size: 2
num_epochs: 4
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 1.2e-05
max_grad_norm: 1.0
noisy_embedding_alpha: 3
weight_decay: 0.01
# Regularization
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
gradient_checkpointing: true
logging_steps: 1
xformers_attention: false
flash_attention: true
chat_template: chatml
auto_resume_from_checkpoints: false
warmup_ratio: 0.05
val_set_size: 0.08
eval_sample_packing: false
save_total_limit: 3
# Step-based evaluation/saving for early stopping
eval_steps: 100
save_steps: 200
early_stopping_patience: 3
early_stopping_threshold: 0.015
# Token Handling
special_tokens:
  pad_token: <unk>
# Liger Configuration
use_liger_kernel: true
plugins:
- axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
liger_fused_linear_cross_entropy: true
# Monitoring
wandb_project: test-project
wandb_entity: ''
wandb_watch: gradients
wandb_run_id: ''
wandb_log_model: ''
# Hub Configuration
hub_model_id: islamspecialist
hub_strategy: all_checkpoints
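Note the `chat_template: chatml` line: prompts at inference time should be rendered in ChatML. A small sketch of what that looks like, assuming the saved tokenizer carries the ChatML template the config asked for (repo id again a placeholder):

```python
# Render a ChatML-formatted prompt, as configured by `chat_template: chatml`.
# Assumes the saved tokenizer carries the ChatML template; otherwise the same
# layout can be written by hand with <|im_start|>/<|im_end|> markers.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("islamspecialist")  # placeholder repo id

messages = [
    {"role": "system", "content": "You are a rationalist scholar of Islam."},
    {"role": "user", "content": "What did al-Farabi say about reconciling revelation and reason?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# Expected shape:
# <|im_start|>system
# You are a rationalist scholar of Islam.<|im_end|>
# <|im_start|>user
# What did al-Farabi say about reconciling revelation and reason?<|im_end|>
# <|im_start|>assistant
```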
Training procedure
Trained on 1× NVIDIA B200 for one hour on DeepInfra.
The run covered roughly 2.5M tokens of pretraining data plus the SFT datasets.
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1.2e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 1337
- gradient_accumulation_steps: 75
- total_train_batch_size: 150
- optimizer: paged_adamw_8bit with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 2
- training_steps: 24
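The totals above follow directly from the config values; a quick sanity check of the arithmetic (the single-GPU count comes from the training procedure section):

```python
import math

# Effective batch size = micro batch size x gradient accumulation steps x GPU count
micro_batch_size = 2
gradient_accumulation_steps = 75
num_gpus = 1  # one B200, per the training procedure above

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_gpus
print(total_train_batch_size)  # 150 sequences per optimizer step

# Warmup steps = ceil(warmup_ratio * training_steps), matching the reported value
print(math.ceil(0.05 * 24))  # 2
```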
Training results
| Training Loss | Epoch | Step | Validation Loss | Mem Active (GiB) | Mem Allocated (GiB) | Mem Reserved (GiB) |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 1.3610 | 16.59 | 16.59 | 17.9 |
Framework versions
- Transformers 4.55.0
- PyTorch 2.7.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
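To reproduce results against the same stack, here is a quick version check against the pins above (a convenience sketch, not an official requirements file):

```python
# Compare installed library versions against those listed on this card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.55.0",
    "torch": "2.7.0+cu128",  # CUDA build suffix may differ on your machine
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    ok = installed[name] == want
    print(f"{name}: expected {want}, installed {installed[name]}" + ("" if ok else "  <-- mismatch"))
```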