---
datasets:
- NewEden/Orion-LIT
- NewEden/Orion-Asstr-Stories-16K
- Mielikki/Erebus-87k
base_model:
- Delta-Vector/Hamanasu-15B-R1-PT
tags:
- phi
- roleplay
- finetune
- storywriting
---

# Hamanasu 15B R2 PT

## 🌌 Overview
This is the second continued pretrain of Phi-4, building on the original Asstr-Erebus pretrain. It was trained on 500 million tokens from NewEden/Orion-LIT.

This model has not been instruct tuned, so its ability to hold a conversation may be reduced compared to the original model. If you would like to roleplay, please use the Instruct version.
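Because it is a base (completion) model, it is best prompted with raw text to continue rather than a chat template. Below is a minimal usage sketch with 🤗 Transformers; the repo id is an assumption and should be replaced with the actual upload for this checkpoint.

```python
# Minimal completion-style usage sketch.
# The repo id below is an assumption; substitute the actual Hugging Face repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Delta-Vector/Hamanasu-15B-R2-PT"  # assumed name, replace as needed

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Plain text continuation -- no chat template, since the model is not instruct tuned.
prompt = "The station lights flickered once, then"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```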
## Axolotl Config ꒰(˶• ᴗ •˶)꒱
```yaml
base_model: Hamanasu-15B-R2-PT
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

#hub_model_id: NewEden/Phi4-pretrain
#hub_strategy: "all_checkpoints"
#push_dataset_to_hub:
#hf_use_auth_token: true

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true
#plugins:
#  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
#cut_cross_entropy: true

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: NewEden/Orion-LIT
    type: completion
    field: text
shuffle_merged_datasets: true
dataset_prepared_path: prepared_data
val_set_size: 0.0
output_dir: ./phi4-ptv2-out-r1

sequence_len: 16384
sample_packing: true
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 128
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
lora_modules_to_save:
  - embed_tokens
  - lm_head

wandb_project: mag-phi
wandb_entity:
wandb_watch:
wandb_name: comp-v2-attempt-01
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 1
optimizer: paged_ademamix_8bit
lr_scheduler: cosine
learning_rate: 0.00002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: unsloth
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 15
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 4
debug:
deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16_cpuoffload_params.json
weight_decay: 0.01
fsdp:
fsdp_config:
```
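Note that this config trains a LoRA adapter (with `embed_tokens` and `lm_head` saved in full), so the adapter would need to be merged back into the base model to produce standalone weights. A rough PEFT merge sketch is below, with paths taken from the config above as placeholders; adjust them to the actual checkpoint directories.

```python
# Sketch of merging the trained LoRA adapter into the base model with PEFT.
# Paths are assumptions taken from the config; point them at real directories.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Hamanasu-15B-R2-PT"       # base_model from the config above
adapter_dir = "./phi4-ptv2-out-r1"   # output_dir from the config above

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_dir)

merged = model.merge_and_unload()    # fold the LoRA weights into the base weights
merged.save_pretrained("./Hamanasu-15B-R2-PT-merged")

tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.save_pretrained("./Hamanasu-15B-R2-PT-merged")
```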