devngho/llama-ablation-korean-textbooks-jamo

A model pretrained with the Llama architecture. It was trained on about 19.7B tokens (approximately 6.07 epochs) using MaxText.

A checkpoint is available for every 500 steps.

This research was supported with Cloud TPUs from Google's TPU Research Cloud (TRC). ⚡

Features

This model uses the devngho/jamo-tokenizer-exp1 tokenizer, which applies NFKD normalization and whose tokens can contain spaces.
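
To illustrate what NFKD normalization does to Korean text, here is a minimal sketch using only Python's standard `unicodedata` module. It does not load devngho/jamo-tokenizer-exp1 and says nothing about that tokenizer's vocabulary; the sample string is an arbitrary choice for illustration.

```python
import unicodedata

# Arbitrary sample text (an assumption for illustration): two precomposed Hangul syllables.
text = "한글"

# NFKD splits each precomposed syllable into its conjoining jamo.
decomposed = unicodedata.normalize("NFKD", text)
codepoints = [f"U+{ord(ch):04X}" for ch in decomposed]

# NFC recomposes the jamo sequence back into precomposed syllables.
recomposed = unicodedata.normalize("NFC", decomposed)

print(len(text), len(decomposed))  # number of code points before and after NFKD
print(codepoints)
print(recomposed == text)
```

Because the model operates on NFKD-normalized text, generated output may need to be recomposed (for example with NFC, as above) before it is displayed; whether the tokenizer's decoder already does this is not stated here.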

Details

  • Made by: devngho
  • Language: ko
  • License: mit

Training details

  • learning_rate: 6e-4 (cosine, initial/end 6e-5)
  • warmup_ratio: 0.05
  • batch_size: 1024 (fsdp 16 * per device 16 * gradient accumulation 4)
  • optimizer: adamw(b1=0.9, b2=0.95, eps=1e-5, weight_decay=0.01)
  • duration: 10h 54m
  • steps: 10000
  • The full configuration and training results are available on wandb.
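
The schedule and batch-size figures above can be sketched in plain Python. This is an approximation under stated assumptions, not the MaxText implementation: the warmup is assumed to be linear from the initial value to the peak, and the cosine decay is assumed to span the remaining steps down to the end value. The function name `lr_at` is hypothetical.

```python
import math

PEAK_LR = 6e-4          # learning_rate
INIT_LR = 6e-5          # "initial/end 6e-5"
END_LR = 6e-5
TOTAL_STEPS = 10_000    # steps
WARMUP_STEPS = round(0.05 * TOTAL_STEPS)  # warmup_ratio 0.05 -> 500 steps

def lr_at(step: int) -> float:
    """Learning rate at a given step (assumed linear warmup, then cosine decay)."""
    if step < WARMUP_STEPS:
        return INIT_LR + (PEAK_LR - INIT_LR) * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return END_LR + (PEAK_LR - END_LR) * cosine

# Global batch size: fsdp 16 * per-device 16 * gradient accumulation 4.
GLOBAL_BATCH = 16 * 16 * 4

for step in (0, 250, 500, 5_000, 10_000):
    print(step, lr_at(step))
print("global batch:", GLOBAL_BATCH)
```

The authoritative values are the ones logged on wandb; the sketch only shows how the listed hyperparameters fit together.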

Training devices

TPU v4-32

Training datasets

maywell/korean_textbooks

Software

jax==0.4.35

devngho/MaxText, a fork of MaxText

Training results

  • learning/loss: 0.00966739938193828
  • eval/avg_loss: 0.010137598644319531

Benchmark graph

Benchmark script:

lm_eval --model vllm \
        --model_args "pretrained=$model_name/$model_step,dtype=bfloat16,gpu_memory_utilization=0.7,max_model_len=2048,max_num_seqs=128" \
        --tasks haerae,kmmlu_direct,kobest_boolq,kobest_copa,kobest_hellaswag,kobest_sentineg \
        --device cuda:0 \
        --batch_size auto \
        --output_path $output_dir/ \
        --num_fewshot $fewshot
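
To run the script above across several checkpoints and few-shot settings, the command line can be assembled programmatically. The sketch below only builds the argument lists and does not launch anything. The model path, output directory, and step values are hypothetical placeholders; the source shows only `$model_name/$model_step` and does not say how checkpoint directories are named.

```python
import shlex

TASKS = "haerae,kmmlu_direct,kobest_boolq,kobest_copa,kobest_hellaswag,kobest_sentineg"

def build_lm_eval_cmd(model_name, model_step, output_dir, fewshot):
    """Return the lm_eval argument list with the same flags as the script above."""
    model_args = (
        f"pretrained={model_name}/{model_step},dtype=bfloat16,"
        "gpu_memory_utilization=0.7,max_model_len=2048,max_num_seqs=128"
    )
    return [
        "lm_eval", "--model", "vllm",
        "--model_args", model_args,
        "--tasks", TASKS,
        "--device", "cuda:0",
        "--batch_size", "auto",
        "--output_path", f"{output_dir}/",
        "--num_fewshot", str(fewshot),
    ]

# Hypothetical inputs: a local model directory, two checkpoint steps, two shot counts.
cmds = [
    build_lm_eval_cmd("path/to/model", step, "results", shots)
    for step in (500, 1000)
    for shots in (0, 5)
]
print(shlex.join(cmds[0]))
```

Each list can then be passed to `subprocess.run(cmd, check=True)` on a machine where lm_eval and vLLM are installed.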

shot-0

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| haerae | 1 | none | | acc | 0.1971 | ± 0.0120 |
| | | none | | acc_norm | 0.1971 | ± 0.0120 |
| - haerae_general_knowledge | 1 | none | 0 | acc | 0.2727 | ± 0.0337 |
| | | none | 0 | acc_norm | 0.2727 | ± 0.0337 |
| - haerae_history | 1 | none | 0 | acc | 0.1862 | ± 0.0285 |
| | | none | 0 | acc_norm | 0.1862 | ± 0.0285 |
| - haerae_loan_word | 1 | none | 0 | acc | 0.1953 | ± 0.0306 |
| | | none | 0 | acc_norm | 0.1953 | ± 0.0306 |
| - haerae_rare_word | 1 | none | 0 | acc | 0.1901 | ± 0.0195 |
| | | none | 0 | acc_norm | 0.1901 | ± 0.0195 |
| - haerae_standard_nomenclature | 1 | none | 0 | acc | 0.1438 | ± 0.0285 |
| | | none | 0 | acc_norm | 0.1438 | ± 0.0285 |
| kmmlu_direct_accounting | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_agricultural_sciences | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_aviation_engineering_and_maintenance | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_biology | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_chemical_engineering | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_chemistry | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_civil_engineering | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_computer_science | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_construction | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_criminal_law | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_ecology | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_economics | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_education | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_electrical_engineering | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_electronics_engineering | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_energy_management | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_environmental_science | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_fashion | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_food_processing | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_gas_technology_and_engineering | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_geomatics | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_health | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_industrial_engineer | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_information_technology | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_interior_architecture_and_design | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_korean_history | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_law | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_machine_design_and_manufacturing | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_management | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_maritime_engineering | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_marketing | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_materials_engineering | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_math | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_mechanical_engineering | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_nondestructive_testing | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_patent | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_political_science_and_sociology | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_psychology | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_public_safety | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_railway_and_automotive_engineering | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_real_estate | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_refrigerating_machinery | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_social_welfare | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_taxation | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_telecommunications_and_wireless_technology | 2 | none | 0 | exact_match | 0.0000 | ± 0 |
| kobest_boolq | 1 | none | 0 | acc | 0.5021 | ± 0.0133 |
| | | none | 0 | f1 | 0.3343 | ± N/A |
| kobest_copa | 1 | none | 0 | acc | 0.4710 | ± 0.0158 |
| | | none | 0 | f1 | 0.4704 | ± N/A |
| kobest_hellaswag | 1 | none | 0 | acc | 0.2060 | ± 0.0181 |
| | | none | 0 | acc_norm | 0.2180 | ± 0.0185 |
| | | none | 0 | f1 | 0.2039 | ± N/A |
| kobest_sentineg | 1 | none | 0 | acc | 0.4685 | ± 0.0251 |
| | | none | 0 | f1 | 0.3623 | ± N/A |

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| haerae | 1 | none | | acc | 0.1971 | ± 0.012 |
| | | none | | acc_norm | 0.1971 | ± 0.012 |

shot-5

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| haerae | 1 | none | | acc | 0.2026 | ± 0.0121 |
| | | none | | acc_norm | 0.2026 | ± 0.0121 |
| - haerae_general_knowledge | 1 | none | 5 | acc | 0.2898 | ± 0.0343 |
| | | none | 5 | acc_norm | 0.2898 | ± 0.0343 |
| - haerae_history | 1 | none | 5 | acc | 0.1968 | ± 0.0291 |
| | | none | 5 | acc_norm | 0.1968 | ± 0.0291 |
| - haerae_loan_word | 1 | none | 5 | acc | 0.1953 | ± 0.0306 |
| | | none | 5 | acc_norm | 0.1953 | ± 0.0306 |
| - haerae_rare_word | 1 | none | 5 | acc | 0.1926 | ± 0.0196 |
| | | none | 5 | acc_norm | 0.1926 | ± 0.0196 |
| - haerae_standard_nomenclature | 1 | none | 5 | acc | 0.1438 | ± 0.0285 |
| | | none | 5 | acc_norm | 0.1438 | ± 0.0285 |
| kmmlu_direct_accounting | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_agricultural_sciences | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_aviation_engineering_and_maintenance | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_biology | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_chemical_engineering | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_chemistry | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_civil_engineering | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_computer_science | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_construction | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_criminal_law | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_ecology | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_economics | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_education | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_electrical_engineering | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_electronics_engineering | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_energy_management | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_environmental_science | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_fashion | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_food_processing | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_gas_technology_and_engineering | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_geomatics | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_health | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_industrial_engineer | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_information_technology | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_interior_architecture_and_design | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_korean_history | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_law | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_machine_design_and_manufacturing | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_management | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_maritime_engineering | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_marketing | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_materials_engineering | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_math | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_mechanical_engineering | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_nondestructive_testing | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_patent | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_political_science_and_sociology | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_psychology | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_public_safety | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_railway_and_automotive_engineering | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_real_estate | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_refrigerating_machinery | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_social_welfare | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_taxation | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kmmlu_direct_telecommunications_and_wireless_technology | 2 | none | 5 | exact_match | 0.0000 | ± 0 |
| kobest_boolq | 1 | none | 5 | acc | 0.5021 | ± 0.0133 |
| | | none | 5 | f1 | 0.3343 | ± N/A |
| kobest_copa | 1 | none | 5 | acc | 0.5010 | ± 0.0158 |
| | | none | 5 | f1 | 0.5005 | ± N/A |
| kobest_hellaswag | 1 | none | 5 | acc | 0.2060 | ± 0.0181 |
| | | none | 5 | acc_norm | 0.2180 | ± 0.0185 |
| | | none | 5 | f1 | 0.2038 | ± N/A |
| kobest_sentineg | 1 | none | 5 | acc | 0.4836 | ± 0.0251 |
| | | none | 5 | f1 | 0.4618 | ± N/A |

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| haerae | 1 | none | | acc | 0.2026 | ± 0.0121 |
| | | none | | acc_norm | 0.2026 | ± 0.0121 |
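
When reading the Value and Stderr columns, it can help to turn a standard error into an approximate interval and to check whether two scores differ by more than their combined noise. The sketch below does this for the haerae group accuracy at 0 and 5 shots. It assumes a normal approximation and treats the two runs as independent, which is only a rough simplification because both were measured on the same test items; the helper names are hypothetical.

```python
import math

def ci95(value, stderr):
    """Approximate 95% interval under a normal approximation."""
    half_width = 1.96 * stderr
    return value - half_width, value + half_width

def z_score(v1, se1, v2, se2):
    """Difference between two scores in units of their combined standard error."""
    return (v2 - v1) / math.sqrt(se1 ** 2 + se2 ** 2)

# (value, stderr) for the haerae group, copied from the tables above.
haerae_shot0 = (0.1971, 0.0120)
haerae_shot5 = (0.2026, 0.0121)

lo, hi = ci95(*haerae_shot0)
z = z_score(*haerae_shot0, *haerae_shot5)

print(f"0-shot interval: [{lo:.4f}, {hi:.4f}]")
print(f"z for 5-shot vs 0-shot: {z:.2f}")
```

A magnitude of z well below 1.96 suggests the 0-shot and 5-shot haerae scores are not distinguishable at this sample size.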

Model size: 231M params (Safetensors, tensor type F32)