devngho/llama-ablation-korean-textbooks-jamo
A model pretrained with the Llama architecture. It was trained on about 19.7B tokens (approximately 6.07 epochs) using MaxText.
Checkpoints are available every 500 steps.
This research was supported with Cloud TPUs from Google's TPU Research Cloud (TRC). ⚡
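For context, the figures above imply a corpus of roughly 19.7B / 6.07 ≈ 3.2B tokens per epoch; this per-epoch size is inferred here, not stated in the card:

```python
# Implied corpus size per epoch (inferred from the card's figures, not stated in it).
tokens_seen = 19.7e9
epochs = 6.07
print(f"{tokens_seen / epochs:.2e} tokens per epoch")  # ~3.25e9
```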
Features
This model uses the devngho/jamo-tokenizer-exp1 tokenizer, which applies NFKD normalization and allows tokens to contain whitespace.
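As a quick illustration, NFKD decomposes precomposed Hangul syllables into conjoining jamo, which is the level this tokenizer operates at. The snippet below is only a sketch of that preprocessing idea using Python's standard library, not the tokenizer's actual implementation:

```python
import unicodedata

# NFKD splits each precomposed Hangul syllable into its conjoining jamo,
# e.g. "한" becomes the three code points ᄒ + ᅡ + ᆫ.
text = "한국어 모델"
decomposed = unicodedata.normalize("NFKD", text)
print(len(text), len(decomposed))  # 6 vs. 14: every syllable splits into 2-3 jamo

# The tokenizer itself can be loaded from the Hub, assuming a
# transformers-compatible export (not stated in this card):
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("devngho/jamo-tokenizer-exp1")
```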
Details
- Made by: devngho
- Language: ko
- License: mit
Training details
- learning_rate: 6e-4 (cosine, initial/end 6e-5)
- warmup_ratio: 0.05
- batch_size: 1024 (fsdp 16 * per device 16 * ga 4)
- optimizer: adamw (b1=0.9, b2=0.95, eps=1e-5, weight_decay=0.01); see the optax sketch after this list for the schedule and optimizer in code form
- duration: 10h 54m
- steps: 10000
- You can check all the configs and training results on wandb
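As referenced above, here is a minimal optax sketch of the listed schedule and optimizer. It is an illustrative reconstruction, not the MaxText configuration actually used; the library calls and derived step counts are assumptions based on the listed hyperparameters:

```python
import optax

total_steps = 10_000
warmup_steps = int(0.05 * total_steps)  # warmup_ratio 0.05 -> 500 steps

# Cosine schedule: warm up from 6e-5 to the 6e-4 peak, then decay back to 6e-5.
schedule = optax.warmup_cosine_decay_schedule(
    init_value=6e-5,
    peak_value=6e-4,
    warmup_steps=warmup_steps,
    decay_steps=total_steps,
    end_value=6e-5,
)

# AdamW with the listed betas, epsilon, and weight decay.
optimizer = optax.adamw(
    learning_rate=schedule,
    b1=0.9,
    b2=0.95,
    eps=1e-5,
    weight_decay=0.01,
)
```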
Training devices
TPU v4-32
Training datasets
Software
jax==0.4.35
devngho/MaxText, a fork of MaxText
Training results
- learning/loss: 0.00966739938193828
- eval/avg_loss: 0.010137598644319531
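These losses can be converted to perplexities under the assumption that they are mean cross-entropy per token in nats; the card does not state the unit, so treat the numbers below as a conditional back-of-the-envelope check rather than an official metric:

```python
import math

# Hypothetical conversion, assuming the reported losses are mean
# cross-entropy per token in nats.
print(math.exp(0.00966739938193828))   # train perplexity, ~1.0097
print(math.exp(0.010137598644319531))  # eval perplexity,  ~1.0102
```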
Benchmark results are provided below. They were produced with the following lm_eval script ($model_name, $model_step, $fewshot, and $output_dir are placeholder shell variables):
lm_eval --model vllm \
--model_args "pretrained=$model_name/$model_step,dtype=bfloat16,gpu_memory_utilization=0.7,max_model_len=2048,max_num_seqs=128" \
--tasks haerae,kmmlu_direct,kobest_boolq,kobest_copa,kobest_hellaswag,kobest_sentineg \
--device cuda:0 \
--batch_size auto \
--output_path $output_dir/ \
--num_fewshot $fewshot
shot-0
Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr
---|---|---|---|---|---|---|---|---
haerae | 1 | none | 0 | acc | ↑ | 0.1971 | ± | 0.0120
haerae | 1 | none | 0 | acc_norm | ↑ | 0.1971 | ± | 0.0120
- haerae_general_knowledge | 1 | none | 0 | acc | ↑ | 0.2727 | ± | 0.0337
- haerae_general_knowledge | 1 | none | 0 | acc_norm | ↑ | 0.2727 | ± | 0.0337
- haerae_history | 1 | none | 0 | acc | ↑ | 0.1862 | ± | 0.0285
- haerae_history | 1 | none | 0 | acc_norm | ↑ | 0.1862 | ± | 0.0285
- haerae_loan_word | 1 | none | 0 | acc | ↑ | 0.1953 | ± | 0.0306
- haerae_loan_word | 1 | none | 0 | acc_norm | ↑ | 0.1953 | ± | 0.0306
- haerae_rare_word | 1 | none | 0 | acc | ↑ | 0.1901 | ± | 0.0195
- haerae_rare_word | 1 | none | 0 | acc_norm | ↑ | 0.1901 | ± | 0.0195
- haerae_standard_nomenclature | 1 | none | 0 | acc | ↑ | 0.1438 | ± | 0.0285
- haerae_standard_nomenclature | 1 | none | 0 | acc_norm | ↑ | 0.1438 | ± | 0.0285
kmmlu_direct_accounting | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_agricultural_sciences | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_aviation_engineering_and_maintenance | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_biology | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_chemical_engineering | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_chemistry | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_civil_engineering | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_computer_science | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_construction | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_criminal_law | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_ecology | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_economics | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_education | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_electrical_engineering | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_electronics_engineering | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_energy_management | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_environmental_science | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_fashion | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_food_processing | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_gas_technology_and_engineering | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_geomatics | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_health | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_industrial_engineer | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_information_technology | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_interior_architecture_and_design | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_korean_history | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_law | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_machine_design_and_manufacturing | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_management | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_maritime_engineering | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_marketing | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_materials_engineering | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_math | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_mechanical_engineering | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_nondestructive_testing | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_patent | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_political_science_and_sociology | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_psychology | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_public_safety | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_railway_and_automotive_engineering | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_real_estate | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_refrigerating_machinery | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_social_welfare | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_taxation | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_telecommunications_and_wireless_technology | 2 | none | 0 | exact_match | ↑ | 0.0000 | ± | 0 |
kobest_boolq | 1 | none | 0 | acc | ↑ | 0.5021 | ± | 0.0133
kobest_boolq | 1 | none | 0 | f1 | ↑ | 0.3343 | ± | N/A
kobest_copa | 1 | none | 0 | acc | ↑ | 0.4710 | ± | 0.0158
kobest_copa | 1 | none | 0 | f1 | ↑ | 0.4704 | ± | N/A
kobest_hellaswag | 1 | none | 0 | acc | ↑ | 0.2060 | ± | 0.0181
kobest_hellaswag | 1 | none | 0 | acc_norm | ↑ | 0.2180 | ± | 0.0185
kobest_hellaswag | 1 | none | 0 | f1 | ↑ | 0.2039 | ± | N/A
kobest_sentineg | 1 | none | 0 | acc | ↑ | 0.4685 | ± | 0.0251
kobest_sentineg | 1 | none | 0 | f1 | ↑ | 0.3623 | ± | N/A
Groups | Version | Filter | n-shot | Metric | | Value | | Stderr
---|---|---|---|---|---|---|---|---
haerae | 1 | none | 0 | acc | ↑ | 0.1971 | ± | 0.012
haerae | 1 | none | 0 | acc_norm | ↑ | 0.1971 | ± | 0.012
shot-5
Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr
---|---|---|---|---|---|---|---|---
haerae | 1 | none | 5 | acc | ↑ | 0.2026 | ± | 0.0121
haerae | 1 | none | 5 | acc_norm | ↑ | 0.2026 | ± | 0.0121
- haerae_general_knowledge | 1 | none | 5 | acc | ↑ | 0.2898 | ± | 0.0343
- haerae_general_knowledge | 1 | none | 5 | acc_norm | ↑ | 0.2898 | ± | 0.0343
- haerae_history | 1 | none | 5 | acc | ↑ | 0.1968 | ± | 0.0291
- haerae_history | 1 | none | 5 | acc_norm | ↑ | 0.1968 | ± | 0.0291
- haerae_loan_word | 1 | none | 5 | acc | ↑ | 0.1953 | ± | 0.0306
- haerae_loan_word | 1 | none | 5 | acc_norm | ↑ | 0.1953 | ± | 0.0306
- haerae_rare_word | 1 | none | 5 | acc | ↑ | 0.1926 | ± | 0.0196
- haerae_rare_word | 1 | none | 5 | acc_norm | ↑ | 0.1926 | ± | 0.0196
- haerae_standard_nomenclature | 1 | none | 5 | acc | ↑ | 0.1438 | ± | 0.0285
- haerae_standard_nomenclature | 1 | none | 5 | acc_norm | ↑ | 0.1438 | ± | 0.0285
kmmlu_direct_accounting | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_agricultural_sciences | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_aviation_engineering_and_maintenance | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_biology | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_chemical_engineering | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_chemistry | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_civil_engineering | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_computer_science | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_construction | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_criminal_law | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_ecology | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_economics | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_education | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_electrical_engineering | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_electronics_engineering | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_energy_management | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_environmental_science | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_fashion | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_food_processing | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_gas_technology_and_engineering | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_geomatics | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_health | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_industrial_engineer | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_information_technology | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_interior_architecture_and_design | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_korean_history | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_law | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_machine_design_and_manufacturing | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_management | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_maritime_engineering | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_marketing | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_materials_engineering | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_math | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_mechanical_engineering | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_nondestructive_testing | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_patent | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_political_science_and_sociology | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_psychology | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_public_safety | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_railway_and_automotive_engineering | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_real_estate | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_refrigerating_machinery | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_social_welfare | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_taxation | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kmmlu_direct_telecommunications_and_wireless_technology | 2 | none | 5 | exact_match | ↑ | 0.0000 | ± | 0 |
kobest_boolq | 1 | none | 5 | acc | ↑ | 0.5021 | ± | 0.0133
kobest_boolq | 1 | none | 5 | f1 | ↑ | 0.3343 | ± | N/A
kobest_copa | 1 | none | 5 | acc | ↑ | 0.5010 | ± | 0.0158
kobest_copa | 1 | none | 5 | f1 | ↑ | 0.5005 | ± | N/A
kobest_hellaswag | 1 | none | 5 | acc | ↑ | 0.2060 | ± | 0.0181
kobest_hellaswag | 1 | none | 5 | acc_norm | ↑ | 0.2180 | ± | 0.0185
kobest_hellaswag | 1 | none | 5 | f1 | ↑ | 0.2038 | ± | N/A
kobest_sentineg | 1 | none | 5 | acc | ↑ | 0.4836 | ± | 0.0251
kobest_sentineg | 1 | none | 5 | f1 | ↑ | 0.4618 | ± | N/A
Groups | Version | Filter | n-shot | Metric | | Value | | Stderr
---|---|---|---|---|---|---|---|---
haerae | 1 | none | 5 | acc | ↑ | 0.2026 | ± | 0.0121
haerae | 1 | none | 5 | acc_norm | ↑ | 0.2026 | ± | 0.0121