KoModernBERT-base-mlm

This model is a fine-tuned version of answerdotai/ModernBERT-base for Korean masked language modeling. It builds on the following ModernBERT features:

  • Flash-Attention 2 (see the loading sketch after this list)
  • StableAdamW
  • Unpadding & Sequence Packing
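
A minimal loading sketch (not from the original card): the checkpoint can be loaded with Flash Attention 2 enabled through the standard Transformers attn_implementation argument, assuming the flash-attn package and a supported GPU are available.

import torch
from transformers import AutoModelForMaskedLM

# Assumption: flash-attn is installed and the GPU supports it.
model = AutoModelForMaskedLM.from_pretrained(
    "x2bee/KoModernBERT-base-mlm-v01",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")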

Example Use

import random

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM
from huggingface_hub import login

# Authenticate with a Hugging Face token stored in a local file.
with open('./api_key/HGF_TOKEN.txt', 'r') as hgf:
    login(token=hgf.read())

model_id = "x2bee/KoModernBERT-base-mlm-v01"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id).to("cuda")

def modern_bert_convert_with_multiple_masks(text: str, top_k: int = 1, select_method: str = "Logit") -> str:
    """Fill every [MASK] in `text` one at a time, left to right."""
    if "[MASK]" not in text:
        raise ValueError("MLM model input should include '[MASK]' in the sentence")

    while "[MASK]" in text:
        inputs = tokenizer(text, return_tensors="pt").to("cuda")
        with torch.no_grad():
            outputs = model(**inputs)

        # Locate the first remaining [MASK] token in the input ids.
        input_ids = inputs["input_ids"][0].tolist()
        mask_indices = [i for i, token_id in enumerate(input_ids) if token_id == tokenizer.mask_token_id]
        current_mask_index = mask_indices[0]

        # Logits at the current [MASK] position and its top-k candidate tokens.
        logits = outputs.logits[0, current_mask_index]
        top_k_logits, top_k_indices = logits.topk(top_k)
        top_k_tokens = top_k_indices.tolist()

        if select_method == "Logit":
            # Sample among the top-k candidates, weighted by softmax of their logits.
            probabilities = torch.softmax(top_k_logits, dim=0).tolist()
            predicted_token_id = random.choices(top_k_tokens, weights=probabilities, k=1)[0]
        elif select_method == "Random":
            # Sample uniformly among the top-k candidates.
            predicted_token_id = random.choice(top_k_tokens)
        elif select_method == "Best":
            # Always take the highest-scoring candidate.
            predicted_token_id = top_k_tokens[0]
        else:
            raise ValueError("select_method should be one of ['Logit', 'Random', 'Best']")

        predicted_token = tokenizer.decode([predicted_token_id]).strip()

        # Replace only the first [MASK], then re-run the model on the updated text.
        text = text.replace("[MASK]", predicted_token, 1)
        print(f"Predicted: {predicted_token} | Current text: {text}")

    return text
text = "30์ผ ์ „๋‚จ ๋ฌด์•ˆ๊ตญ์ œ[MASK] ํ™œ์ฃผ๋กœ์— ์ „๋‚  ๋ฐœ์ƒํ•œ ์ œ์ฃผํ•ญ๊ณต [MASK] ๋‹น์‹œ ๊ธฐ์ฒด๊ฐ€ [MASK]์ฐฉ๋ฅ™ํ•˜๋ฉด์„œ ๊ฐ•ํ•œ ๋งˆ์ฐฐ๋กœ ์ƒ๊ธด ํ”์ ์ด ๋‚จ์•„ ์žˆ๋‹ค. ์ด ์ฐธ์‚ฌ๋กœ [MASK]๊ณผ ์Šน๋ฌด์› 181๋ช… ์ค‘ 179๋ช…์ด ์ˆจ์ง€๊ณ  [MASK]๋Š” ํ˜•์ฒด๋ฅผ ์•Œ์•„๋ณผ ์ˆ˜ ์—†์ด [MASK]๋๋‹ค. [MASK] ๊ทœ๋ชจ์™€ [MASK] ์›์ธ ๋“ฑ์— ๋Œ€ํ•ด ๋‹ค์–‘ํ•œ [MASK]์ด ์ œ๊ธฐ๋˜๊ณ  ์žˆ๋Š” ๊ฐ€์šด๋ฐ [MASK]์— ์„ค์น˜๋œ [MASK](์ฐฉ๋ฅ™ ์œ ๋„ ์•ˆ์ „์‹œ์„ค)๊ฐ€ [MASK]๋ฅผ ํ‚ค์› ๋‹ค๋Š” [MASK]์ด ๋‚˜์˜ค๊ณ  ์žˆ๋‹ค."
result = modern_bert_convert_with_multiple_masks(text, top_k=1)

'30์ผ ์ „๋‚จ ๋ฌด์•ˆ๊ตญ์ œํ„ฐ๋ฏธ๋„ ํ™œ์ฃผ๋กœ์— ์ „๋‚  ๋ฐœ์ƒํ•œ ์ œ์ฃผํ•ญ๊ณต ์‚ฌ๊ณ  ๋‹น์‹œ ๊ธฐ์ฒด๊ฐ€ ๋ฌด๋‹จ์ฐฉ๋ฅ™ํ•˜๋ฉด์„œ ๊ฐ•ํ•œ ๋งˆ์ฐฐ๋กœ ์ƒ๊ธด ํ”์ ์ด ๋‚จ์•„ ์žˆ๋‹ค. ์ด ์ฐธ์‚ฌ๋กœ ์Šน๊ฐ๊ณผ ์Šน๋ฌด์› 181๋ช… ์ค‘ 179๋ช…์ด ์ˆจ์ง€๊ณ  ์ผ๋ถ€๋Š” ํ˜•์ฒด๋ฅผ ์•Œ์•„๋ณผ ์ˆ˜ ์—†์ด ์‹ค์ข…๋๋‹ค. ์‚ฌ๊ณ  ๊ทœ๋ชจ์™€ ์‚ฌ๊ณ  ์›์ธ ๋“ฑ์— ๋Œ€ํ•ด ๋‹ค์–‘ํ•œ ์˜ํ˜น์ด ์ œ๊ธฐ๋˜๊ณ  ์žˆ๋Š” ๊ฐ€์šด๋ฐ ๊ธฐ๋‚ด์— ์„ค์น˜๋œ ESC(์ฐฉ๋ฅ™ ์œ ๋„ ์•ˆ์ „์‹œ์„ค)๊ฐ€ ์‚ฌ๊ณ ๋ฅผ ํ‚ค์› ๋‹ค๋Š” ์ฃผ์žฅ์ด ๋‚˜์˜ค๊ณ  ์žˆ๋‹ค.'
text = "์ค‘๊ตญ์˜ ์ˆ˜๋„๋Š” [MASK]์ด๋‹ค"
result = modern_bert_convert_with_multiple_masks(text, top_k=1)
'์ค‘๊ตญ์˜ ์ˆ˜๋„๋Š” ๋ฒ ์ด์ง•์ด๋‹ค'

text = "์ผ๋ณธ์˜ ์ˆ˜๋„๋Š” [MASK]์ด๋‹ค"
result = modern_bert_convert_with_multiple_masks(text, top_k=1)
'์ผ๋ณธ์˜ ์ˆ˜๋„๋Š” ๋„์ฟ„์ด๋‹ค'

text = "๋Œ€ํ•œ๋ฏผ๊ตญ์˜ ๊ฐ€์žฅ ํฐ ๋„์‹œ๋Š” [MASK]์ด๋‹ค"
result = modern_bert_convert_with_multiple_masks(text, top_k=1)
'๋Œ€ํ•œ๋ฏผ๊ตญ์˜ ๊ฐ€์žฅ ํฐ ๋„์‹œ๋Š” ์„œ์šธ์ด๋‹ค'

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 512 (see the note after this list)
  • total_eval_batch_size: 64
  • optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
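
Note: the total_train_batch_size of 512 follows from train_batch_size × num_devices × gradient_accumulation_steps = 8 × 8 × 8.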

Framework versions

  • Transformers 4.48.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0

Model details

  • Format: Safetensors
  • Model size: 153M params
  • Tensor type: FP16