๊ฐ๊ด€์‹ ๋ฌธ์ œ[[multiple-choice]]

[[open-in-colab]]

๊ฐ๊ด€์‹ ๊ณผ์ œ๋Š” ๋ฌธ๋งฅ๊ณผ ํ•จ๊ป˜ ์—ฌ๋Ÿฌ ๊ฐœ์˜ ํ›„๋ณด ๋‹ต๋ณ€์ด ์ œ๊ณต๋˜๊ณ  ๋ชจ๋ธ์ด ์ •๋‹ต์„ ์„ ํƒํ•˜๋„๋ก ํ•™์Šต๋œ๋‹ค๋Š” ์ ์„ ์ œ์™ธํ•˜๋ฉด ์งˆ์˜์‘๋‹ต๊ณผ ์œ ์‚ฌํ•ฉ๋‹ˆ๋‹ค.

This guide will show you how to:

1. Finetune BERT on the `regular` configuration of the SWAG dataset to select the best answer given multiple options and some context.
2. Use your finetuned model for inference.

The task illustrated in this tutorial is supported by the following model architectures:

ALBERT, BERT, BigBird, CamemBERT, CANINE, ConvBERT, Data2VecText, DeBERTa-v2, DistilBERT, ELECTRA, ERNIE, ErnieM, FlauBERT, FNet, Funnel Transformer, I-BERT, Longformer, LUKE, MEGA, Megatron-BERT, MobileBERT, MPNet, Nezha, Nyströmformer, QDQBert, RemBERT, RoBERTa, RoBERTa-PreLayerNorm, RoCBert, RoFormer, SqueezeBERT, XLM, XLM-RoBERTa, XLM-RoBERTa-XL, XLNet, X-MOD, YOSO

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install transformers datasets evaluate
```

๋ชจ๋ธ์„ ์—…๋กœ๋“œํ•˜๊ณ  ์ปค๋ฎค๋‹ˆํ‹ฐ์™€ ๊ณต์œ ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ—ˆ๊น…ํŽ˜์ด์Šค ๊ณ„์ •์— ๋กœ๊ทธ์ธํ•˜๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค. ๋ฉ”์‹œ์ง€๊ฐ€ ํ‘œ์‹œ๋˜๋ฉด ํ† ํฐ์„ ์ž…๋ ฅํ•˜์—ฌ ๋กœ๊ทธ์ธํ•ฉ๋‹ˆ๋‹ค:

```py
>>> from huggingface_hub import notebook_login

>>> notebook_login()
```

## Load SWAG dataset[[load-swag-dataset]]

Start by loading the `regular` configuration of the SWAG dataset from the 🤗 Datasets library:

```py
>>> from datasets import load_dataset

>>> swag = load_dataset("swag", "regular")
```

Then take a look at an example:

```py
>>> swag["train"][0]
{'ending0': 'passes by walking down the street playing their instruments.',
 'ending1': 'has heard approaching them.',
 'ending2': "arrives and they're outside dancing and asleep.",
 'ending3': 'turns the lead singer watches the performance.',
 'fold-ind': '3416',
 'gold-source': 'gold',
 'label': 0,
 'sent1': 'Members of the procession walk down the street holding small horn brass instruments.',
 'sent2': 'A drum line',
 'startphrase': 'Members of the procession walk down the street holding small horn brass instruments. A drum line',
 'video-id': 'anetv_jkn6uvmqwh4'}
```

While it looks like there are a lot of fields here, it is actually pretty straightforward:

- `sent1` and `sent2`: these fields show how a sentence starts, and if you put the two together, you get the `startphrase` field.
- `ending`: suggests a possible ending for how a sentence can end, but only one of them is correct.
- `label`: identifies the correct sentence ending.

## Preprocess[[preprocess]]

๋‹ค์Œ ๋‹จ๊ณ„๋Š” ๋ฌธ์žฅ์˜ ์‹œ์ž‘๊ณผ ๋„ค ๊ฐ€์ง€ ๊ฐ€๋Šฅํ•œ ๊ตฌ์ ˆ์„ ์ฒ˜๋ฆฌํ•˜๊ธฐ ์œ„ํ•ด BERT ํ† ํฌ๋‚˜์ด์ €๋ฅผ ๋ถˆ๋Ÿฌ์˜ต๋‹ˆ๋‹ค:

```py
>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
```

์ƒ์„ฑํ•˜๋ ค๋Š” ์ „์ฒ˜๋ฆฌ ํ•จ์ˆ˜๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์•„์•ผ ํ•ฉ๋‹ˆ๋‹ค:

1. Make four copies of the `sent1` field and combine each of them with `sent2` to recreate how a sentence starts.
2. Combine `sent2` with each of the four possible sentence endings.
3. Flatten these two lists so you can tokenize them, and then unflatten them afterward so each example has corresponding `input_ids`, `attention_mask`, and `labels` fields.

```py
>>> ending_names = ["ending0", "ending1", "ending2", "ending3"]


>>> def preprocess_function(examples):
...     first_sentences = [[context] * 4 for context in examples["sent1"]]
...     question_headers = examples["sent2"]
...     second_sentences = [
...         [f"{header} {examples[end][i]}" for end in ending_names] for i, header in enumerate(question_headers)
...     ]

...     first_sentences = sum(first_sentences, [])
...     second_sentences = sum(second_sentences, [])

...     tokenized_examples = tokenizer(first_sentences, second_sentences, truncation=True)
...     return {k: [v[i : i + 4] for i in range(0, len(v), 4)] for k, v in tokenized_examples.items()}
```
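
As a quick sanity check (not part of the original recipe), you can run the function on a small slice and confirm that the output is re-nested into groups of four tokenized sequences per example:

```py
>>> features = preprocess_function(swag["train"][:2])
>>> len(features["input_ids"]), len(features["input_ids"][0])  # (number of examples, number of choices)
(2, 4)
```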

To apply the preprocessing function over the entire dataset, use the 🤗 Datasets [`~datasets.Dataset.map`] method. You can speed up the `map` function by setting `batched=True` to process multiple elements of the dataset at once:

```py
tokenized_swag = swag.map(preprocess_function, batched=True)
```

🤗 Transformers doesn't have a data collator for multiple choice, so you'll need to adapt the [`DataCollatorWithPadding`] to create a batch of examples. It's more efficient to dynamically pad the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.

`DataCollatorForMultipleChoice` flattens all the model inputs, applies padding, and then unflattens the results:

<frameworkcontent>
<pt>
```py
>>> from dataclasses import dataclass
>>> from transformers.tokenization_utils_base import PreTrainedTokenizerBase, PaddingStrategy
>>> from typing import Optional, Union
>>> import torch


>>> @dataclass
... class DataCollatorForMultipleChoice:
...     """
...     Data collator that will dynamically pad the inputs for multiple choice received.
...     """

...     tokenizer: PreTrainedTokenizerBase
...     padding: Union[bool, str, PaddingStrategy] = True
...     max_length: Optional[int] = None
...     pad_to_multiple_of: Optional[int] = None

...     def __call__(self, features):
...         label_name = "label" if "label" in features[0].keys() else "labels"
...         labels = [feature.pop(label_name) for feature in features]
...         batch_size = len(features)
...         num_choices = len(features[0]["input_ids"])
...         flattened_features = [
...             [{k: v[i] for k, v in feature.items()} for i in range(num_choices)] for feature in features
...         ]
...         flattened_features = sum(flattened_features, [])

...         batch = self.tokenizer.pad(
...             flattened_features,
...             padding=self.padding,
...             max_length=self.max_length,
...             pad_to_multiple_of=self.pad_to_multiple_of,
...             return_tensors="pt",
...         )

...         batch = {k: v.view(batch_size, num_choices, -1) for k, v in batch.items()}
...         batch["labels"] = torch.tensor(labels, dtype=torch.int64)
...         return batch
```
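
To see what the collator produces, here is a small usage sketch. The raw text columns are dropped by hand here, since the [`Trainer`] normally removes unused columns for you before collation:

```py
>>> tokenized_columns = ("input_ids", "attention_mask", "label")
>>> features = [{k: tokenized_swag["train"][i][k] for k in tokenized_columns} for i in range(2)]
>>> batch = DataCollatorForMultipleChoice(tokenizer=tokenizer)(features)
>>> shape = batch["input_ids"].shape  # torch.Size([batch_size, num_choices, padded sequence length])
```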

</pt>
<tf>
```py
>>> from dataclasses import dataclass
>>> from transformers.tokenization_utils_base import PreTrainedTokenizerBase, PaddingStrategy
>>> from typing import Optional, Union
>>> import tensorflow as tf


>>> @dataclass
... class DataCollatorForMultipleChoice:
...     """
...     Data collator that will dynamically pad the inputs for multiple choice received.
...     """

...     tokenizer: PreTrainedTokenizerBase
...     padding: Union[bool, str, PaddingStrategy] = True
...     max_length: Optional[int] = None
...     pad_to_multiple_of: Optional[int] = None

...     def __call__(self, features):
...         label_name = "label" if "label" in features[0].keys() else "labels"
...         labels = [feature.pop(label_name) for feature in features]
...         batch_size = len(features)
...         num_choices = len(features[0]["input_ids"])
...         flattened_features = [
...             [{k: v[i] for k, v in feature.items()} for i in range(num_choices)] for feature in features
...         ]
...         flattened_features = sum(flattened_features, [])

...         batch = self.tokenizer.pad(
...             flattened_features,
...             padding=self.padding,
...             max_length=self.max_length,
...             pad_to_multiple_of=self.pad_to_multiple_of,
...             return_tensors="tf",
...         )

...         batch = {k: tf.reshape(v, (batch_size, num_choices, -1)) for k, v in batch.items()}
...         batch["labels"] = tf.convert_to_tensor(labels, dtype=tf.int64)
...         return batch
```
</tf>
</frameworkcontent>

## Evaluate[[evaluate]]

Including a metric during training is often helpful for evaluating your model's performance. You can quickly load an evaluation method with the 🤗 Evaluate library. For this task, load the accuracy metric (see the 🤗 Evaluate quick tour to learn more about how to load and compute a metric):

```py
>>> import evaluate

>>> accuracy = evaluate.load("accuracy")
```

Then create a function that passes your predictions and labels to [`~evaluate.EvaluationModule.compute`] to calculate the accuracy:

```py
>>> import numpy as np


>>> def compute_metrics(eval_pred):
...     predictions, labels = eval_pred
...     predictions = np.argmax(predictions, axis=1)
...     return accuracy.compute(predictions=predictions, references=labels)
```
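
You can sanity-check the function on some dummy logits shaped like the model output, `(batch_size, num_choices)` (a quick illustration, not required for training):

```py
>>> dummy_logits = np.array([[0.1, 0.6, 0.2, 0.1], [0.7, 0.1, 0.1, 0.1]])
>>> dummy_labels = np.array([1, 0])
>>> compute_metrics((dummy_logits, dummy_labels))
{'accuracy': 1.0}
```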

Your `compute_metrics` function is ready to go now, and you'll return to it when you set up your training.

## Train[[train]]

<frameworkcontent>
<pt>
If you aren't familiar with finetuning a model with the [`Trainer`], take a look at the basic tutorial here!

You're ready to start training your model now! Load BERT with [`AutoModelForMultipleChoice`]:

```py
>>> from transformers import AutoModelForMultipleChoice, TrainingArguments, Trainer

>>> model = AutoModelForMultipleChoice.from_pretrained("bert-base-uncased")
```

At this point, only three steps remain:

1. Define your training hyperparameters in [`TrainingArguments`]. The only required parameter is `output_dir`, which specifies where to save your model. You'll push this model to the Hub by setting `push_to_hub=True` (you need to be signed in to Hugging Face to upload your model). At the end of each epoch, the [`Trainer`] will evaluate the accuracy and save the training checkpoint.
2. Pass the training arguments to [`Trainer`] along with the model, dataset, tokenizer, data collator, and the `compute_metrics` function.
3. Call [`~Trainer.train`] to finetune your model.

```py
>>> training_args = TrainingArguments(
...     output_dir="my_awesome_swag_model",
...     evaluation_strategy="epoch",
...     save_strategy="epoch",
...     load_best_model_at_end=True,
...     learning_rate=5e-5,
...     per_device_train_batch_size=16,
...     per_device_eval_batch_size=16,
...     num_train_epochs=3,
...     weight_decay=0.01,
...     push_to_hub=True,
... )

>>> trainer = Trainer(
...     model=model,
...     args=training_args,
...     train_dataset=tokenized_swag["train"],
...     eval_dataset=tokenized_swag["validation"],
...     tokenizer=tokenizer,
...     data_collator=DataCollatorForMultipleChoice(tokenizer=tokenizer),
...     compute_metrics=compute_metrics,
... )

>>> trainer.train()
```

ํ›ˆ๋ จ์ด ์™„๋ฃŒ๋˜๋ฉด ๋ชจ๋“  ์‚ฌ๋žŒ์ด ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก [~transformers.Trainer.push_to_hub] ๋ฉ”์†Œ๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋ธ์„ ํ—ˆ๋ธŒ์— ๊ณต์œ ํ•˜์„ธ์š”:

```py
>>> trainer.push_to_hub()
```
</pt>
<tf>

If you aren't familiar with finetuning a model with Keras, take a look at the basic tutorial here!

To finetune a model in TensorFlow, start by setting up an optimizer function, learning rate schedule, and some training hyperparameters:

```py
>>> from transformers import create_optimizer

>>> batch_size = 16
>>> num_train_epochs = 2
>>> total_train_steps = (len(tokenized_swag["train"]) // batch_size) * num_train_epochs
>>> optimizer, schedule = create_optimizer(init_lr=5e-5, num_warmup_steps=0, num_train_steps=total_train_steps)
```

Then you can load BERT with [`TFAutoModelForMultipleChoice`]:

```py
>>> from transformers import TFAutoModelForMultipleChoice

>>> model = TFAutoModelForMultipleChoice.from_pretrained("bert-base-uncased")
```

Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:

```py
>>> data_collator = DataCollatorForMultipleChoice(tokenizer=tokenizer)
>>> tf_train_set = model.prepare_tf_dataset(
...     tokenized_swag["train"],
...     shuffle=True,
...     batch_size=batch_size,
...     collate_fn=data_collator,
... )

>>> tf_validation_set = model.prepare_tf_dataset(
...     tokenized_swag["validation"],
...     shuffle=False,
...     batch_size=batch_size,
...     collate_fn=data_collator,
... )
```

Configure the model for training with `compile`:

```py
>>> model.compile(optimizer=optimizer)
```

ํ›ˆ๋ จ์„ ์‹œ์ž‘ํ•˜๊ธฐ ์ „์— ์„ค์ •ํ•ด์•ผ ํ•  ๋งˆ์ง€๋ง‰ ๋‘ ๊ฐ€์ง€๋Š” ์˜ˆ์ธก์˜ ์ •ํ™•๋„๋ฅผ ๊ณ„์‚ฐํ•˜๊ณ  ๋ชจ๋ธ์„ ํ—ˆ๋ธŒ๋กœ ํ‘ธ์‹œํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์ œ๊ณตํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ์ด ๋‘ ๊ฐ€์ง€ ์ž‘์—…์€ ๋ชจ๋‘ Keras ์ฝœ๋ฐฑ์„ ์‚ฌ์šฉํ•˜์—ฌ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Pass your `compute_metrics` function to [`~transformers.KerasMetricCallback`]:

```py
>>> from transformers.keras_callbacks import KerasMetricCallback

>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_validation_set)
```

๋ชจ๋ธ๊ณผ ํ† ํฌ๋‚˜์ด์ €๋ฅผ ์—…๋กœ๋“œํ•  ์œ„์น˜๋ฅผ [~transformers.PushToHubCallback]์—์„œ ์ง€์ •ํ•˜์„ธ์š”:

```py
>>> from transformers.keras_callbacks import PushToHubCallback

>>> push_to_hub_callback = PushToHubCallback(
...     output_dir="my_awesome_model",
...     tokenizer=tokenizer,
... )
```

Then bundle your callbacks together:

```py
>>> callbacks = [metric_callback, push_to_hub_callback]
```

์ด์ œ ๋ชจ๋ธ ํ›ˆ๋ จ์„ ์‹œ์ž‘ํ•ฉ๋‹ˆ๋‹ค! ํ›ˆ๋ จ ๋ฐ ๊ฒ€์ฆ ๋ฐ์ดํ„ฐ ์„ธํŠธ, ์—ํญ ์ˆ˜, ์ฝœ๋ฐฑ์„ ์‚ฌ์šฉํ•˜์—ฌ fit์„ ํ˜ธ์ถœํ•˜๊ณ  ๋ชจ๋ธ์„ ๋ฏธ์„ธ ์กฐ์ •ํ•ฉ๋‹ˆ๋‹ค:

```py
>>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=2, callbacks=callbacks)
```

ํ›ˆ๋ จ์ด ์™„๋ฃŒ๋˜๋ฉด ๋ชจ๋ธ์ด ์ž๋™์œผ๋กœ ํ—ˆ๋ธŒ์— ์—…๋กœ๋“œ๋˜์–ด ๋ˆ„๊ตฌ๋‚˜ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค!

๊ฐ๊ด€์‹ ๋ชจ๋ธ์„ ๋ฏธ์„ธ ์กฐ์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•œ ๋ณด๋‹ค ์‹ฌ์ธต์ ์ธ ์˜ˆ๋Š” ์•„๋ž˜ ๋ฌธ์„œ๋ฅผ ์ฐธ์กฐํ•˜์„ธ์š”. PyTorch notebook ๋˜๋Š” TensorFlow notebook.

## Inference[[inference]]

์ด์ œ ๋ชจ๋ธ์„ ๋ฏธ์„ธ ์กฐ์ •ํ–ˆ์œผ๋‹ˆ ์ถ”๋ก ์— ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค!

Come up with some text and two candidate answers:

```py
>>> prompt = "France has a bread law, Le Décret Pain, with strict rules on what is allowed in a traditional baguette."
>>> candidate1 = "The law does not apply to croissants and brioche."
>>> candidate2 = "The law applies to baguettes."
```
๊ฐ ํ”„๋กฌํ”„ํŠธ์™€ ํ›„๋ณด ๋‹ต๋ณ€ ์Œ์„ ํ† ํฐํ™”ํ•˜์—ฌ PyTorch ํ…์„œ๋ฅผ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ `labels`์„ ์ƒ์„ฑํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค:
>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("my_awesome_swag_model")
>>> inputs = tokenizer([[prompt, candidate1], [prompt, candidate2]], return_tensors="pt", padding=True)
>>> labels = torch.tensor(0).unsqueeze(0)
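
Note that the tokenizer returns a batch of shape `(num_candidates, sequence_length)` here, while the model expects `(batch_size, num_candidates, sequence_length)`; that is why the inputs get an extra leading dimension before the forward pass below (a quick check, assuming the code above ran):

```py
>>> ndim = inputs["input_ids"].dim()               # 2: (num_candidates, sequence_length)
>>> ndim_batched = inputs["input_ids"].unsqueeze(0).dim()  # 3: (batch_size, num_candidates, sequence_length)
```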

Pass your inputs and labels to the model and return the `logits`:

```py
>>> from transformers import AutoModelForMultipleChoice

>>> model = AutoModelForMultipleChoice.from_pretrained("my_awesome_swag_model")
>>> outputs = model(**{k: v.unsqueeze(0) for k, v in inputs.items()}, labels=labels)
>>> logits = outputs.logits
```
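
If you want probabilities instead of raw logits, you can apply a softmax over the candidate dimension (an optional extra step, not part of the original recipe):

```py
>>> probabilities = torch.softmax(logits, dim=-1)  # shape (1, num_candidates), sums to 1
```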

Get the class with the highest probability:

```py
>>> predicted_class = logits.argmax().item()
>>> predicted_class
0
```
</pt>
<tf>
๊ฐ ํ”„๋กฌํ”„ํŠธ์™€ ํ›„๋ณด ๋‹ต์•ˆ ์Œ์„ ํ† ํฐํ™”ํ•˜์—ฌ ํ…์„œํ”Œ๋กœ ํ…์„œ๋ฅผ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค:
>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("my_awesome_swag_model")
>>> inputs = tokenizer([[prompt, candidate1], [prompt, candidate2]], return_tensors="tf", padding=True)

๋ชจ๋ธ์— ์ž…๋ ฅ์„ ์ „๋‹ฌํ•˜๊ณ  logits๋ฅผ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค:

```py
>>> import tensorflow as tf
>>> from transformers import TFAutoModelForMultipleChoice

>>> model = TFAutoModelForMultipleChoice.from_pretrained("my_awesome_swag_model")
>>> inputs = {k: tf.expand_dims(v, 0) for k, v in inputs.items()}
>>> outputs = model(inputs)
>>> logits = outputs.logits
```

Get the class with the highest probability:

```py
>>> predicted_class = int(tf.math.argmax(logits, axis=-1)[0])
>>> predicted_class
0
```
</tf>
</frameworkcontent>