
์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜[[image-classification]]

[[open-in-colab]]

์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜๋Š” ์ด๋ฏธ์ง€์— ๋ ˆ์ด๋ธ” ๋˜๋Š” ํด๋ž˜์Šค๋ฅผ ํ• ๋‹นํ•ฉ๋‹ˆ๋‹ค. ํ…์ŠคํŠธ ๋˜๋Š” ์˜ค๋””์˜ค ๋ถ„๋ฅ˜์™€ ๋‹ฌ๋ฆฌ ์ž…๋ ฅ์€ ์ด๋ฏธ์ง€๋ฅผ ๊ตฌ์„ฑํ•˜๋Š” ํ”ฝ์…€ ๊ฐ’์ž…๋‹ˆ๋‹ค. ์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜์—๋Š” ์ž์—ฐ์žฌํ•ด ํ›„ ํ”ผํ•ด ๊ฐ์ง€, ๋†์ž‘๋ฌผ ๊ฑด๊ฐ• ๋ชจ๋‹ˆํ„ฐ๋ง, ์˜๋ฃŒ ์ด๋ฏธ์ง€์—์„œ ์งˆ๋ณ‘์˜ ์ง•ํ›„ ๊ฒ€์‚ฌ ์ง€์› ๋“ฑ ๋‹ค์–‘ํ•œ ์‘์šฉ ์‚ฌ๋ก€๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.

์ด ๊ฐ€์ด๋“œ์—์„œ๋Š” ๋‹ค์Œ์„ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค:

  1. Fine-tune ViT on the Food-101 dataset to classify a food item in an image.
  2. Use your fine-tuned model for inference.

The task illustrated in this tutorial is supported by the following model architectures:

BEiT, BiT, ConvNeXT, ConvNeXTV2, CvT, Data2VecVision, DeiT, DiNAT, EfficientFormer, EfficientNet, FocalNet, ImageGPT, LeViT, MobileNetV1, MobileNetV2, MobileViT, NAT, Perceiver, PoolFormer, RegNet, ResNet, SegFormer, Swin Transformer, Swin Transformer V2, VAN, ViT, ViT Hybrid, ViTMSN

์‹œ์ž‘ํ•˜๊ธฐ ์ „์—, ํ•„์š”ํ•œ ๋ชจ๋“  ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๊ฐ€ ์„ค์น˜๋˜์–ด ์žˆ๋Š”์ง€ ํ™•์ธํ•˜์„ธ์š”:

pip install transformers datasets evaluate

Hugging Face ๊ณ„์ •์— ๋กœ๊ทธ์ธํ•˜์—ฌ ๋ชจ๋ธ์„ ์—…๋กœ๋“œํ•˜๊ณ  ์ปค๋ฎค๋‹ˆํ‹ฐ์— ๊ณต์œ ํ•˜๋Š” ๊ฒƒ์„ ๊ถŒ์žฅํ•ฉ๋‹ˆ๋‹ค. ๋ฉ”์‹œ์ง€๊ฐ€ ํ‘œ์‹œ๋˜๋ฉด, ํ† ํฐ์„ ์ž…๋ ฅํ•˜์—ฌ ๋กœ๊ทธ์ธํ•˜์„ธ์š”:

>>> from huggingface_hub import notebook_login

>>> notebook_login()

Food-101 ๋ฐ์ดํ„ฐ ์„ธํŠธ ๊ฐ€์ ธ์˜ค๊ธฐ[[load-food101-dataset]]

๐Ÿค— Datasets ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์—์„œ Food-101 ๋ฐ์ดํ„ฐ ์„ธํŠธ์˜ ๋” ์ž‘์€ ๋ถ€๋ถ„ ์ง‘ํ•ฉ์„ ๊ฐ€์ ธ์˜ค๋Š” ๊ฒƒ์œผ๋กœ ์‹œ์ž‘ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ ‡๊ฒŒ ํ•˜๋ฉด ์ „์ฒด ๋ฐ์ดํ„ฐ ์„ธํŠธ์— ๋Œ€ํ•œ ํ›ˆ๋ จ์— ๋งŽ์€ ์‹œ๊ฐ„์„ ํ• ์• ํ•˜๊ธฐ ์ „์— ์‹คํ—˜์„ ํ†ตํ•ด ๋ชจ๋“  ๊ฒƒ์ด ์ œ๋Œ€๋กœ ์ž‘๋™ํ•˜๋Š”์ง€ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

>>> from datasets import load_dataset

>>> food = load_dataset("food101", split="train[:5000]")

๋ฐ์ดํ„ฐ ์„ธํŠธ์˜ train์„ [~datasets.Dataset.train_test_split] ๋ฉ”์†Œ๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ›ˆ๋ จ ๋ฐ ํ…Œ์ŠคํŠธ ์„ธํŠธ๋กœ ๋ถ„ํ• ํ•˜์„ธ์š”:

>>> food = food.train_test_split(test_size=0.2)

๊ทธ๋ฆฌ๊ณ  ์˜ˆ์‹œ๋ฅผ ์‚ดํŽด๋ณด์„ธ์š”:

>>> food["train"][0]
{'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=512x512 at 0x7F52AFC8AC50>,
 'label': 79}

๋ฐ์ดํ„ฐ ์„ธํŠธ์˜ ๊ฐ ์˜ˆ์ œ์—๋Š” ๋‘ ๊ฐœ์˜ ํ•„๋“œ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค:

  • image: a PIL image of the food item
  • label: the label class of the food item
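
As a quick sanity check (not part of the original steps), the label feature is a ClassLabel that still carries all 101 Food-101 class names, even in this smaller subset:

>>> food["train"].features["label"].num_classes
101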

๋ชจ๋ธ์ด ๋ ˆ์ด๋ธ” ID์—์„œ ๋ ˆ์ด๋ธ” ์ด๋ฆ„์„ ์‰ฝ๊ฒŒ ๊ฐ€์ ธ์˜ฌ ์ˆ˜ ์žˆ๋„๋ก ๋ ˆ์ด๋ธ” ์ด๋ฆ„์„ ์ •์ˆ˜๋กœ ๋งคํ•‘ํ•˜๊ณ , ์ •์ˆ˜๋ฅผ ๋ ˆ์ด๋ธ” ์ด๋ฆ„์œผ๋กœ ๋งคํ•‘ํ•˜๋Š” ์‚ฌ์ „์„ ๋งŒ๋“œ์„ธ์š”:

>>> labels = food["train"].features["label"].names
>>> label2id, id2label = dict(), dict()
>>> for i, label in enumerate(labels):
...     label2id[label] = str(i)
...     id2label[str(i)] = label

์ด์ œ ๋ ˆ์ด๋ธ” ID๋ฅผ ๋ ˆ์ด๋ธ” ์ด๋ฆ„์œผ๋กœ ๋ณ€ํ™˜ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

>>> id2label[str(79)]
'prime_rib'
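
The reverse lookup works the same way; note that, as written above, the ids are stored as strings:

>>> label2id["prime_rib"]
'79'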

์ „์ฒ˜๋ฆฌ[[preprocess]]

๋‹ค์Œ ๋‹จ๊ณ„๋Š” ์ด๋ฏธ์ง€๋ฅผ ํ…์„œ๋กœ ์ฒ˜๋ฆฌํ•˜๊ธฐ ์œ„ํ•ด ViT ์ด๋ฏธ์ง€ ํ”„๋กœ์„ธ์„œ๋ฅผ ๊ฐ€์ ธ์˜ค๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค:

>>> from transformers import AutoImageProcessor

>>> checkpoint = "google/vit-base-patch16-224-in21k"
>>> image_processor = AutoImageProcessor.from_pretrained(checkpoint)
์ด๋ฏธ์ง€์— ๋ช‡ ๊ฐ€์ง€ ์ด๋ฏธ์ง€ ๋ณ€ํ™˜์„ ์ ์šฉํ•˜์—ฌ ๊ณผ์ ํ•ฉ์— ๋Œ€ํ•ด ๋ชจ๋ธ์„ ๋” ๊ฒฌ๊ณ ํ•˜๊ฒŒ ๋งŒ๋“ญ๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ Torchvision์˜ [`transforms`](https://pytorch.org/vision/stable/transforms.html) ๋ชจ๋“ˆ์„ ์‚ฌ์šฉํ•˜์ง€๋งŒ, ์›ํ•˜๋Š” ์ด๋ฏธ์ง€ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค.

์ด๋ฏธ์ง€์˜ ์ž„์˜ ๋ถ€๋ถ„์„ ํฌ๋กญํ•˜๊ณ  ํฌ๊ธฐ๋ฅผ ์กฐ์ •ํ•œ ๋‹ค์Œ, ์ด๋ฏธ์ง€ ํ‰๊ท ๊ณผ ํ‘œ์ค€ ํŽธ์ฐจ๋กœ ์ •๊ทœํ™”ํ•˜์„ธ์š”:

>>> from torchvision.transforms import RandomResizedCrop, Compose, Normalize, ToTensor

>>> normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
>>> size = (
...     image_processor.size["shortest_edge"]
...     if "shortest_edge" in image_processor.size
...     else (image_processor.size["height"], image_processor.size["width"])
... )
>>> _transforms = Compose([RandomResizedCrop(size), ToTensor(), normalize])

๊ทธ๋Ÿฐ ๋‹ค์Œ ์ „์ฒ˜๋ฆฌ ํ•จ์ˆ˜๋ฅผ ๋งŒ๋“ค์–ด ๋ณ€ํ™˜์„ ์ ์šฉํ•˜๊ณ  ์ด๋ฏธ์ง€์˜ pixel_values(๋ชจ๋ธ์— ๋Œ€ํ•œ ์ž…๋ ฅ)๋ฅผ ๋ฐ˜ํ™˜ํ•˜์„ธ์š”:

>>> def transforms(examples):
...     examples["pixel_values"] = [_transforms(img.convert("RGB")) for img in examples["image"]]
...     del examples["image"]
...     return examples

์ „์ฒด ๋ฐ์ดํ„ฐ ์„ธํŠธ์— ์ „์ฒ˜๋ฆฌ ๊ธฐ๋Šฅ์„ ์ ์šฉํ•˜๋ ค๋ฉด ๐Ÿค— Datasets [~datasets.Dataset.with_transform]์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ๋ฐ์ดํ„ฐ ์„ธํŠธ์˜ ์š”์†Œ๋ฅผ ๊ฐ€์ ธ์˜ฌ ๋•Œ ๋ณ€ํ™˜์ด ์ฆ‰์‹œ ์ ์šฉ๋ฉ๋‹ˆ๋‹ค:

>>> food = food.with_transform(transforms)
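
As a quick check (assuming the checkpoint's default 224x224 processor size), indexing an example now yields a `pixel_values` tensor instead of the raw image:

>>> food["train"][0]["pixel_values"].shape
torch.Size([3, 224, 224])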

์ด์ œ [DefaultDataCollator]๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์˜ˆ์ œ ๋ฐฐ์น˜๋ฅผ ๋งŒ๋“ญ๋‹ˆ๋‹ค. ๐Ÿค— Transformers์˜ ๋‹ค๋ฅธ ๋ฐ์ดํ„ฐ ์ฝœ๋ ˆ์ดํ„ฐ์™€ ๋‹ฌ๋ฆฌ, DefaultDataCollator๋Š” ํŒจ๋”ฉ๊ณผ ๊ฐ™์€ ์ถ”๊ฐ€์ ์ธ ์ „์ฒ˜๋ฆฌ๋ฅผ ์ ์šฉํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

>>> from transformers import DefaultDataCollator

>>> data_collator = DefaultDataCollator()
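
To see what the collator produces, you can batch a couple of transformed examples by hand (a hypothetical check; during training the [Trainer] does this for you). The collator stacks the tensors and renames label to labels:

>>> batch = data_collator([food["train"][i] for i in range(2)])
>>> batch["pixel_values"].shape, batch["labels"].shape
(torch.Size([2, 3, 224, 224]), torch.Size([2]))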

๊ณผ์ ํ•ฉ์„ ๋ฐฉ์ง€ํ•˜๊ณ  ๋ชจ๋ธ์„ ๋ณด๋‹ค ๊ฒฌ๊ณ ํ•˜๊ฒŒ ๋งŒ๋“ค๊ธฐ ์œ„ํ•ด ๋ฐ์ดํ„ฐ ์„ธํŠธ์˜ ํ›ˆ๋ จ ๋ถ€๋ถ„์— ๋ฐ์ดํ„ฐ ์ฆ๊ฐ•์„ ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ Keras ์ „์ฒ˜๋ฆฌ ๋ ˆ์ด์–ด๋กœ ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ๋ณ€ํ™˜(๋ฐ์ดํ„ฐ ์ฆ๊ฐ• ํฌํ•จ)๊ณผ ๊ฒ€์ฆ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ๋ณ€ํ™˜(์ค‘์•™ ํฌ๋กœํ•‘, ํฌ๊ธฐ ์กฐ์ •, ์ •๊ทœํ™”๋งŒ)์„ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค. tf.image ๋˜๋Š” ๋‹ค๋ฅธ ์›ํ•˜๋Š” ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

>>> from tensorflow import keras
>>> from tensorflow.keras import layers

>>> size = (image_processor.size["height"], image_processor.size["width"])

>>> train_data_augmentation = keras.Sequential(
...     [
...         layers.RandomCrop(size[0], size[1]),
...         layers.Rescaling(scale=1.0 / 127.5, offset=-1),
...         layers.RandomFlip("horizontal"),
...         layers.RandomRotation(factor=0.02),
...         layers.RandomZoom(height_factor=0.2, width_factor=0.2),
...     ],
...     name="train_data_augmentation",
... )

>>> val_data_augmentation = keras.Sequential(
...     [
...         layers.CenterCrop(size[0], size[1]),
...         layers.Rescaling(scale=1.0 / 127.5, offset=-1),
...     ],
...     name="val_data_augmentation",
... )

๋‹ค์Œ์œผ๋กœ ํ•œ ๋ฒˆ์— ํ•˜๋‚˜์˜ ์ด๋ฏธ์ง€๊ฐ€ ์•„๋‹ˆ๋ผ ์ด๋ฏธ์ง€ ๋ฐฐ์น˜์— ์ ์ ˆํ•œ ๋ณ€ํ™˜์„ ์ ์šฉํ•˜๋Š” ํ•จ์ˆ˜๋ฅผ ๋งŒ๋“ญ๋‹ˆ๋‹ค.

>>> import numpy as np
>>> import tensorflow as tf
>>> from PIL import Image


>>> def convert_to_tf_tensor(image: Image):
...     np_image = np.array(image)
...     tf_image = tf.convert_to_tensor(np_image)
...     # `expand_dims()` is used to add a batch dimension since
...     # the TF augmentation layers operate on batched inputs.
...     return tf.expand_dims(tf_image, 0)
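
For instance, on a hypothetical 512x512 RGB image, the helper returns a batched tensor in channels-last format:

>>> dummy = Image.new("RGB", (512, 512))
>>> convert_to_tf_tensor(dummy).shape
TensorShape([1, 512, 512, 3])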


>>> def preprocess_train(example_batch):
...     """Apply train_transforms across a batch."""
...     images = [
...         train_data_augmentation(convert_to_tf_tensor(image.convert("RGB"))) for image in example_batch["image"]
...     ]
...     example_batch["pixel_values"] = [tf.transpose(tf.squeeze(image)) for image in images]
...     return example_batch


>>> def preprocess_val(example_batch):
...     """Apply val_transforms across a batch."""
...     images = [
...         val_data_augmentation(convert_to_tf_tensor(image.convert("RGB"))) for image in example_batch["image"]
...     ]
...     example_batch["pixel_values"] = [tf.transpose(tf.squeeze(image)) for image in images]
...     return example_batch

๐Ÿค— Datasets [~datasets.Dataset.set_transform]๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ฆ‰์‹œ ๋ณ€ํ™˜์„ ์ ์šฉํ•˜์„ธ์š”:

food["train"].set_transform(preprocess_train)
food["test"].set_transform(preprocess_val)

์ตœ์ข… ์ „์ฒ˜๋ฆฌ ๋‹จ๊ณ„๋กœ DefaultDataCollator๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์˜ˆ์ œ ๋ฐฐ์น˜๋ฅผ ๋งŒ๋“ญ๋‹ˆ๋‹ค. ๐Ÿค— Transformers์˜ ๋‹ค๋ฅธ ๋ฐ์ดํ„ฐ ์ฝœ๋ ˆ์ดํ„ฐ์™€ ๋‹ฌ๋ฆฌ DefaultDataCollator๋Š” ํŒจ๋”ฉ๊ณผ ๊ฐ™์€ ์ถ”๊ฐ€ ์ „์ฒ˜๋ฆฌ๋ฅผ ์ ์šฉํ•˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค.

>>> from transformers import DefaultDataCollator

>>> data_collator = DefaultDataCollator(return_tensors="tf")

ํ‰๊ฐ€[[evaluate]]

ํ›ˆ๋ จ ์ค‘์— ํ‰๊ฐ€ ์ง€ํ‘œ๋ฅผ ํฌํ•จํ•˜๋ฉด ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ํ‰๊ฐ€ํ•˜๋Š” ๋ฐ ๋„์›€์ด ๋˜๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋งŽ์Šต๋‹ˆ๋‹ค. ๐Ÿค— Evaluate ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋กœ ํ‰๊ฐ€ ๋ฐฉ๋ฒ•์„ ๋น ๋ฅด๊ฒŒ ๊ฐ€์ ธ์˜ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ์ž‘์—…์—์„œ๋Š” accuracy ํ‰๊ฐ€ ์ง€ํ‘œ๋ฅผ ๊ฐ€์ ธ์˜ต๋‹ˆ๋‹ค. (๐Ÿค— Evaluate ๋น ๋ฅธ ๋‘˜๋Ÿฌ๋ณด๊ธฐ๋ฅผ ์ฐธ์กฐํ•˜์—ฌ ํ‰๊ฐ€ ์ง€ํ‘œ๋ฅผ ๊ฐ€์ ธ์˜ค๊ณ  ๊ณ„์‚ฐํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์ž์„ธํžˆ ์•Œ์•„๋ณด์„ธ์š”):

>>> import evaluate

>>> accuracy = evaluate.load("accuracy")

๊ทธ๋Ÿฐ ๋‹ค์Œ ์˜ˆ์ธก๊ณผ ๋ ˆ์ด๋ธ”์„ [~evaluate.EvaluationModule.compute]์— ์ „๋‹ฌํ•˜์—ฌ ์ •ํ™•๋„๋ฅผ ๊ณ„์‚ฐํ•˜๋Š” ํ•จ์ˆ˜๋ฅผ ๋งŒ๋“ญ๋‹ˆ๋‹ค:

>>> import numpy as np


>>> def compute_metrics(eval_pred):
...     predictions, labels = eval_pred
...     predictions = np.argmax(predictions, axis=1)
...     return accuracy.compute(predictions=predictions, references=labels)
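
You can sanity-check the function with hypothetical logits; both predictions below are correct, so the accuracy is 1.0:

>>> dummy_logits = np.array([[0.1, 0.9], [0.8, 0.2]])
>>> compute_metrics((dummy_logits, np.array([1, 0])))
{'accuracy': 1.0}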

์ด์ œ compute_metrics ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•  ์ค€๋น„๊ฐ€ ๋˜์—ˆ์œผ๋ฉฐ, ํ›ˆ๋ จ์„ ์„ค์ •ํ•˜๋ฉด ์ด ํ•จ์ˆ˜๋กœ ๋˜๋Œ์•„์˜ฌ ๊ฒƒ์ž…๋‹ˆ๋‹ค.

ํ›ˆ๋ จ[[train]]

[Trainer]๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋ธ์„ ๋ฏธ์„ธ ์กฐ์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์— ์ต์ˆ™ํ•˜์ง€ ์•Š์€ ๊ฒฝ์šฐ, ์—ฌ๊ธฐ์—์„œ ๊ธฐ๋ณธ ํŠœํ† ๋ฆฌ์–ผ์„ ํ™•์ธํ•˜์„ธ์š”!

์ด์ œ ๋ชจ๋ธ์„ ํ›ˆ๋ จ์‹œํ‚ฌ ์ค€๋น„๊ฐ€ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค! [AutoModelForImageClassification]๋กœ ViT๋ฅผ ๊ฐ€์ ธ์˜ต๋‹ˆ๋‹ค. ์˜ˆ์ƒ๋˜๋Š” ๋ ˆ์ด๋ธ” ์ˆ˜, ๋ ˆ์ด๋ธ” ๋งคํ•‘ ๋ฐ ๋ ˆ์ด๋ธ” ์ˆ˜๋ฅผ ์ง€์ •ํ•˜์„ธ์š”:

>>> from transformers import AutoModelForImageClassification, TrainingArguments, Trainer

>>> model = AutoModelForImageClassification.from_pretrained(
...     checkpoint,
...     num_labels=len(labels),
...     id2label=id2label,
...     label2id=label2id,
... )

์ด์ œ ์„ธ ๋‹จ๊ณ„๋งŒ ๊ฑฐ์น˜๋ฉด ๋์ž…๋‹ˆ๋‹ค:

  1. Define your training hyperparameters in [TrainingArguments]. It is important that you don't remove unused columns, because that would drop the image column, and without the image column you can't create pixel_values. Set remove_unused_columns=False to prevent this behavior! The only other required parameter is output_dir, which specifies where to save your model. Setting push_to_hub=True pushes the model to the Hub (you need to be signed in to Hugging Face to upload it). At the end of each epoch, the [Trainer] evaluates the accuracy and saves the training checkpoint.
  2. Pass the training arguments to [Trainer] along with the model, dataset, tokenizer, data collator, and the compute_metrics function.
  3. Call [~Trainer.train] to fine-tune your model.

>>> training_args = TrainingArguments(
...     output_dir="my_awesome_food_model",
...     remove_unused_columns=False,
...     evaluation_strategy="epoch",
...     save_strategy="epoch",
...     learning_rate=5e-5,
...     per_device_train_batch_size=16,
...     gradient_accumulation_steps=4,
...     per_device_eval_batch_size=16,
...     num_train_epochs=3,
...     warmup_ratio=0.1,
...     logging_steps=10,
...     load_best_model_at_end=True,
...     metric_for_best_model="accuracy",
...     push_to_hub=True,
... )

>>> trainer = Trainer(
...     model=model,
...     args=training_args,
...     data_collator=data_collator,
...     train_dataset=food["train"],
...     eval_dataset=food["test"],
...     tokenizer=image_processor,
...     compute_metrics=compute_metrics,
... )

>>> trainer.train()

ํ›ˆ๋ จ์ด ์™„๋ฃŒ๋˜๋ฉด, ๋ชจ๋“  ์‚ฌ๋žŒ์ด ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก [~transformers.Trainer.push_to_hub] ๋ฉ”์†Œ๋“œ๋กœ ๋ชจ๋ธ์„ ํ—ˆ๋ธŒ์— ๊ณต์œ ํ•˜์„ธ์š”:

>>> trainer.push_to_hub()

Keras๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋ธ์„ ๋ฏธ์„ธ ์กฐ์ •ํ•˜๋Š” ๋ฐฉ๋ฒ•์— ์ต์ˆ™ํ•˜์ง€ ์•Š์€ ๊ฒฝ์šฐ, ๋จผ์ € ๊ธฐ๋ณธ ํŠœํ† ๋ฆฌ์–ผ์„ ํ™•์ธํ•˜์„ธ์š”!

TensorFlow์—์„œ ๋ชจ๋ธ์„ ๋ฏธ์„ธ ์กฐ์ •ํ•˜๋ ค๋ฉด ๋‹ค์Œ ๋‹จ๊ณ„๋ฅผ ๋”ฐ๋ฅด์„ธ์š”:

  1. ํ›ˆ๋ จ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ •์˜ํ•˜๊ณ  ์˜ตํ‹ฐ๋งˆ์ด์ €์™€ ํ•™์Šต๋ฅ  ์Šค์ผ€์ฅด์„ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.
  2. ์‚ฌ์ „ ํ›ˆ๋ จ๋œ ๋ชจ๋ธ์„ ์ธ์Šคํ„ด์Šคํ™”ํ•ฉ๋‹ˆ๋‹ค.
  3. ๐Ÿค— Dataset์„ tf.data.Dataset์œผ๋กœ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค.
  4. ๋ชจ๋ธ์„ ์ปดํŒŒ์ผํ•ฉ๋‹ˆ๋‹ค.
  5. ์ฝœ๋ฐฑ์„ ์ถ”๊ฐ€ํ•˜๊ณ  ํ›ˆ๋ จ์„ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•ด fit() ๋ฉ”์†Œ๋“œ๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
  6. ์ปค๋ฎค๋‹ˆํ‹ฐ์™€ ๊ณต์œ ํ•˜๊ธฐ ์œ„ํ•ด ๋ชจ๋ธ์„ ๐Ÿค— Hub์— ์—…๋กœ๋“œํ•ฉ๋‹ˆ๋‹ค.

ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ, ์˜ตํ‹ฐ๋งˆ์ด์ € ๋ฐ ํ•™์Šต๋ฅ  ์Šค์ผ€์ฅด์„ ์ •์˜ํ•˜๋Š” ๊ฒƒ์œผ๋กœ ์‹œ์ž‘ํ•ฉ๋‹ˆ๋‹ค:

>>> from transformers import create_optimizer

>>> batch_size = 16
>>> num_epochs = 5
>>> num_train_steps = len(food["train"]) * num_epochs
>>> learning_rate = 3e-5
>>> weight_decay_rate = 0.01

>>> optimizer, lr_schedule = create_optimizer(
...     init_lr=learning_rate,
...     num_train_steps=num_train_steps,
...     weight_decay_rate=weight_decay_rate,
...     num_warmup_steps=0,
... )

๊ทธ๋Ÿฐ ๋‹ค์Œ ๋ ˆ์ด๋ธ” ๋งคํ•‘๊ณผ ํ•จ๊ป˜ [TFAuto ModelForImageClassification]์œผ๋กœ ViT๋ฅผ ๊ฐ€์ ธ์˜ต๋‹ˆ๋‹ค:

>>> from transformers import TFAutoModelForImageClassification

>>> model = TFAutoModelForImageClassification.from_pretrained(
...     checkpoint,
...     id2label=id2label,
...     label2id=label2id,
... )

๋ฐ์ดํ„ฐ ์„ธํŠธ๋ฅผ [~datasets.Dataset.to_tf_dataset]์™€ data_collator๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ tf.data.Dataset ํ˜•์‹์œผ๋กœ ๋ณ€ํ™˜ํ•˜์„ธ์š”:

>>> # converting our train dataset to tf.data.Dataset
>>> tf_train_dataset = food["train"].to_tf_dataset(
...     columns="pixel_values", label_cols="label", shuffle=True, batch_size=batch_size, collate_fn=data_collator
... )

>>> # converting our test dataset to tf.data.Dataset
>>> tf_eval_dataset = food["test"].to_tf_dataset(
...     columns="pixel_values", label_cols="label", shuffle=True, batch_size=batch_size, collate_fn=data_collator
... )

compile()๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ›ˆ๋ จ ๋ชจ๋ธ์„ ๊ตฌ์„ฑํ•˜์„ธ์š”:

>>> from tensorflow.keras.losses import SparseCategoricalCrossentropy

>>> loss = SparseCategoricalCrossentropy(from_logits=True)
>>> model.compile(optimizer=optimizer, loss=loss)

์˜ˆ์ธก์—์„œ ์ •ํ™•๋„๋ฅผ ๊ณ„์‚ฐํ•˜๊ณ  ๋ชจ๋ธ์„ ๐Ÿค— Hub๋กœ ํ‘ธ์‹œํ•˜๋ ค๋ฉด Keras callbacks๋ฅผ ์‚ฌ์šฉํ•˜์„ธ์š”. compute_metrics ํ•จ์ˆ˜๋ฅผ KerasMetricCallback์— ์ „๋‹ฌํ•˜๊ณ , PushToHubCallback์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋ธ์„ ์—…๋กœ๋“œํ•ฉ๋‹ˆ๋‹ค:

>>> from transformers.keras_callbacks import KerasMetricCallback, PushToHubCallback

>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_eval_dataset)
>>> push_to_hub_callback = PushToHubCallback(
...     output_dir="food_classifier",
...     tokenizer=image_processor,
...     save_strategy="no",
... )
>>> callbacks = [metric_callback, push_to_hub_callback]

์ด์ œ ๋ชจ๋ธ์„ ํ›ˆ๋ จํ•  ์ค€๋น„๊ฐ€ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค! ํ›ˆ๋ จ ๋ฐ ๊ฒ€์ฆ ๋ฐ์ดํ„ฐ ์„ธํŠธ, ์—ํญ ์ˆ˜์™€ ํ•จ๊ป˜ fit()์„ ํ˜ธ์ถœํ•˜๊ณ , ์ฝœ๋ฐฑ์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋ธ์„ ๋ฏธ์„ธ ์กฐ์ •ํ•ฉ๋‹ˆ๋‹ค:

>>> model.fit(tf_train_dataset, validation_data=tf_eval_dataset, epochs=num_epochs, callbacks=callbacks)
Epoch 1/5
250/250 [==============================] - 313s 1s/step - loss: 2.5623 - val_loss: 1.4161 - accuracy: 0.9290
Epoch 2/5
250/250 [==============================] - 265s 1s/step - loss: 0.9181 - val_loss: 0.6808 - accuracy: 0.9690
Epoch 3/5
250/250 [==============================] - 252s 1s/step - loss: 0.3910 - val_loss: 0.4303 - accuracy: 0.9820
Epoch 4/5
250/250 [==============================] - 251s 1s/step - loss: 0.2028 - val_loss: 0.3191 - accuracy: 0.9900
Epoch 5/5
250/250 [==============================] - 238s 949ms/step - loss: 0.1232 - val_loss: 0.3259 - accuracy: 0.9890

์ถ•ํ•˜ํ•ฉ๋‹ˆ๋‹ค! ๋ชจ๋ธ์„ ๋ฏธ์„ธ ์กฐ์ •ํ•˜๊ณ  ๐Ÿค— Hub์— ๊ณต์œ ํ–ˆ์Šต๋‹ˆ๋‹ค. ์ด์ œ ์ถ”๋ก ์— ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค!

์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜๋ฅผ ์œ„ํ•œ ๋ชจ๋ธ์„ ๋ฏธ์„ธ ์กฐ์ •ํ•˜๋Š” ์ž์„ธํ•œ ์˜ˆ์ œ๋Š” ๋‹ค์Œ PyTorch notebook์„ ์ฐธ์กฐํ•˜์„ธ์š”.

์ถ”๋ก [[inference]]

์ข‹์•„์š”, ์ด์ œ ๋ชจ๋ธ์„ ๋ฏธ์„ธ ์กฐ์ •ํ–ˆ์œผ๋‹ˆ ์ถ”๋ก ์— ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค!

์ถ”๋ก ์„ ์ˆ˜ํ–‰ํ•˜๊ณ ์ž ํ•˜๋Š” ์ด๋ฏธ์ง€๋ฅผ ๊ฐ€์ ธ์™€๋ด…์‹œ๋‹ค:

>>> ds = load_dataset("food101", split="validation[:10]")
>>> image = ds["image"][0]

*Image of beignets*

๋ฏธ์„ธ ์กฐ์ • ๋ชจ๋ธ๋กœ ์ถ”๋ก ์„ ์‹œ๋„ํ•˜๋Š” ๊ฐ€์žฅ ๊ฐ„๋‹จํ•œ ๋ฐฉ๋ฒ•์€ [pipeline]์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๋ชจ๋ธ๋กœ ์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜๋ฅผ ์œ„ํ•œ pipeline์„ ์ธ์Šคํ„ด์Šคํ™”ํ•˜๊ณ  ์ด๋ฏธ์ง€๋ฅผ ์ „๋‹ฌํ•ฉ๋‹ˆ๋‹ค:

>>> from transformers import pipeline

>>> classifier = pipeline("image-classification", model="my_awesome_food_model")
>>> classifier(image)
[{'score': 0.31856709718704224, 'label': 'beignets'},
 {'score': 0.015232225880026817, 'label': 'bruschetta'},
 {'score': 0.01519392803311348, 'label': 'chicken_wings'},
 {'score': 0.013022331520915031, 'label': 'pork_chop'},
 {'score': 0.012728818692266941, 'label': 'prime_rib'}]

์›ํ•œ๋‹ค๋ฉด, pipeline์˜ ๊ฒฐ๊ณผ๋ฅผ ์ˆ˜๋™์œผ๋กœ ๋ณต์ œํ•  ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค:

์ด๋ฏธ์ง€๋ฅผ ์ „์ฒ˜๋ฆฌํ•˜๊ธฐ ์œ„ํ•ด ์ด๋ฏธ์ง€ ํ”„๋กœ์„ธ์„œ๋ฅผ ๊ฐ€์ ธ์˜ค๊ณ  `input`์„ PyTorch ํ…์„œ๋กœ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค:
>>> from transformers import AutoImageProcessor
>>> import torch

>>> image_processor = AutoImageProcessor.from_pretrained("my_awesome_food_model")
>>> inputs = image_processor(image, return_tensors="pt")

์ž…๋ ฅ์„ ๋ชจ๋ธ์— ์ „๋‹ฌํ•˜๊ณ  logits์„ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค:

>>> from transformers import AutoModelForImageClassification

>>> model = AutoModelForImageClassification.from_pretrained("my_awesome_food_model")
>>> with torch.no_grad():
...     logits = model(**inputs).logits

ํ™•๋ฅ ์ด ๊ฐ€์žฅ ๋†’์€ ์˜ˆ์ธก ๋ ˆ์ด๋ธ”์„ ๊ฐ€์ ธ์˜ค๊ณ , ๋ชจ๋ธ์˜ id2label ๋งคํ•‘์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ ˆ์ด๋ธ”๋กœ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค:

>>> predicted_label = logits.argmax(-1).item()
>>> model.config.id2label[predicted_label]
'beignets'
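
If you also want the model's confidence in this prediction (an optional extra, not part of the original steps), apply a softmax over the logits:

>>> probabilities = torch.softmax(logits, dim=-1)
>>> probabilities[0, predicted_label].item()  # confidence score for the predicted label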
์ด๋ฏธ์ง€๋ฅผ ์ „์ฒ˜๋ฆฌํ•˜๊ธฐ ์œ„ํ•ด ์ด๋ฏธ์ง€ ํ”„๋กœ์„ธ์„œ๋ฅผ ๊ฐ€์ ธ์˜ค๊ณ  `input`์„ TensorFlow ํ…์„œ๋กœ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค:
>>> from transformers import AutoImageProcessor

>>> image_processor = AutoImageProcessor.from_pretrained("MariaK/food_classifier")
>>> inputs = image_processor(image, return_tensors="tf")

์ž…๋ ฅ์„ ๋ชจ๋ธ์— ์ „๋‹ฌํ•˜๊ณ  logits์„ ๋ฐ˜ํ™˜ํ•ฉ๋‹ˆ๋‹ค:

>>> from transformers import TFAutoModelForImageClassification

>>> model = TFAutoModelForImageClassification.from_pretrained("MariaK/food_classifier")
>>> logits = model(**inputs).logits

ํ™•๋ฅ ์ด ๊ฐ€์žฅ ๋†’์€ ์˜ˆ์ธก ๋ ˆ์ด๋ธ”์„ ๊ฐ€์ ธ์˜ค๊ณ , ๋ชจ๋ธ์˜ id2label ๋งคํ•‘์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ ˆ์ด๋ธ”๋กœ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค:

>>> predicted_class_id = int(tf.math.argmax(logits, axis=-1)[0])
>>> model.config.id2label[predicted_class_id]
'beignets'