|
<!--Copyright 2022 The HuggingFace Team. All rights reserved. |
|
|
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with |
|
the License. You may obtain a copy of the License at |
|
|
|
http://www.apache.org/licenses/LICENSE-2.0 |
|
|
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on |
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the |
|
specific language governing permissions and limitations under the License. |
|
|
|
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
|
|
|
--> |
|
|
|
# Image classification[[image-classification]]
|
|
|
[[open-in-colab]] |
|
|
|
<Youtube id="tjAIM7BOYhw"/> |
|
|
|
Image classification assigns a label or class to an image. Unlike text or audio classification, the inputs are the
pixel values that make up an image. There are many uses for image classification, such as detecting damage after a
natural disaster, monitoring crop health, or helping screen medical images for signs of disease.
|
|
|
This guide will show you how to:
|
|
|
1. Fine-tune [ViT](model_doc/vit) on the [Food-101](https://huggingface.co/datasets/food101) dataset to classify a food item in an image.
|
2. Use your fine-tuned model for inference.
|
|
|
<Tip> |
|
The task illustrated in this tutorial is supported by the following model architectures:
|
|
|
<!--This tip is automatically generated by `make fix-copies`, do not fill manually!--> |
|
|
|
[BEiT](../model_doc/beit), [BiT](../model_doc/bit), [ConvNeXT](../model_doc/convnext), [ConvNeXTV2](../model_doc/convnextv2), [CvT](../model_doc/cvt), [Data2VecVision](../model_doc/data2vec-vision), [DeiT](../model_doc/deit), [DiNAT](../model_doc/dinat), [EfficientFormer](../model_doc/efficientformer), [EfficientNet](../model_doc/efficientnet), [FocalNet](../model_doc/focalnet), [ImageGPT](../model_doc/imagegpt), [LeViT](../model_doc/levit), [MobileNetV1](../model_doc/mobilenet_v1), [MobileNetV2](../model_doc/mobilenet_v2), [MobileViT](../model_doc/mobilevit), [NAT](../model_doc/nat), [Perceiver](../model_doc/perceiver), [PoolFormer](../model_doc/poolformer), [RegNet](../model_doc/regnet), [ResNet](../model_doc/resnet), [SegFormer](../model_doc/segformer), [Swin Transformer](../model_doc/swin), [Swin Transformer V2](../model_doc/swinv2), [VAN](../model_doc/van), [ViT](../model_doc/vit), [ViT Hybrid](../model_doc/vit_hybrid), [ViTMSN](../model_doc/vit_msn) |
|
<!--End of the generated tip--> |
|
|
|
</Tip> |
|
|
|
Before you begin, make sure you have all the necessary libraries installed:
|
|
|
```bash |
|
pip install transformers datasets evaluate |
|
``` |
|
|
|
We encourage you to log in to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to log in:
|
|
|
```py |
|
>>> from huggingface_hub import notebook_login |
|
|
|
>>> notebook_login() |
|
``` |
|
|
|
## Load Food-101 dataset[[load-food101-dataset]]
|
|
|
Start by loading a smaller subset of the Food-101 dataset from the 🤗 Datasets library. This gives you a chance to
experiment and make sure everything works before spending more time training on the full dataset.
|
|
|
```py |
|
>>> from datasets import load_dataset |
|
|
|
>>> food = load_dataset("food101", split="train[:5000]") |
|
``` |
|
|
|
Split the dataset's `train` split into a train and test set with the [`~datasets.Dataset.train_test_split`] method:
|
|
|
```py |
|
>>> food = food.train_test_split(test_size=0.2) |
|
``` |
|
|
|
Then take a look at an example:
|
|
|
```py |
|
>>> food["train"][0] |
|
{'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=512x512 at 0x7F52AFC8AC50>, |
|
'label': 79} |
|
``` |
|
|
|
Each example in the dataset has two fields:
|
|
|
- `image`: a PIL image of the food item

- `label`: the label class of the food item
|
|
|
To make it easier for the model to get the label name from the label id, create a dictionary that maps the label name
to an integer and vice versa:
|
|
|
```py |
|
>>> labels = food["train"].features["label"].names |
|
>>> label2id, id2label = dict(), dict() |
|
>>> for i, label in enumerate(labels): |
|
... label2id[label] = str(i) |
|
... id2label[str(i)] = label |
|
``` |
|
|
|
Now you can convert the label id to a label name:
|
|
|
```py |
|
>>> id2label[str(79)] |
|
'prime_rib' |
|
``` |
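
The reverse lookup works the same way. Note that this guide stores the ids as strings, so the keys of `label2id` are label names and the values are string ids:

```py
>>> label2id["prime_rib"]
'79'
```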
|
|
|
## Preprocess[[preprocess]]
|
|
|
The next step is to load a ViT image processor to process the image into a tensor:
|
|
|
```py |
|
>>> from transformers import AutoImageProcessor |
|
|
|
>>> checkpoint = "google/vit-base-patch16-224-in21k" |
|
>>> image_processor = AutoImageProcessor.from_pretrained(checkpoint) |
|
``` |
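
Before writing any transforms, it can help to inspect the processor's configuration, since the crop size and normalization statistics drive the next steps. The exact values depend on the checkpoint; for `google/vit-base-patch16-224-in21k` they should look like this:

```py
>>> image_processor.size
{'height': 224, 'width': 224}
>>> image_processor.image_mean, image_processor.image_std
([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
```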
|
|
|
<frameworkcontent> |
|
<pt> |
|
Apply some image transformations to the images to make the model more robust against overfitting. Here you'll use torchvision's [`transforms`](https://pytorch.org/vision/stable/transforms.html) module, but you can also use any image library you like.
|
|
|
Crop a random part of the image, resize it, and normalize it with the image mean and standard deviation:
|
|
|
```py |
|
>>> from torchvision.transforms import RandomResizedCrop, Compose, Normalize, ToTensor |
|
|
|
>>> normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std) |
|
>>> size = ( |
|
... image_processor.size["shortest_edge"] |
|
... if "shortest_edge" in image_processor.size |
|
... else (image_processor.size["height"], image_processor.size["width"]) |
|
... ) |
|
>>> _transforms = Compose([RandomResizedCrop(size), ToTensor(), normalize]) |
|
``` |
|
|
|
Then create a preprocessing function to apply the transforms and return the `pixel_values` - the inputs to the model - of the image:
|
|
|
```py |
|
>>> def transforms(examples): |
|
... examples["pixel_values"] = [_transforms(img.convert("RGB")) for img in examples["image"]] |
|
... del examples["image"] |
|
... return examples |
|
``` |
|
|
|
To apply the preprocessing function over the entire dataset, use 🤗 Datasets [`~datasets.Dataset.with_transform`]. The transforms are applied on the fly when you load an element of the dataset:
|
|
|
```py |
|
>>> food = food.with_transform(transforms) |
|
``` |
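
As a quick sanity check, index a single example; with this checkpoint's 224x224 crop, the transform should yield a channels-first tensor:

```py
>>> food["train"][0]["pixel_values"].shape
torch.Size([3, 224, 224])
```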
|
|
|
Now create a batch of examples using [`DefaultDataCollator`]. Unlike other data collators in 🤗 Transformers, the `DefaultDataCollator` does not apply additional preprocessing such as padding.
|
|
|
```py |
|
>>> from transformers import DefaultDataCollator |
|
|
|
>>> data_collator = DefaultDataCollator() |
|
``` |
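
If you'd like to see what the collator produces, you can collate a couple of examples by hand; it stacks the `pixel_values` tensors and collects the labels under a `labels` key (a quick check, not needed for training):

```py
>>> batch = data_collator([food["train"][i] for i in range(2)])
>>> batch["pixel_values"].shape, batch["labels"].shape
(torch.Size([2, 3, 224, 224]), torch.Size([2]))
```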
|
</pt> |
|
</frameworkcontent> |
|
|
|
|
|
<frameworkcontent> |
|
<tf> |
|
|
|
To avoid overfitting and to make the model more robust, add some data augmentation to the training part of the dataset.
Here, Keras preprocessing layers define the transformations for the training data (which includes data augmentation)
and the transformations for the validation data (only center cropping, resizing, and normalizing). You can use
`tf.image` or any other library you prefer.
|
|
|
```py |
|
>>> from tensorflow import keras |
|
>>> from tensorflow.keras import layers |
|
|
|
>>> size = (image_processor.size["height"], image_processor.size["width"]) |
|
|
|
>>> train_data_augmentation = keras.Sequential( |
|
... [ |
|
... layers.RandomCrop(size[0], size[1]), |
|
... layers.Rescaling(scale=1.0 / 127.5, offset=-1), |
|
... layers.RandomFlip("horizontal"), |
|
... layers.RandomRotation(factor=0.02), |
|
... layers.RandomZoom(height_factor=0.2, width_factor=0.2), |
|
... ], |
|
... name="train_data_augmentation", |
|
... ) |
|
|
|
>>> val_data_augmentation = keras.Sequential( |
|
... [ |
|
... layers.CenterCrop(size[0], size[1]), |
|
... layers.Rescaling(scale=1.0 / 127.5, offset=-1), |
|
... ], |
|
... name="val_data_augmentation", |
|
... ) |
|
``` |
|
|
|
Next, create functions that apply the appropriate transformations to a batch of images, instead of one image at a time.
|
|
|
```py |
|
>>> import numpy as np |
|
>>> import tensorflow as tf |
|
>>> from PIL import Image |
|
|
|
|
|
>>> def convert_to_tf_tensor(image: Image.Image):
|
... np_image = np.array(image) |
|
... tf_image = tf.convert_to_tensor(np_image) |
|
... # `expand_dims()` is used to add a batch dimension since |
|
...     # the TF augmentation layers operate on batched inputs.
|
... return tf.expand_dims(tf_image, 0) |
|
|
|
|
|
>>> def preprocess_train(example_batch): |
|
... """Apply train_transforms across a batch.""" |
|
... images = [ |
|
... train_data_augmentation(convert_to_tf_tensor(image.convert("RGB"))) for image in example_batch["image"] |
|
... ] |
|
... example_batch["pixel_values"] = [tf.transpose(tf.squeeze(image)) for image in images] |
|
... return example_batch |
|
|
|
|
|
>>> def preprocess_val(example_batch):
|
... """Apply val_transforms across a batch.""" |
|
... images = [ |
|
... val_data_augmentation(convert_to_tf_tensor(image.convert("RGB"))) for image in example_batch["image"] |
|
... ] |
|
... example_batch["pixel_values"] = [tf.transpose(tf.squeeze(image)) for image in images] |
|
... return example_batch |
|
``` |
|
|
|
Use 🤗 Datasets [`~datasets.Dataset.set_transform`] to apply the transformations on the fly:
|
|
|
```py |
|
food["train"].set_transform(preprocess_train) |
|
food["test"].set_transform(preprocess_val) |
|
``` |
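
Indexing an example triggers the transform on the fly, so you can confirm the output is a channels-first `tf.Tensor` (the shape below assumes this checkpoint's 224x224 crop):

```py
>>> food["train"][0]["pixel_values"].shape
TensorShape([3, 224, 224])
```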
|
|
|
As a final preprocessing step, create a batch of examples using `DefaultDataCollator`. Unlike other data collators in 🤗 Transformers, the
`DefaultDataCollator` does not apply additional preprocessing, such as padding.
|
|
|
```py |
|
>>> from transformers import DefaultDataCollator |
|
|
|
>>> data_collator = DefaultDataCollator(return_tensors="tf") |
|
``` |
|
</tf> |
|
</frameworkcontent> |
|
|
|
## Evaluate[[evaluate]]
|
|
|
Including a metric during training is often helpful for evaluating your model's performance. You can quickly load an
evaluation method with the 🤗 [Evaluate](https://huggingface.co/docs/evaluate/index) library. For this task, load
the [accuracy](https://huggingface.co/spaces/evaluate-metric/accuracy) metric (see the 🤗 Evaluate [quick tour](https://huggingface.co/docs/evaluate/a_quick_tour) to learn more about how to load and compute a metric):
|
|
|
```py |
|
>>> import evaluate |
|
|
|
>>> accuracy = evaluate.load("accuracy") |
|
``` |
|
|
|
Then create a function that passes your predictions and labels to [`~evaluate.EvaluationModule.compute`] to calculate the accuracy:
|
|
|
```py |
|
>>> import numpy as np |
|
|
|
|
|
>>> def compute_metrics(eval_pred): |
|
... predictions, labels = eval_pred |
|
... predictions = np.argmax(predictions, axis=1) |
|
... return accuracy.compute(predictions=predictions, references=labels) |
|
``` |
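
You can sanity-check the function on a dummy batch of logits and labels before wiring it into training (a toy example, not part of the dataset):

```py
>>> compute_metrics((np.array([[0.1, 0.9], [0.8, 0.2]]), np.array([1, 0])))
{'accuracy': 1.0}
```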
|
|
|
Your `compute_metrics` function is ready to go now, and you'll return to it when you set up your training.
|
|
|
## Train[[train]]
|
|
|
<frameworkcontent> |
|
<pt> |
|
<Tip> |
|
|
|
If you aren't familiar with fine-tuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#train-with-pytorch-trainer)!
|
|
|
</Tip> |
|
|
|
You're ready to start training your model now! Load ViT with [`AutoModelForImageClassification`]. Specify the number of expected labels along with the label mappings:
|
|
|
```py |
|
>>> from transformers import AutoModelForImageClassification, TrainingArguments, Trainer |
|
|
|
>>> model = AutoModelForImageClassification.from_pretrained( |
|
... checkpoint, |
|
... num_labels=len(labels), |
|
... id2label=id2label, |
|
... label2id=label2id, |
|
... ) |
|
``` |
|
|
|
At this point, only three steps remain:
|
|
|
1. Define your training hyperparameters in [`TrainingArguments`]. It is important that you don't remove unused columns, because that would drop the `image` column; without the `image` column, you can't create `pixel_values`. Set `remove_unused_columns=False` to prevent this behavior! The only other required parameter is `output_dir`, which specifies where to save your model. Setting `push_to_hub=True` pushes the model to the Hub (you need to be signed in to Hugging Face to upload your model). At the end of each epoch, the [`Trainer`] will evaluate the accuracy and save the training checkpoint.
|
2. Pass the training arguments to [`Trainer`] along with the model, dataset, tokenizer, data collator, and `compute_metrics` function.
|
3. Call [`~Trainer.train`] to fine-tune your model.
|
|
|
```py |
|
>>> training_args = TrainingArguments( |
|
... output_dir="my_awesome_food_model", |
|
... remove_unused_columns=False, |
|
... evaluation_strategy="epoch", |
|
... save_strategy="epoch", |
|
... learning_rate=5e-5, |
|
... per_device_train_batch_size=16, |
|
... gradient_accumulation_steps=4, |
|
... per_device_eval_batch_size=16, |
|
... num_train_epochs=3, |
|
... warmup_ratio=0.1, |
|
... logging_steps=10, |
|
... load_best_model_at_end=True, |
|
... metric_for_best_model="accuracy", |
|
... push_to_hub=True, |
|
... ) |
|
|
|
>>> trainer = Trainer( |
|
... model=model, |
|
... args=training_args, |
|
... data_collator=data_collator, |
|
... train_dataset=food["train"], |
|
... eval_dataset=food["test"], |
|
... tokenizer=image_processor, |
|
... compute_metrics=compute_metrics, |
|
... ) |
|
|
|
>>> trainer.train() |
|
``` |
|
|
|
Once training is completed, share your model to the Hub with the [`~transformers.Trainer.push_to_hub`] method so everyone can use your model:
|
|
|
```py |
|
>>> trainer.push_to_hub() |
|
``` |
|
</pt> |
|
</frameworkcontent> |
|
|
|
<frameworkcontent> |
|
<tf> |
|
|
|
<Tip> |
|
|
|
If you are unfamiliar with fine-tuning a model with Keras, check out the basic tutorial [here](./training#train-a-tensorflow-model-with-keras) first!
|
|
|
</Tip> |
|
|
|
To fine-tune a model in TensorFlow, follow these steps:

1. Define the training hyperparameters, and set up an optimizer and a learning rate schedule.

2. Instantiate a pretrained model.

3. Convert a 🤗 Dataset to a `tf.data.Dataset`.

4. Compile your model.

5. Add callbacks and use the `fit()` method to run the training.

6. Upload your model to the 🤗 Hub to share it with the community.
|
|
|
Start by defining the hyperparameters, optimizer, and learning rate schedule:
|
|
|
```py |
|
>>> from transformers import create_optimizer |
|
|
|
>>> batch_size = 16 |
|
>>> num_epochs = 5 |
|
>>> num_train_steps = len(food["train"]) * num_epochs |
|
>>> learning_rate = 3e-5 |
|
>>> weight_decay_rate = 0.01 |
|
|
|
>>> optimizer, lr_schedule = create_optimizer( |
|
... init_lr=learning_rate, |
|
... num_train_steps=num_train_steps, |
|
... weight_decay_rate=weight_decay_rate, |
|
... num_warmup_steps=0, |
|
... ) |
|
``` |
|
|
|
Then, load ViT with [`TFAutoModelForImageClassification`] along with the label mappings:
|
|
|
```py |
|
>>> from transformers import TFAutoModelForImageClassification |
|
|
|
>>> model = TFAutoModelForImageClassification.from_pretrained( |
|
... checkpoint, |
|
... id2label=id2label, |
|
... label2id=label2id, |
|
... ) |
|
``` |
|
|
|
Convert your datasets to the `tf.data.Dataset` format with [`~datasets.Dataset.to_tf_dataset`] and your `data_collator`:
|
|
|
```py |
|
>>> # converting our train dataset to tf.data.Dataset |
|
>>> tf_train_dataset = food["train"].to_tf_dataset( |
|
... columns="pixel_values", label_cols="label", shuffle=True, batch_size=batch_size, collate_fn=data_collator |
|
... ) |
|
|
|
>>> # converting our test dataset to tf.data.Dataset |
|
>>> tf_eval_dataset = food["test"].to_tf_dataset( |
|
... columns="pixel_values", label_cols="label", shuffle=True, batch_size=batch_size, collate_fn=data_collator |
|
... ) |
|
``` |
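
It can be worth pulling a single batch to confirm the input pipeline end to end. With `label_cols` set, the dataset yields `(features, labels)` pairs; depending on your 🤗 Datasets version, `features` may be a dict keyed by `pixel_values` or a bare tensor:

```py
>>> features, labels = next(iter(tf_train_dataset))
>>> labels.shape
TensorShape([16])
```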
|
|
|
Configure the model for training with `compile()`:
|
|
|
```py |
|
>>> from tensorflow.keras.losses import SparseCategoricalCrossentropy |
|
|
|
>>> loss = SparseCategoricalCrossentropy(from_logits=True)
|
>>> model.compile(optimizer=optimizer, loss=loss) |
|
``` |
|
|
|
Use [Keras callbacks](../main_classes/keras_callbacks) to compute the accuracy from the predictions and push your model to the 🤗 Hub.
Pass your `compute_metrics` function to [KerasMetricCallback](../main_classes/keras_callbacks#transformers.KerasMetricCallback),
and use [PushToHubCallback](../main_classes/keras_callbacks#transformers.PushToHubCallback) to upload the model:
|
|
|
```py |
|
>>> from transformers.keras_callbacks import KerasMetricCallback, PushToHubCallback |
|
|
|
>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_eval_dataset) |
|
>>> push_to_hub_callback = PushToHubCallback( |
|
... output_dir="food_classifier", |
|
... tokenizer=image_processor, |
|
... save_strategy="no", |
|
... ) |
|
>>> callbacks = [metric_callback, push_to_hub_callback] |
|
``` |
|
|
|
You are now ready to train your model! Call `fit()` with your training and validation datasets, the number of epochs,
and your callbacks to fine-tune the model:
|
|
|
```py |
|
>>> model.fit(tf_train_dataset, validation_data=tf_eval_dataset, epochs=num_epochs, callbacks=callbacks) |
|
Epoch 1/5 |
|
250/250 [==============================] - 313s 1s/step - loss: 2.5623 - val_loss: 1.4161 - accuracy: 0.9290 |
|
Epoch 2/5 |
|
250/250 [==============================] - 265s 1s/step - loss: 0.9181 - val_loss: 0.6808 - accuracy: 0.9690 |
|
Epoch 3/5 |
|
250/250 [==============================] - 252s 1s/step - loss: 0.3910 - val_loss: 0.4303 - accuracy: 0.9820 |
|
Epoch 4/5 |
|
250/250 [==============================] - 251s 1s/step - loss: 0.2028 - val_loss: 0.3191 - accuracy: 0.9900 |
|
Epoch 5/5 |
|
250/250 [==============================] - 238s 949ms/step - loss: 0.1232 - val_loss: 0.3259 - accuracy: 0.9890 |
|
``` |
|
|
|
Congratulations! You have fine-tuned your model and shared it on the 🤗 Hub. You can now use it for inference!
|
</tf> |
|
</frameworkcontent> |
|
|
|
|
|
<Tip> |
|
|
|
For a more in-depth example of how to fine-tune a model for image classification, take a look at the corresponding [PyTorch notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb).
|
|
|
</Tip> |
|
|
|
## Inference[[inference]]
|
|
|
Great, now that you've fine-tuned a model, you can use it for inference!
|
|
|
Load an image you'd like to run inference on:
|
|
|
```py |
|
>>> ds = load_dataset("food101", split="validation[:10]") |
|
>>> image = ds["image"][0] |
|
``` |
|
|
|
<div class="flex justify-center"> |
|
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png" alt="image of beignets"/> |
|
</div> |
|
|
|
The simplest way to try out your fine-tuned model for inference is to use it in a [`pipeline`]. Instantiate a `pipeline` for image classification with your model, and pass your image to it:
|
|
|
```py |
|
>>> from transformers import pipeline |
|
|
|
>>> classifier = pipeline("image-classification", model="my_awesome_food_model") |
|
>>> classifier(image) |
|
[{'score': 0.31856709718704224, 'label': 'beignets'}, |
|
{'score': 0.015232225880026817, 'label': 'bruschetta'}, |
|
{'score': 0.01519392803311348, 'label': 'chicken_wings'}, |
|
{'score': 0.013022331520915031, 'label': 'pork_chop'}, |
|
{'score': 0.012728818692266941, 'label': 'prime_rib'}] |
|
``` |
|
|
|
You can also manually replicate the results of the `pipeline` if you'd like:
|
|
|
<frameworkcontent> |
|
<pt> |
|
Load an image processor to preprocess the image, and return the `input` as PyTorch tensors:
|
|
|
```py |
|
>>> from transformers import AutoImageProcessor |
|
>>> import torch |
|
|
|
>>> image_processor = AutoImageProcessor.from_pretrained("my_awesome_food_model") |
|
>>> inputs = image_processor(image, return_tensors="pt") |
|
``` |
|
|
|
Pass your inputs to the model and return the logits:
|
|
|
```py |
|
>>> from transformers import AutoModelForImageClassification |
|
|
|
>>> model = AutoModelForImageClassification.from_pretrained("my_awesome_food_model") |
|
>>> with torch.no_grad(): |
|
... logits = model(**inputs).logits |
|
``` |
|
|
|
Get the predicted label with the highest probability, and use the model's `id2label` mapping to convert it to a label:
|
|
|
```py |
|
>>> predicted_label = logits.argmax(-1).item() |
|
>>> model.config.id2label[predicted_label] |
|
'beignets' |
|
``` |
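
If you want scores comparable to the `pipeline` output, apply a softmax over the logits and take the top predictions (a small sketch; the exact scores depend on your training run):

```py
>>> probs = torch.nn.functional.softmax(logits, dim=-1)[0]
>>> top5 = torch.topk(probs, k=5)
>>> for score, idx in zip(top5.values, top5.indices):
...     print(model.config.id2label[idx.item()], round(score.item(), 4))
```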
|
</pt> |
|
</frameworkcontent> |
|
|
|
<frameworkcontent> |
|
<tf> |
|
Load an image processor to preprocess the image, and return the `input` as TensorFlow tensors:
|
|
|
```py |
|
>>> from transformers import AutoImageProcessor |
|
|
|
>>> image_processor = AutoImageProcessor.from_pretrained("MariaK/food_classifier") |
|
>>> inputs = image_processor(image, return_tensors="tf") |
|
``` |
|
|
|
Pass your inputs to the model and return the logits:
|
|
|
```py |
|
>>> from transformers import TFAutoModelForImageClassification |
|
|
|
>>> model = TFAutoModelForImageClassification.from_pretrained("MariaK/food_classifier") |
|
>>> logits = model(**inputs).logits |
|
``` |
|
|
|
Get the predicted label with the highest probability, and use the model's `id2label` mapping to convert it to a label:
|
|
|
```py |
|
>>> predicted_class_id = int(tf.math.argmax(logits, axis=-1)[0]) |
|
>>> model.config.id2label[predicted_class_id] |
|
'beignets' |
|
``` |
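
The same top-5 view works in TensorFlow with `tf.math.top_k` (a small sketch; the exact scores depend on your training run):

```py
>>> probs = tf.nn.softmax(logits, axis=-1)[0]
>>> top5 = tf.math.top_k(probs, k=5)
>>> for score, idx in zip(top5.values.numpy(), top5.indices.numpy()):
...     print(model.config.id2label[int(idx)], round(float(score), 4))
```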
|
|
|
</tf> |
|
</frameworkcontent> |
|
|