otpensource-vision

모델 설명

otpensource-vision은 Bllossom/llama-3.2-Korean-Bllossom-AICA-5B를 기반으로 학습된 Vision-Language 모델입니다. 해당 모델은 한국어와 영어로 작성된 텍스트와 이미지를 결합하여 다양한 태스크를 수행할 수 있도록 설계되었습니다.

주요 특징

Bllossom 기반 학습: llama-3.2-Korean-Bllossom-AICA-5B를 기반으로 학습된 모델로, 언어 모델과 시각-언어 모델의 장점을 모두 제공합니다.
Vision-Language 태스크 지원: 이미지를 입력받아 텍스트 정보를 생성하거나, 텍스트 입력만으로 자연어 처리 태스크를 수행할 수 있습니다.
패션 데이터를 활용한 학습: 한국어 패션 데이터셋(otpensource_data)을 활용하여 옷의 카테고리, 색상, 계절, 특징 등 관련 정보를 추출하도록 학습되었습니다.
상업적 활용 가능: 라이선스는 CC-BY-4.0으로 상업적 이용이 가능합니다.

모델 세부사항

학습 데이터

모델 학습에 사용된 데이터셋:

otpensource_dataset:
- 약 9000개의 패션 데이터로 구성
- 옷의 카테고리, 색상, 계절, 특징, 이미지 URL 등을 포함하며, Vision-Language 학습에 최적화

학습 방식

기반 모델: Bllossom/llama-3.2-Korean-Bllossom-AICA-5B
GPU 요구사항: A100 40GB 이상 권장
최적화: Vision-Language 태스크와 한국어 텍스트 태스크를 통합적으로 학습

주요 사용 사례

Vision-Language 태스크

이미지 분석

입력된 이미지에서 옷의 카테고리, 색상, 계절, 특징을 추출하여 JSON 형식으로 반환.

예시:

{
  "category": "트렌치코트",
  "gender": "여",
  "season": "SS",
  "color": "네이비",
  "material": "",
  "feature": "트렌치코트"
}

언어모델 태스크
- 텍스트만 입력했을 때 자연어 처리를 수행하며, 질문 응답, 텍스트 요약, 감정 분석 등 다양한 태스크 수행 가능.

학습 및 성능

LogicKor 벤치마크 성능 (Bllossom 기반 모델 성능)

Category	Single Turn	Multi Turn
Reasoning	6.57	5.29
Math	6.43	6.29
Writing	9.14	8.71
Coding	8.00	9.14
Understanding	8.14	9.29
Grammar	6.71	4.86

학습 구성

모델 크기: 5B 파라미터
학습 데이터 크기: 약 9000개의 시각-언어 데이터
평가 결과: 패션 관련 태스크에서 높은 정확도와 효율성 제공

코드 예시

Vision-Language 태스크

from transformers import MllamaForConditionalGeneration, MllamaProcessor
import torch
from PIL import Image
import requests

model = MllamaForConditionalGeneration.from_pretrained(
  'otpensource-vision',
  torch_dtype=torch.bfloat16,
  device_map='auto'
)
processor = MllamaProcessor.from_pretrained('otpensource-vision')

url = "https://image.msscdn.net/thumbnails/images/prd_img/20240710/4242307/detail_4242307_17205916382801_big.jpg?w=1200"
image = Image.open(requests.get(url, stream=True).raw)

messages = [
  {'role': 'user', 'content': [
    {'type': 'image', 'image': image},
    {'type': 'text', 'text': '이 옷의 정보를 JSON으로 알려줘.'}
  ]}
]

input_text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = processor(
    image=image,
    text=input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to(model.device)

output = model.generate(**inputs, max_new_tokens=256, temperature=0.1)
print(processor.decode(output[0]))

Uploaded finetuned model

Developed by: hateslopacademy
License: apache-2.0
Finetuned from model : Bllossom/llama-3.2-Korean-Bllossom-AICA-5B

This mllama model was trained 2x faster with Unsloth and Huggingface's TRL library.

hateslopacademy
/

otpensource-vision

otpensource-vision

모델 설명

주요 특징

모델 세부사항

학습 데이터

학습 방식

주요 사용 사례

Vision-Language 태스크

학습 및 성능

LogicKor 벤치마크 성능 (Bllossom 기반 모델 성능)

학습 구성

코드 예시

Vision-Language 태스크

Uploaded finetuned model

Model tree for hateslopacademy/otpensource-vision