---
library_name: transformers
license: apache-2.0
tags:
- vision
- image-captioning
- blip
- multimodal
- fashion
datasets:
- Marqo/fashion200k
base_model:
- Salesforce/blip-image-captioning-large
---

# Fine-Tuned BLIP Model for Fashion Image Captioning

This is a fine-tuned BLIP (Bootstrapping Language-Image Pre-training) model specialized for **fashion image captioning**. It was fine-tuned on the **Marqo fashion200k dataset** to generate descriptive, contextually relevant captions for fashion-related images.

## Model Details

- **Model Type:** BLIP (Vision-Language Pre-training)
- **Architecture:** BLIP uses a multimodal transformer architecture to jointly model visual and textual information.
- **Base Model:** [Salesforce/blip-image-captioning-large](https://huggingface.co/Salesforce/blip-image-captioning-large)
- **Fine-Tuning Dataset:** [Marqo/fashion200k](https://huggingface.co/datasets/Marqo/fashion200k) (fashion images paired with descriptive captions)
- **Task:** Fashion Image Captioning
- **License:** Apache 2.0

## Usage

You can use this model with the Hugging Face `transformers` library for fashion image captioning.

### Installation

First, install the required libraries:

```bash
pip install transformers torch
```
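
### Example

A minimal inference sketch using the standard `BlipProcessor` and `BlipForConditionalGeneration` classes from `transformers`. The repository id and image path below are placeholders (substitute this model's actual Hub path and your own image), and Pillow is assumed to be installed (`pip install pillow`).

```python
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Placeholder repo id -- replace with this model's actual Hub path.
MODEL_ID = "your-username/blip-fashion-captioning"

processor = BlipProcessor.from_pretrained(MODEL_ID)
model = BlipForConditionalGeneration.from_pretrained(MODEL_ID)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Load a fashion image (placeholder path) and prepare model inputs.
image = Image.open("fashion_item.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt").to(device)

# Generate and decode a caption.
generated_ids = model.generate(**inputs, max_new_tokens=50)
caption = processor.decode(generated_ids[0], skip_special_tokens=True)
print(caption)
```

The processor handles image resizing and normalization for BLIP, so any RGB image can be passed in directly; adjust `max_new_tokens` to control caption length.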