---
license: gemma
language:
- en
pipeline_tag: image-text-to-text
---
# Cerule - A Tiny Mighty Vision Model
### Based on Google's Gemma-2b + SigLIP
```
██████╗███████╗██████╗ ██╗ ██╗██╗ ███████╗
██╔════╝██╔════╝██╔══██╗██║ ██║██║ ██╔════╝
██║ █████╗ ██████╔╝██║ ██║██║ █████╗
██║ ██╔══╝ ██╔══██╗██║ ██║██║ ██╔══╝
╚██████╗███████╗██║ ██║╚██████╔╝███████╗███████╗
╚═════╝╚══════╝╚═╝ ╚═╝ ╚═════╝ ╚══════╝╚══════╝
```
We train and release "Cerule", a tiny yet powerful Vision Language Model based on Google's newly released [Gemma-2b](https://huggingface.co/google/gemma-2b) and Google's [SigLIP](https://huggingface.co/google/siglip-so400m-patch14-384).
We use efficient data selection techniques, training on:
```
- Pretraining stage : 650K images (a LAION subset)
- Finetuning stage : 695K images (SVIT-mix-665K, modified for finetuning; dataset coming SOON!)
```
The training setup was `4x A100 80GB` GPUs; pretraining took ~6 hours and finetuning took ~13 hours. We modify and adapt the training code from [LLaVA](https://github.com/haotian-liu/LLaVA).
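
To put the quoted figures in perspective, the following is a rough back-of-the-envelope sketch of the compute budget. It uses only the approximate numbers stated above (4 GPUs, ~6 and ~13 hours, 650K and 695K images). The throughput estimate additionally assumes a single pass over each dataset, which is an assumption and not something stated here; the function names are illustrative only.

```python
# Approximate figures quoted in this model card.
NUM_GPUS = 4
PRETRAIN_HOURS = 6       # ~6 hours wall-clock
FINETUNE_HOURS = 13      # ~13 hours wall-clock
PRETRAIN_IMAGES = 650_000
FINETUNE_IMAGES = 695_000


def gpu_hours(wall_clock_hours, num_gpus=NUM_GPUS):
    """Total GPU-hours = wall-clock hours times number of GPUs."""
    return wall_clock_hours * num_gpus


def images_per_gpu_second(num_images, wall_clock_hours, num_gpus=NUM_GPUS):
    """Average per-GPU throughput.

    ASSUMPTION: exactly one pass over the data; the number of epochs
    is not stated in the model card.
    """
    return num_images / (wall_clock_hours * 3600 * num_gpus)


pretrain_gpu_hours = gpu_hours(PRETRAIN_HOURS)
finetune_gpu_hours = gpu_hours(FINETUNE_HOURS)
total_gpu_hours = pretrain_gpu_hours + finetune_gpu_hours

print(f"Pretraining: ~{pretrain_gpu_hours} GPU-hours")
print(f"Finetuning:  ~{finetune_gpu_hours} GPU-hours")
print(f"Total:       ~{total_gpu_hours} GPU-hours")
print(f"Pretraining throughput: ~{images_per_gpu_second(PRETRAIN_IMAGES, PRETRAIN_HOURS):.1f} images/s per GPU")
print(f"Finetuning throughput:  ~{images_per_gpu_second(FINETUNE_IMAGES, FINETUNE_HOURS):.1f} images/s per GPU")
```

Under these assumptions the whole run comes to roughly 76 A100 GPU-hours, which is the sense in which the model is cheap to train.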
🚨 Training code, Data and more details to release soon!
---
| Image | Example |
|-------|---------|
| [Image: astronaut] | **Describe the image**<br>The image is a playful and surreal depiction of a man in a space suit, sitting on a chair and holding a green beer bottle. The man is wearing a white space suit, complete with a helmet and gloves. His feet are clad in black and white shoes, and he is placed on a sandy surface. The background features a large, blue planet, with a moon and a star visible in the sky. |
| [Image: mario] | **Who are the characters in the image?**<br>The image features three characters, two of them are Mario and Luigi, and the third one is Yoshi.<br>**Describe the actions of the characters**<br>The Mario and Luigi characters are holding their arms out, as if they are waving. Yoshi is standing on its own, with its arms folded. |
| [Image: extreme_ironing] | **What's funny about this image?**<br>The image is quite humorous as it depicts a man ironing clothes on the back of a yellow taxi cab. This is not a typical sight you'd expect to see in everyday life. |
---
## Training:
The training code will be released at a later date.
## Inference:
Clone the following repo and follow its instructions for CLI-based inference.
https://github.com/Tensoic-AI/Cerule
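
As a convenience, the clone step can be scripted. The sketch below is a minimal helper, assuming `git` is installed and on your `PATH`; the function names and the default destination folder are hypothetical. It only fetches the repository and does not run inference, since the actual CLI commands are documented in the repo itself and are not reproduced here.

```python
import subprocess
from pathlib import Path

# Repository URL as given in this model card.
REPO_URL = "https://github.com/Tensoic-AI/Cerule"


def clone_command(dest="Cerule"):
    """Build the git command used to clone the inference repo."""
    return ["git", "clone", REPO_URL, str(dest)]


def clone_repo(dest="Cerule"):
    """Clone the repo into `dest` unless that path already exists.

    ASSUMPTION: `git` is available on PATH. Raises CalledProcessError
    if the clone fails.
    """
    dest = Path(dest)
    if dest.exists():
        # Skip cloning so the helper is safe to call repeatedly.
        return dest
    subprocess.run(clone_command(dest), check=True)
    return dest
```

Calling `clone_repo()` creates a local `Cerule` folder; from there, follow the repository's README for the inference steps.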
## License
The model is subject to the Gemma (base model license) terms of use, and the underlying datasets (LAION and SVIT) are subject to their respective licenses. All code is licensed under Apache 2.0.