Roger Taylor

rogermt

AI & ML interests

None yet

Recent Activity

updated a Space about 1 month ago

rogermt/OpenWebUi

liked a Space about 1 month ago

llamaindex/multimodal_vdr_demo

liked a Space about 2 months ago

ginipick/time-machine

Organizations

None yet

rogermt's activity

updated a Space about 1 month ago

OpenWebUi

self-hosted AI platform: Openwebui

liked a Space about 1 month ago

Multimodal VDR Demo

Multimodal retrieval using llamaindex/vdr-2b-multi-v1

liked a Space about 2 months ago

Image Time-machine

liked a Space 3 months ago

Anychat

liked a Space 4 months ago

UniPortrait

Generate images using text and ID photos

reacted to merve's post with 👍 5 months ago

I have put together a notebook on Multimodal RAG, where we do not process the documents with hefty pipelines but natively use:
- vidore/colpali for retrieval 📖 it doesn't need indexing with image-text pairs but just images!
- Qwen/Qwen2-VL-2B-Instruct for generation 💬 directly feed images as is to a vision language model with no processing to text!
I used the ColPali implementation from the new 🐭 Byaldi library by @bclavie 🤗
https://github.com/answerdotai/byaldi
Link to notebook: https://github.com/merveenoyan/smol-vision/blob/main/ColPali_%2B_Qwen2_VL.ipynb
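
The retrieval step described in the post can be illustrated with a small sketch. ColPali is understood to score a query against a page image using ColBERT-style late interaction (often called MaxSim): each query token embedding is matched to its most similar page patch embedding, and those maxima are summed. The example below uses hand-written 2-D toy vectors in place of real model embeddings; the names `maxsim_score` and `rank_pages` and all data are hypothetical and are not taken from the notebook or from the Byaldi API.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late-interaction score: for every query vector take the best-matching
    page vector, then sum those maxima over the query."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(query_vecs: List[Vector], pages: Dict[str, List[Vector]]) -> List[str]:
    """Return page names ordered from highest to lowest MaxSim score."""
    return sorted(
        pages,
        key=lambda name: maxsim_score(query_vecs, pages[name]),
        reverse=True,
    )


# Toy data (assumption): two query "token" vectors and two pages,
# each page represented by two "patch" vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[0.9, 0.1], [0.2, 0.8]],
    "page_b": [[0.5, 0.5], [0.4, 0.4]],
}

ranking = rank_pages(query, pages)
print(ranking)
```

In a pipeline like the one in the post, the top-ranked page image would then be passed directly to the vision language model together with the question, with no intermediate text extraction.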

liked a Space 6 months ago

FLUX.1 [dev]

Generate images from text prompts

liked 2 Spaces 7 months ago

Live Portrait

Apply the motion of a video on a portrait

Stable Diffusion Protogen x3.4 Web UI

Generate images from text prompts

liked 2 Spaces 9 months ago

PDF to Markdown

Extract text and metadata from PDF files

H94 IP Adapter FaceID SDXL

Generate photorealistic images matching your face