anakin87
posted an update 26 days ago
Haystack can now see 👀

The latest release of the Haystack OSS LLM framework adds a long-requested feature: image support!

📓 Notebooks below

This isn't just about passing images to an LLM. We built several features to enable practical multimodal use cases.

What's new?
🧠 Support for multiple LLM providers: OpenAI, Amazon Bedrock, Google Gemini, Mistral, NVIDIA, OpenRouter, Ollama and more (support for Hugging Face API coming 🔜)
🎛️ Prompt template language to handle structured inputs, including images
📄 PDF and image converters
🔍 Image embedders using CLIP-like models
🧾 LLM-based extractor to pull text from images
🧩 Components to build multimodal RAG pipelines and Agents

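The "image embedders using CLIP-like models" item is what makes text-to-image retrieval possible: a CLIP-style model maps images and text queries into one shared vector space, so images can be ranked by how close they sit to a query. Here is a minimal, framework-independent sketch of that ranking step. It is not Haystack's API: the toy three-dimensional vectors, the file names, and the helpers `cosine_similarity` and `retrieve_images` are hypothetical stand-ins for what a real image embedder and document store would provide.

```python
import math

# Hypothetical toy embeddings: in practice a CLIP-like model would produce
# these vectors, placing images and text in the same embedding space.
IMAGE_EMBEDDINGS = {
    "cat_photo.jpg": [0.9, 0.1, 0.0],
    "bar_chart.png": [0.1, 0.8, 0.3],
    "invoice_scan.png": [0.0, 0.3, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated to anything
    return dot / (norm_a * norm_b)


def retrieve_images(query_embedding, image_embeddings, top_k=2):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in image_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embedding of a text query such as "quarterly revenue chart".
query = [0.05, 0.9, 0.2]
for name, score in retrieve_images(query, IMAGE_EMBEDDINGS):
    print(f"{name}: {score:.3f}")
```

In a real multimodal RAG pipeline the vectors would come from the embedding model, and the top-ranked images would then be handed to a vision-capable LLM. With these toy values, the chart image ranks first for the chart-like query, followed by the invoice scan.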

I had the chance to lead this effort with @sjrhuschlee (great collab).

📓 Below you can find two notebooks to explore the new features:
• Introduction to Multimodal Text Generation https://haystack.deepset.ai/cookbook/multimodal_intro
• Creating Vision+Text RAG Pipelines https://haystack.deepset.ai/tutorials/46_multimodal_rag

(🖼️ image by @bilgeyucel)