Spaces:
Runtime error
Runtime error
metadata
title: Image To Audio
emoji: π’
colorFrom: gray
colorTo: yellow
sdk: streamlit
sdk_version: 1.29.0
app_file: app.py
pinned: false
The Image Reader π’
The Image Reader π’ - Playground
This application analyzes the uploaded image, generates an imaginative phrase, and then converts it into audio.
- For image_to_audio following technologies were used:
- Image Reader:
- HuggingFace
image-to-text
task used withSalesforce/blip-image-captioning-base
pretrained model. Which produces a small description about the image. - Salesforce/blip-image-captioning-base
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
- HuggingFace
- Generate an imaginative phrase:
- OpenAI
GPT-3.5-Turbo
used to produce an imaginative narrative from the description generated earlier. - The phrase generated with more than 40 words.
- GPT-3.5 Turbo
- OpenAI
- text-to-audio:
suno/bark-small
used to generate the audio version of the imaginative narrative earlier.- suno/bark-small
- BARK: Bark is a transformer-based text-to-audio model created by Suno. Bark can generate highly realistic, multilingual speech as well as other audio - including music, background noise and simple sound effects. The model can also produce nonverbal communications like laughing, sighing and crying.
- Image Reader: