---
title: youtube-channel-surfer-ai
license: mit
emoji: "📺"
app_file: "app.py"
sdk: "gradio"
pinned: false
python_version: 3.13
---
# 📺 YouTube Metadata Q&A Agent
This application indexes YouTube channels and answers natural-language questions about their videos. It uses **OpenAI embeddings** and **GPT-4o-mini** to generate answers from video metadata (titles and descriptions), and it displays the most relevant videos in an interactive table.
---
## Features
- **Index YouTube Channels**: Provide one or more YouTube channel URLs to index video metadata.
- **Search & Answer Questions**: Ask questions about channel content and get answers generated by an LLM.
- **Top Video Results**: View top relevant videos in a structured HTML table with clickable links.
- **Embedded Video Player**: Watch videos directly in the app using YouTube embeds.
- **Refresh Channels**: Update previously indexed channels to include the latest videos.
- **Lightweight Storage**: Uses a local **ChromaDB** persistent database to store video embeddings for fast retrieval.
- **Structured LLM Output**: The LLM returns structured `LLMAnswer` objects (a textual answer plus the top videos) for clean rendering.
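
As a rough illustration of the structured output described above, the two shapes might look like the sketch below. Only the names `LLMAnswer` and `VideoItem` come from this README; every field name (`video_id`, `title`, `channel`, `answer_text`, `top_videos`) and the `url` property are assumptions made for illustration, and the app's real definitions may differ (for example, they may be Pydantic models rather than dataclasses).

```python
from dataclasses import dataclass, field


@dataclass
class VideoItem:
    # Hypothetical fields; the real model may carry more or different data.
    video_id: str
    title: str
    channel: str

    @property
    def url(self) -> str:
        # Standard YouTube watch URL built from the video ID.
        return f"https://www.youtube.com/watch?v={self.video_id}"


@dataclass
class LLMAnswer:
    # Free-text answer plus the videos that support it (assumed layout).
    answer_text: str
    top_videos: list[VideoItem] = field(default_factory=list)


# Example values are made up purely to show how the objects fit together.
answer = LLMAnswer(
    answer_text="The channel covers sourdough basics in one video.",
    top_videos=[VideoItem("abc123", "Sourdough 101", "Bread Lab")],
)
print(answer.top_videos[0].url)
```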
---
## How it Works
1. **Channel Indexing**:
   - The app fetches the latest videos from provided YouTube channels using the YouTube Data API.
   - Video metadata (title, description, channel, video ID) is embedded with OpenAI embeddings and stored in ChromaDB.
2. **Query & Retrieval**:
   - User queries are embedded and compared with stored video embeddings.
   - Top matching videos are retrieved.
3. **Answer Generation**:
   - The LLM generates an answer based on the top video metadata.
   - The answer and top videos are returned as structured data (`LLMAnswer`).
4. **Rendering**:
   - Answer text is displayed in Markdown.
   - Top videos are displayed in a structured HTML table with clickable links and embedded YouTube players.
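
The retrieval step can be pictured with a small, self-contained sketch. In the app, OpenAI produces the embeddings and ChromaDB performs the similarity search; the code below swaps both for hand-written 3-dimensional vectors and a plain cosine-similarity ranking, so it is only a conceptual stand-in. The helper names (`cosine_similarity`, `retrieve_top_k`), the toy vectors, and the choice of cosine similarity as the metric are all assumptions, not details taken from the app.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query_vec: list[float],
    index: dict[str, list[float]],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Return the top_k (video_id, score) pairs, best match first."""
    scored = [(vid, cosine_similarity(query_vec, vec)) for vid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy "embeddings" (hypothetical); real embeddings have far more dimensions.
toy_index = {
    "vid_bread": [0.9, 0.1, 0.0],
    "vid_pasta": [0.7, 0.3, 0.1],
    "vid_cycling": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]  # stands in for the embedded user question

results = retrieve_top_k(query, toy_index, top_k=2)
for video_id, score in results:
    print(video_id, round(score, 3))
```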
---
## Installation
### Steps to Run
1. **Clone the repository:**

       git clone <repo_url>
       cd youtube_surfer_ai_agent

2. **Create and activate a virtual environment:**
   - Linux/macOS:

         python -m venv .venv
         source .venv/bin/activate

   - Windows:

         python -m venv .venv
         .venv\Scripts\activate

3. **Install dependencies:**

       pip install -r requirements.txt

4. **Create a `.env` file** in the project root with your API keys:

       YOUTUBE_API_KEY=your_youtube_api_key
       OPENAI_API_KEY=your_openai_api_key

5. **Run the application:**

       python app.py

6. **Open the Gradio interface** in your browser (default: http://127.0.0.1:7860).
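
Before launching, it can be useful to confirm that the `.env` file defines both keys. The sketch below is a minimal standard-library check, not part of the app: `parse_env_text` and `missing_keys` are hypothetical helpers, the parser handles only simple `KEY=VALUE` lines, and how the app itself loads `.env` is not stated in this README.

```python
REQUIRED_KEYS = ("YOUTUBE_API_KEY", "OPENAI_API_KEY")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str]) -> list[str]:
    """Return the required key names that are absent or empty."""
    return [name for name in REQUIRED_KEYS if not env.get(name)]


# Sample text in the same layout as the .env file from step 4;
# in practice you would read the file contents instead.
sample = "YOUTUBE_API_KEY=your_youtube_api_key\nOPENAI_API_KEY=\n"
print(missing_keys(parse_env_text(sample)))  # ['OPENAI_API_KEY']
```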
---
## How to Use
- **Index Channels:** Paste one or more YouTube channel URLs (comma- or newline-separated) and click "Index Channels".
- **Refresh Channels:** Use the sidebar "Refresh All Channels" button to update existing channels.
- **Ask Questions:** Type a query in the text box and click "Get Answer" to receive a structured response with embedded videos.
- **View Indexed Channels:** The sidebar lists every indexed channel as a clickable link.
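
The comma- or newline-separated input accepted by "Index Channels" could be split along these lines. This is a sketch only: `split_channel_urls` is a hypothetical helper, the example URLs are placeholders, and dropping duplicates is an assumption about desirable behavior rather than something the app is known to do.

```python
import re


def split_channel_urls(raw: str) -> list[str]:
    """Split on commas or newlines, trim whitespace, drop blanks and duplicates."""
    seen = set()
    urls = []
    for part in re.split(r"[,\n]+", raw):
        url = part.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Placeholder channel URLs mixing both separators, with one repeat.
raw_input_text = (
    "https://www.youtube.com/@channelA, https://www.youtube.com/@channelB\n"
    "https://www.youtube.com/@channelA\n"
)
channel_urls = split_channel_urls(raw_input_text)
print(channel_urls)
```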
---
## Notes
- The LLM uses structured outputs (`LLMAnswer` + `VideoItem`) internally to produce consistent results.
- Top videos are embedded as iframes in the Gradio interface.
- You can adjust the number of top videos returned by modifying the `top_k` parameter in `answer_query`.
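
To make the iframe embedding and the effect of `top_k` concrete, here is a hedged sketch of how a results table could be built as an HTML string. It is not the app's renderer: `render_video_table`, the dictionary keys (`video_id`, `title`), and the iframe dimensions are assumptions. The `https://www.youtube.com/embed/<video_id>` form is YouTube's standard embed URL.

```python
from html import escape


def render_video_table(videos: list[dict], top_k: int = 3) -> str:
    """Build an HTML table with a link and an embedded player per video."""
    rows = []
    for video in videos[:top_k]:  # top_k caps how many rows are rendered
        video_id = escape(video["video_id"], quote=True)
        title = escape(video["title"])  # escape to keep titles from breaking the markup
        watch_url = f"https://www.youtube.com/watch?v={video_id}"
        embed_url = f"https://www.youtube.com/embed/{video_id}"
        rows.append(
            "<tr>"
            f'<td><a href="{watch_url}" target="_blank">{title}</a></td>'
            f'<td><iframe src="{embed_url}" width="320" height="180" '
            "allowfullscreen></iframe></td>"
            "</tr>"
        )
    header = "<tr><th>Title</th><th>Video</th></tr>"
    return "<table>" + header + "".join(rows) + "</table>"


# Made-up sample data; only the first entry is rendered because top_k=1.
sample_videos = [
    {"video_id": "abc123", "title": "Bread & Butter"},
    {"video_id": "def456", "title": "Pasta Night"},
]
table_html = render_video_table(sample_videos, top_k=1)
print(table_html)
```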
---