---
title: youtube-channel-surfer-ai
license: mit
emoji: "📺"
app_file: "app.py"
sdk: "gradio"
pinned: false
python_version: "3.13"
---

# 📺 YouTube Metadata Q&A Agent

This application indexes YouTube channels and answers natural-language questions about their videos. It uses **OpenAI embeddings** to find the videos whose metadata (titles and descriptions) best matches a question, passes that metadata to **GPT-4o-mini** to generate an answer, and shows the most relevant videos in an interactive table.

---

## Features

- **Index YouTube Channels**: Provide one or more YouTube channel URLs to index their video metadata.
- **Search & Answer Questions**: Ask questions about channel content and get answers generated by an LLM.
- **Top Video Results**: View the most relevant videos in an HTML table with clickable links.
- **Embedded Video Player**: Watch videos directly in the app through YouTube embeds.
- **Refresh Channels**: Update previously indexed channels to include their latest videos.
- **Lightweight Storage**: Video embeddings are stored in a local, persistent **ChromaDB** database for fast retrieval.
- **Structured LLM Output**: The LLM returns a structured `LLMAnswer` object containing the answer text and the top videos, which keeps rendering consistent.
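
Refreshing a channel only needs to add videos that are not stored yet. The sketch below illustrates that idea, assuming videos are keyed by their YouTube video ID. The in-memory `dict` stands in for the ChromaDB collection, and `Video` and `refresh_channel` are hypothetical names, not the app's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    """Hypothetical metadata record; field names are illustrative."""
    video_id: str
    title: str
    description: str
    channel: str


def refresh_channel(store: dict[str, Video], latest: list[Video]) -> list[Video]:
    """Add videos whose IDs are not stored yet and return only the new ones.

    `store` stands in for the persistent collection, keyed by video ID,
    so re-indexing the same channel never creates duplicate entries.
    """
    new_videos: list[Video] = []
    for v in latest:
        if v.video_id not in store:
            store[v.video_id] = v
            new_videos.append(v)
    return new_videos


store: dict[str, Video] = {}
refresh_channel(store, [Video("a1", "Intro", "First upload", "Demo")])
added = refresh_channel(store, [
    Video("a1", "Intro", "First upload", "Demo"),
    Video("b2", "Part 2", "Second upload", "Demo"),
])
print([v.video_id for v in added])
```

ChromaDB collections also provide an `upsert` method keyed by `ids`, which avoids duplicates by itself. Filtering by ID first has the added benefit of not paying to re-embed videos that are already stored.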

---

## How it Works

1. **Channel Indexing**:
   - The app fetches the latest videos from the provided YouTube channels using the YouTube Data API.
   - Video metadata (title, description, channel, video ID) is embedded with OpenAI embeddings and stored in ChromaDB.

2. **Query & Retrieval**:
   - The user's query is embedded and compared with the stored video embeddings.
   - The top matching videos are retrieved.

3. **Answer Generation**:
   - The LLM generates an answer based on the metadata of the top videos.
   - The answer and top videos are returned as structured data (`LLMAnswer`).

4. **Rendering**:
   - The answer text is displayed as Markdown.
   - The top videos are displayed in an HTML table with clickable links and embedded YouTube players.
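
The indexing, retrieval, and answer-context steps above can be sketched in plain Python. This is a stand-in rather than the app's code. The toy `embed` function (a hashed bag of words) replaces the OpenAI embeddings call, which needs network access and an API key. The linear scan in `retrieve_top_k` replaces ChromaDB's nearest-neighbor search. `index_videos`, `retrieve_top_k`, and `build_context` are hypothetical names.

```python
import math
import re
import zlib
from collections import Counter


def embed(text: str, dim: int = 1024) -> list[float]:
    """Toy stand-in for OpenAI embeddings: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for word, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        vec[zlib.crc32(word.encode()) % dim] += count
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are unit length, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def index_videos(videos: list[dict]) -> list[tuple[dict, list[float]]]:
    """Indexing: embed title + description once and keep the vector with its metadata."""
    return [(v, embed(f"{v['title']} {v['description']}")) for v in videos]


def retrieve_top_k(query: str, index: list[tuple[dict, list[float]]], top_k: int = 3) -> list[dict]:
    """Retrieval: rank stored videos by similarity to the query and keep the best top_k."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [meta for meta, _ in ranked[:top_k]]


def build_context(videos: list[dict]) -> str:
    """Answer generation input: retrieved metadata formatted as LLM prompt context."""
    return "\n\n".join(
        f"Title: {v['title']}\nDescription: {v['description']}\n"
        f"URL: https://www.youtube.com/watch?v={v['video_id']}"
        for v in videos
    )


videos = [
    {"video_id": "vid1", "title": "Python decorators explained",
     "description": "Learn how decorators wrap functions in Python."},
    {"video_id": "vid2", "title": "Sourdough bread at home",
     "description": "A beginner recipe for sourdough bread."},
    {"video_id": "vid3", "title": "Async Python in 10 minutes",
     "description": "Asyncio basics for Python developers."},
]
index = index_videos(videos)
top = retrieve_top_k("how do python decorators work", index, top_k=2)
print(build_context(top))
```

With real components, ChromaDB's `query` method (with `n_results`) would take the place of the sorted scan. The `top_k` argument here plays the same role as the `top_k` parameter of `answer_query`.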

---

## Installation

### Steps to Run

1. **Clone the repository:**

   ```bash
   git clone <repo_url>
   cd youtube_surfer_ai_agent
   ```

2. **Create and activate a virtual environment:**

   - Linux/macOS:

     ```bash
     python -m venv .venv
     source .venv/bin/activate
     ```

   - Windows:

     ```bat
     python -m venv .venv
     .venv\Scripts\activate
     ```

3. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

4. **Create a `.env` file** in the project root with your API keys:

   ```
   YOUTUBE_API_KEY=your_youtube_api_key
   OPENAI_API_KEY=your_openai_api_key
   ```

5. **Run the application:**

   ```bash
   python app.py
   ```

6. **Open the Gradio interface** in your browser (default: http://127.0.0.1:7860).
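
A missing key otherwise tends to surface later as a less obvious API error, so it can help to check both keys at startup. The sketch below is a minimal fail-fast check that assumes the keys are read from environment variables with the names used in the `.env` file. Loading that file with `python-dotenv`'s `load_dotenv()` is an assumption about how the app reads it, so the import is optional here. `check_api_keys` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

try:
    # Optional: python-dotenv copies variables from .env into os.environ.
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None

REQUIRED_KEYS = ("YOUTUBE_API_KEY", "OPENAI_API_KEY")


def check_api_keys(env: Optional[Mapping[str, str]] = None) -> None:
    """Raise a clear error if any required API key is missing or empty."""
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # by default, does not override variables already set
        env = dict(os.environ)
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing API keys in .env or environment: {', '.join(missing)}")
```

Calling `check_api_keys()` at the top of `app.py`, before building the Gradio interface, would stop the app immediately with a message naming the missing keys.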

---

## How to Use

- **Index Channels:** Paste one or more YouTube channel URLs (separated by commas or newlines) and click "Index Channels".
- **Refresh Channels:** Click "Refresh All Channels" in the sidebar to fetch new videos for channels that are already indexed.
- **Ask Questions:** Type a question in the text box and click "Get Answer" to receive an answer along with the most relevant videos, embedded for playback.
- **View Indexed Channels:** The sidebar lists every indexed channel as a clickable link.
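
The "Index Channels" box accepts URLs separated by commas or newlines. The sketch below covers that splitting step and adds a light check against common channel URL shapes (`/@handle`, `/channel/<id>`, `/c/<name>`, `/user/<name>`). The accepted patterns and the `parse_channel_urls` name are assumptions for illustration, not the app's actual validation rules. Resolving a handle to a channel ID still requires a YouTube Data API call, which this sketch leaves out.

```python
import re

# Hypothetical whitelist of common channel URL shapes, with an optional tab suffix.
CHANNEL_URL = re.compile(
    r"^https?://(?:www\.|m\.)?youtube\.com/"
    r"(?:@[\w.-]+|channel/UC[\w-]{22}|c/[\w.-]+|user/[\w.-]+)"
    r"(?:/(?:videos|featured|streams|shorts)?)?$"
)


def parse_channel_urls(raw: str) -> tuple[list[str], list[str]]:
    """Split comma/newline-separated input into (valid, rejected) lists, skipping duplicates."""
    valid: list[str] = []
    rejected: list[str] = []
    for part in re.split(r"[,\n]", raw):
        url = part.strip()  # also removes stray spaces and Windows "\r"
        if not url or url in valid or url in rejected:
            continue
        (valid if CHANNEL_URL.match(url) else rejected).append(url)
    return valid, rejected


ok, bad = parse_channel_urls("https://www.youtube.com/@example,\nnot a url")
print(ok, bad)
```

Returning the rejected entries separately, rather than silently dropping them, lets the UI tell the user which inputs were not recognized.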

---

## Notes

- Internally, the LLM produces structured outputs (`LLMAnswer` and `VideoItem`) so that results have a consistent shape.
- Top videos are embedded as iframes in the Gradio interface.
- To change how many top videos are returned, adjust the `top_k` parameter of `answer_query`.
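
The structured output and the iframe embedding can be sketched together. The fields of `VideoItem` and `LLMAnswer` below are assumptions, and the app's real models may be Pydantic classes with different fields. `render_videos_html` is a hypothetical helper. The `https://www.youtube.com/watch?v=<video_id>` and `https://www.youtube.com/embed/<video_id>` formats are YouTube's standard watch and embed URLs. Values are HTML-escaped because titles come from third-party metadata.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class VideoItem:
    # Hypothetical fields for one recommended video.
    video_id: str
    title: str
    channel: str


@dataclass
class LLMAnswer:
    # Hypothetical shape: answer text plus the top videos it is based on.
    answer: str
    videos: list[VideoItem] = field(default_factory=list)


def render_videos_html(result: LLMAnswer) -> str:
    """Render the top videos as an HTML table with a link and an embedded player per row."""
    rows = []
    for v in result.videos:
        vid = escape(v.video_id, quote=True)
        rows.append(
            "<tr>"
            f'<td><a href="https://www.youtube.com/watch?v={vid}" target="_blank">{escape(v.title)}</a></td>'
            f"<td>{escape(v.channel)}</td>"
            f'<td><iframe width="320" height="180" src="https://www.youtube.com/embed/{vid}" '
            "allowfullscreen></iframe></td>"
            "</tr>"
        )
    header = "<tr><th>Title</th><th>Channel</th><th>Player</th></tr>"
    return f"<table>{header}{''.join(rows)}</table>"


result = LLMAnswer(answer="These videos cover it.", videos=[VideoItem("abc123", "Tips & Tricks", "Demo")])
print(render_videos_html(result))
```

In Gradio, a string like this could be displayed with a `gr.HTML` component, while `result.answer` goes to a `gr.Markdown` component.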

---