---
title: youtube-channel-surfer-ai
license: mit
emoji: "📺"
app_file: "app.py"
sdk: "gradio"
pinned: false
python_version: 3.13
---

# 📺 YouTube Metadata Q&A Agent

This application lets you index YouTube channels and ask natural-language questions about their videos. It uses **OpenAI embeddings** and **GPT-4o-mini** to answer from video metadata (titles and descriptions), and it shows the most relevant videos in a clean, interactive table.

---

## Features

- **Index YouTube Channels**: Provide one or more YouTube channel URLs to index video metadata.
- **Search & Answer Questions**: Ask questions about channel content and get answers generated by an LLM.
- **Top Video Results**: View top relevant videos in a structured HTML table with clickable links.
- **Embedded Video Player**: Watch videos directly in the app using YouTube embeds.
- **Refresh Channels**: Update previously indexed channels to include the latest videos.
- **Lightweight Storage**: Uses a local **ChromaDB** persistent database to store video embeddings for fast retrieval.
- **Structured LLM Output**: The LLM returns a structured `LLMAnswer` object (answer text plus top videos), which keeps rendering consistent.
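
The refresh feature implies that re-indexing a channel should add only the videos the index has not seen yet. The stdlib sketch below shows that idea keyed by video ID. The `Video` record and `refresh_index` helper are hypothetical names rather than the app's code, and a plain dict stands in for the ChromaDB collection (ChromaDB collections offer an `upsert` method for the same purpose).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    """Hypothetical metadata record for one video (field names are assumptions)."""
    video_id: str
    channel: str
    title: str
    description: str


def refresh_index(index: dict[str, Video], fetched: list[Video]) -> list[str]:
    """Upsert fetched videos into `index`, keyed by video ID.

    Returns the IDs that were not indexed before. Known IDs are overwritten,
    so edited titles or descriptions replace the old record (a real vector
    index would also need to re-embed them).
    """
    new_ids: list[str] = []
    for video in fetched:
        if video.video_id not in index:
            new_ids.append(video.video_id)
        index[video.video_id] = video  # insert or overwrite
    return new_ids


index: dict[str, Video] = {}
refresh_index(index, [Video("a1", "demo", "Intro", ""), Video("b2", "demo", "Part 2", "")])
new_ids = refresh_index(
    index, [Video("b2", "demo", "Part 2 (edited)", ""), Video("c3", "demo", "Part 3", "")]
)
print(new_ids)  # ['c3']
```

Keying on the video ID makes a refresh idempotent: running it twice in a row adds nothing the second time.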

---

## How it Works

1. **Channel Indexing**:
   - The app fetches the latest videos from the provided YouTube channels using the YouTube Data API.
   - Video metadata (title, description, channel, video ID) is embedded with OpenAI embeddings and stored in ChromaDB.

2. **Query & Retrieval**:
   - User queries are embedded and compared with stored video embeddings.
   - Top matching videos are retrieved.

3. **Answer Generation**:
   - The LLM generates an answer from the metadata of the top-matching videos.
   - The answer and top videos are returned as structured data (`LLMAnswer`).

4. **Rendering**:
   - Answer text is displayed in Markdown.
   - Top videos are displayed in a structured HTML table with clickable links and embedded YouTube players.
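
Steps 1 to 3 come down to embedding text once at index time and ranking the stored vectors by similarity at query time. The sketch below shows that flow with cosine similarity. To keep it runnable offline, a toy bag-of-words `embed` function stands in for the OpenAI embeddings call and a Python list stands in for ChromaDB; all names are hypothetical, and the similarity metric the app actually uses is not stated in this README.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real OpenAI embedding call."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(videos: list[dict]) -> list[tuple[Counter, dict]]:
    """Indexing: embed each video's title + description once and keep it."""
    return [(embed(f"{v['title']} {v['description']}"), v) for v in videos]


def top_k_videos(
    query: str, index: list[tuple[Counter, dict]], k: int = 3
) -> list[tuple[float, dict]]:
    """Retrieval: embed the query and return the k highest-scoring videos."""
    q = embed(query)
    scored = [(cosine(q, vec), video) for vec, video in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


videos = [
    {"video_id": "v1", "title": "Sourdough bread basics", "description": "Starter, flour and water"},
    {"video_id": "v2", "title": "Fixing a flat bike tire", "description": "Patch kits and tire levers"},
    {"video_id": "v3", "title": "Bread scoring techniques", "description": "Lame cuts for sourdough loaves"},
]
index = build_index(videos)
for score, video in top_k_videos("how do I bake sourdough bread", index, k=2):
    print(video["video_id"], f"{score:.2f}")  # v1 ranks first, then v3
```

The retrieved videos' metadata would then be passed to the LLM as context for step 3.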

---

## Installation and Running

1. **Clone the repository:**

        git clone <repo_url>
        cd youtube_surfer_ai_agent

2. **Create and activate a virtual environment:**

    - Linux/macOS:

            python -m venv .venv
            source .venv/bin/activate

    - Windows:

            python -m venv .venv
            .venv\Scripts\activate

3. **Install dependencies:**

        pip install -r requirements.txt

4. **Create a `.env` file** in the project root with your API keys:

        YOUTUBE_API_KEY=your_youtube_api_key
        OPENAI_API_KEY=your_openai_api_key

5. **Run the application:**

        python app.py

6. **Open the Gradio interface** in your browser (default: http://127.0.0.1:7860).
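
The README does not say how `app.py` reads the `.env` file (the `python-dotenv` package's `load_dotenv()` is a common choice). As an assumption-labeled sketch, the stdlib-only helpers below load simple `KEY=value` lines and report which of the two required keys are still missing, so a misconfigured setup can fail before any API call. `load_env_file` and `missing_keys` are hypothetical names.

```python
from __future__ import annotations

import os
from pathlib import Path

REQUIRED_KEYS = ("YOUTUBE_API_KEY", "OPENAI_API_KEY")


def load_env_file(path: str | Path = ".env") -> dict[str, str]:
    """Read simple KEY=value lines into os.environ (existing values win).

    Blank lines and # comments are skipped; surrounding quotes are dropped.
    A missing file is treated as empty.
    """
    values: dict[str, str] = {}
    env_path = Path(path)
    if env_path.is_file():
        for raw in env_path.read_text(encoding="utf-8").splitlines():
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip("\"'")
    for key, value in values.items():
        os.environ.setdefault(key, value)  # do not clobber real env vars
    return values


def missing_keys(required: tuple[str, ...] = REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are unset or empty in the environment."""
    return [key for key in required if not os.environ.get(key)]
```

For example, call `load_env_file()` at startup and exit with an error message if `missing_keys()` returns a non-empty list.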

---

## How to Use

- **Index Channels:** Paste one or more YouTube channel URLs (comma- or newline-separated) and click "Index Channels".
- **Refresh Channels:** Click "Refresh All Channels" in the sidebar to pull the latest videos for channels that are already indexed.
- **Ask Questions:** Type a query in the text box and click "Get Answer" to receive a structured response with embedded videos.
- **View Indexed Channels:** The sidebar lists every indexed channel as a clickable link.
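
The channel box accepts comma- or newline-separated URLs. The sketch below shows a hypothetical parser for that input, plus a helper that tells `@handle` URLs apart from `/channel/UC…` ID URLs, since the YouTube Data API looks channels up differently for each (`forHandle` vs. `id` on `channels.list`). How `app.py` actually resolves URLs is not documented here, so treat both functions as illustrations.

```python
from __future__ import annotations

import re
from urllib.parse import urlparse


def parse_channel_urls(raw: str) -> list[str]:
    """Split comma- or newline-separated input into unique, non-empty URLs (order kept)."""
    seen: set[str] = set()
    urls: list[str] = []
    for part in re.split(r"[,\n]", raw):
        url = part.strip()  # also drops the "\r" left by Windows line endings
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def channel_ref(url: str) -> tuple[str, str] | None:
    """Classify a channel URL as ("handle", "@name") or ("id", "UC...").

    Returns None for shapes this sketch does not cover (e.g. /c/ or /user/ URLs).
    """
    parsed = urlparse(url if "://" in url else "https://" + url)
    segments = [s for s in parsed.path.split("/") if s]
    if segments and segments[0].startswith("@"):
        return ("handle", segments[0])
    if len(segments) >= 2 and segments[0] == "channel":
        return ("id", segments[1])
    return None


pasted = "https://www.youtube.com/@examplechannel, youtube.com/channel/UC123\nhttps://www.youtube.com/@examplechannel"
for url in parse_channel_urls(pasted):
    print(url, channel_ref(url))
```

Deduplicating before indexing avoids fetching and embedding the same channel twice when a URL is pasted more than once.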

---

## Notes

- The LLM uses structured outputs (`LLMAnswer` + `VideoItem`) internally to produce consistent results.
- Top videos are embedded as iframes in the Gradio interface.
- You can adjust the number of top videos returned by modifying the `top_k` parameter in `answer_query`.
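
The exact fields of `LLMAnswer` and `VideoItem` are not documented in this README. Assuming an answer string plus a list of videos with ID, title, and channel, the sketch below models them as dataclasses (the app may well use Pydantic models instead) and renders the results table with watch links and `https://www.youtube.com/embed/<id>` iframes. The field names and `render_videos_html` are hypothetical.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class VideoItem:
    """One recommended video (hypothetical fields; the real model may differ)."""
    video_id: str
    title: str
    channel: str


@dataclass
class LLMAnswer:
    """Answer text plus the videos it is grounded in (hypothetical fields)."""
    answer: str
    top_videos: list[VideoItem] = field(default_factory=list)


def render_videos_html(answer: LLMAnswer, embed: bool = True) -> str:
    """Render top videos as an HTML table with a watch link and an optional player."""
    rows = []
    for v in answer.top_videos:
        vid = escape(v.video_id, quote=True)  # never trust model output in HTML
        link = f'<a href="https://www.youtube.com/watch?v={vid}" target="_blank">{escape(v.title)}</a>'
        player = (
            f'<iframe width="320" height="180" src="https://www.youtube.com/embed/{vid}" '
            'allowfullscreen></iframe>'
            if embed
            else ""
        )
        rows.append(f"<tr><td>{link}</td><td>{escape(v.channel)}</td><td>{player}</td></tr>")
    header = "<tr><th>Video</th><th>Channel</th><th>Player</th></tr>"
    return f"<table>{header}{''.join(rows)}</table>"


result = LLMAnswer(
    answer="These videos cover sourdough basics.",
    top_videos=[VideoItem("abc123", "Sourdough basics", "Example Kitchen")],
)
html_table = render_videos_html(result)
```

In a Gradio app, a string like `html_table` could be shown in a `gr.HTML` component while `result.answer` goes to a Markdown component; limiting `top_videos` is where a `top_k` setting would take effect.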

---