vikramvasudevan/youtube-channel-surfer-ai

@@ -1,12 +1,102 @@
----
-title: Youtube Channel Surfer Ai
-emoji: 📊
-colorFrom: yellow
-colorTo: yellow
-sdk: gradio
-sdk_version: 5.44.0
-app_file: app.py
-pinned: false
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+---
+title: youtube-channel-surfer-ai
+license: mit
+emoji: "📺"
+app_file: "app.py"
+sdk: "gradio"
+pinned: false
+python_version: "3.13"
+---
+# 📺 YouTube Metadata Q&A Agent
+This application allows you to index YouTube channels and ask natural language questions about the videos. It leverages **OpenAI embeddings** and **GPT-4o-mini** to provide insightful answers based on video metadata (titles + descriptions), and it displays top relevant videos in a clean, interactive table.
+---
+## Features
+- **Index YouTube Channels**: Provide one or more YouTube channel URLs to index video metadata.
+- **Search & Answer Questions**: Ask questions about channel content and get answers generated by an LLM.
+- **Top Video Results**: View top relevant videos in a structured HTML table with clickable links.
+- **Embedded Video Player**: Watch videos directly in the app using YouTube embeds.
+- **Refresh Channels**: Update previously indexed channels to include the latest videos.
+- **Lightweight Storage**: Uses a local **ChromaDB** persistent database to store video embeddings for fast retrieval.
+- **Structured LLM Output**: LLM returns structured `LLMAnswer` objects with textual answer + top videos for clean rendering.
+---
+## How it Works
+1. **Channel Indexing**:
+   - The app fetches the latest videos from provided YouTube channels using the YouTube Data API.
+   - Video metadata (title, description, channel, video ID) is embedded with OpenAI embeddings and stored in ChromaDB.
+2. **Query & Retrieval**:
+   - User queries are embedded and compared with stored video embeddings.
+   - Top matching videos are retrieved.
+3. **Answer Generation**:
+   - The LLM generates an answer based on the top video metadata.
+   - The answer and top videos are returned as structured data (`LLMAnswer`).
+4. **Rendering**:
+   - Answer text is displayed in Markdown.
+   - Top videos are displayed in a structured HTML table with clickable links and embedded YouTube players.
+---
+## Installation
+## Steps to Run
+1. **Clone the repository:**
+        git clone <repo_url>
+        cd youtube_surfer_ai_agent
+2. **Create and activate a virtual environment:**
+    - Linux/macOS:
+            python -m venv .venv
+            source .venv/bin/activate
+    - Windows:
+            python -m venv .venv
+            .venv\Scripts\activate
+3. **Install dependencies:**
+        pip install -r requirements.txt
+4. **Create a `.env` file** in the project root with your API keys:
+        YOUTUBE_API_KEY=your_youtube_api_key
+        OPENAI_API_KEY=your_openai_api_key
+5. **Run the application:**
+        python app.py
+6. **Open the Gradio interface** in your browser (default: http://127.0.0.1:7860).
+---
+## How to Use
+- **Index Channels:** Click "+ Add" in the sidebar, paste one or more YouTube channel URLs (comma or newline separated), and click "Add Channels".
+- **Refresh Channels:** Use the sidebar "🔄 Refresh" button to re-index all existing channels.
+- **Ask Questions:** Type a query in the text box and click "Get Answer" to receive a structured response with embedded videos.
+- **View Indexed Channels:** The sidebar lists all channels that have been indexed with clickable links.
+---
+## Notes
+- The LLM uses structured outputs (`LLMAnswer` + `VideoItem`) internally to produce consistent results.
+- Top videos are embedded as iframes in the Gradio interface.
+- You can adjust the number of top videos returned by modifying the `top_k` parameter in `answer_query`.
+---
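
The "Query & Retrieval" step in the README above can be illustrated with a toy ranking. This is only a sketch: the app delegates similarity search to ChromaDB, whereas the hypothetical `top_k_videos` helper below ranks hand-written three-dimensional vectors by cosine similarity to show the idea of comparing a query embedding with stored embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_videos(query_vec, indexed, k=2):
    """Return the k video IDs whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), vid) for vid, vec in indexed.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [vid for _, vid in scored[:k]]


# Toy "embeddings"; real ones come from the embedding model and are much longer.
toy_index = {
    "vid_a": [1.0, 0.0, 0.0],
    "vid_b": [0.9, 0.1, 0.0],
    "vid_c": [0.0, 0.0, 1.0],
}
ranked = top_k_videos([1.0, 0.0, 0.0], toy_index, k=2)
print(ranked)
```

In the app itself the same role is played by `collection.query(..., n_results=top_k)`, which reports distances (lower is closer) rather than similarities.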

app.py ADDED

	@@ -0,0 +1,149 @@

+import os
+import re
+import gradio as gr
+import chromadb
+from modules.collector import fetch_channel_videos_from_url
+from modules.db import get_indexed_channels
+from modules.indexer import index_videos
+from modules.answerer import answer_query, LLMAnswer, VideoItem, build_video_html
+from dotenv import load_dotenv
+load_dotenv()
+# -------------------------------
+# Setup Chroma
+# -------------------------------
+client = chromadb.PersistentClient(path="./youtube_db")
+collection = client.get_or_create_collection("yt_metadata", embedding_function=None)
+# -------------------------------
+# Utils
+# -------------------------------
+def refresh_channel(api_key, channel_url: str):
+    """Fetch + re-index a single channel."""
+    videos = fetch_channel_videos_from_url(api_key, channel_url)
+    for v in videos:
+        v["channel_url"] = channel_url
+    index_videos(videos, collection, channel_url=channel_url)
+    return len(videos)
+def index_channels(channel_urls: str):
+    yt_api_key = os.environ["YOUTUBE_API_KEY"]
+    urls = [u.strip() for u in re.split(r"[\n,]+", channel_urls) if u.strip()]
+    total_videos = sum(refresh_channel(yt_api_key, url) for url in urls)
+    return (
+        f"✅ Indexed {total_videos} videos from {len(urls)} channels.",
+        list_channels(),
+    )
+def list_channels():
+    channels = get_indexed_channels(collection)
+    if not channels:
+        return "No channels indexed yet."
+    md = []
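+    for key, val in channels.items():
+        if isinstance(val, dict):
+            cname = val.get("channel_title", "Unknown")
+            curl = val.get("channel_url", None)
+        else:
+            # get_indexed_channels maps channel_id -> channel_title,
+            # so build a canonical channel URL from the ID for the link.
+            cname = val
+            curl = f"https://www.youtube.com/channel/{key}"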
+        if curl:
+            md.append(f"- **{cname}** ([link]({curl}))")
+        else:
+            md.append(f"- **{cname}**")
+    return "\n".join(md)
+def refresh_all_channels():
+    yt_api_key = os.environ["YOUTUBE_API_KEY"]
+    channels = get_indexed_channels(collection)
+    if not channels:
+        return "⚠️ No channels available to refresh.", list_channels()
+    total_videos = 0
+    for key, val in channels.items():
+        url = val.get("channel_url") if isinstance(val, dict) else key
+        if url:
+            total_videos += refresh_channel(yt_api_key, url)
+    return (
+        f"🔄 Refreshed {len(channels)} channels, re-indexed {total_videos} videos.",
+        list_channels(),
+    )
+def handle_query(query: str):
+    # answer_query returns (answer_text, video_html) for the Markdown and HTML outputs
+    answer_text, video_html = answer_query(query, collection)
+    return answer_text, video_html
+# -------------------------------
+# Gradio UI
+# -------------------------------
+def show_component():
+    return gr.update(visible=True)
+def hide_component():
+    return gr.update(visible=False)
+def close_component():
+    return gr.update(open=False)
+def open_component():
+    return gr.update(open=True)
+with gr.Blocks() as demo:
+    gr.Markdown("## 📺 YouTube Metadata Q&A Agent")
+    from gradio_modal import Modal
+    with Modal(visible=False) as add_channel_modal:
+        channel_input = gr.Textbox(
+            label="Channel URLs",
+            placeholder="Paste one or more YouTube channel URLs (comma or newline separated)",
+        )
+        save_add_channels_btn = gr.Button("Add Channels")
+        index_status = gr.Markdown(label="Index Status", container=False)
+    with gr.Row():
+        with gr.Sidebar() as my_sidebar:
+            gr.Markdown("### 📺 Channels")
+            channel_list = gr.Markdown(list_channels())
+            with gr.Row():
+                refresh_all_btn = gr.Button(
+                    "🔄 Refresh", size="sm", scale=0
+                )
+                add_channels_btn = gr.Button("+ Add", size="sm", scale=0)
+            refresh_status = gr.Markdown(label="Refresh Status", container=False)
+            refresh_all_btn.click(
+                fn=refresh_all_channels,
+                inputs=None,
+                outputs=[refresh_status, channel_list],
+            )
+            add_channels_btn.click(close_component, outputs=[my_sidebar]).then(show_component, outputs=[add_channel_modal])
+            save_add_channels_btn.click(
+                index_channels,
+                inputs=[channel_input],
+                outputs=[index_status, channel_list],
+            ).then(hide_component, outputs=[add_channel_modal]).then(open_component, outputs=[my_sidebar])
+        with gr.Column(scale=3):
+            question = gr.Textbox(
+                label="Ask a Question",
+                placeholder="e.g., What topics did they cover on AI ethics?",
+            )
+            gr.Examples(
+                [
+                    "Show me some videos that mention Ranganatha.",
+                    "Slokas that mention gajendra moksham",
+                ],
+                inputs=question,
+            )
+            answer = gr.Markdown()
+            video_embed = gr.HTML()  # iframe embeds will render here
+            ask_btn = gr.Button("Get Answer")
+            ask_btn.click(handle_query, inputs=question, outputs=[answer, video_embed])
+if __name__ == "__main__":
+    demo.launch()
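
`index_channels` above splits the textbox input on commas and newlines, so a URL pasted twice would be fetched and embedded twice. A minimal sketch of an order-preserving, de-duplicating parser follows; `parse_channel_urls` is a hypothetical helper and not part of this commit.

```python
import re


def parse_channel_urls(raw: str) -> list[str]:
    """Split on commas/newlines, trim whitespace, drop blanks and repeated URLs."""
    seen = set()
    urls = []
    for part in re.split(r"[\n,]+", raw):
        url = part.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


parsed = parse_channel_urls(
    "https://www.youtube.com/@a, https://www.youtube.com/@b\n"
    "https://www.youtube.com/@a\n\n"
)
print(parsed)
```

With such a helper, the reported count `len(urls)` would reflect distinct channels rather than pasted entries.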

main.py ADDED

	@@ -0,0 +1,38 @@

+# modules/
+# ├── collector.py
+# ├── indexer.py
+# ├── retriever.py
+# ├── answerer.py
+# └── main.py
+import os
+import chromadb
+from dotenv import load_dotenv
+from modules.answerer import answer_query
+from modules.collector import fetch_channel_videos
+from modules.db import get_collection
+from modules.indexer import index_videos
+# -------------------------------
+# 5. Main
+# -------------------------------
+def main():
+    load_dotenv()
+    YT_API_KEY = os.getenv("YOUTUBE_API_KEY")
+    CHANNELS = ["UCqa48rNanVRKmG4qxl-YmEQ"]  # Youtube channel IDs
+    collection = get_collection()
+    # Collect + Index
+    for ch in CHANNELS:
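+        videos = fetch_channel_videos(YT_API_KEY, ch)
+        # index_videos requires channel_url; derive it from the channel ID
+        index_videos(videos, collection, channel_url=f"https://www.youtube.com/channel/{ch}")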
+    # Ask a question
+    query = "Show me some videos that mention Ranganatha."
+    print(answer_query(query, collection))
+if __name__ == "__main__":
+    main()
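
`main.py` reads the key with `os.getenv`, which returns `None` when the variable is missing, so a misconfigured `.env` only surfaces later as an API error. One possible guard, sketched here with a hypothetical `require_env` helper and a demo variable name that is not used by the app:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail early with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# Demo only: DEMO_YT_KEY is a made-up variable so the example is self-contained.
os.environ["DEMO_YT_KEY"] = "abc"
key = require_env("DEMO_YT_KEY")
print(key)
```

In the script this would be called as `require_env("YOUTUBE_API_KEY")` after `load_dotenv()`.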

modules/answerer.py ADDED

	@@ -0,0 +1,109 @@

+# -------------------------------
+# 4. Answerer
+# -------------------------------
+from typing import List
+from pydantic import BaseModel
+from openai import OpenAI
+from modules.retriever import retrieve_videos
+# -------------------------------
+# Structured Output Classes
+# -------------------------------
+class VideoItem(BaseModel):
+    video_id: str
+    title: str
+    channel: str
+    description: str
+class LLMAnswer(BaseModel):
+    answer_text: str
+    top_videos: List[VideoItem]
+# -------------------------------
+# Main Function
+# -------------------------------
+def answer_query(query: str, collection, top_k: int = 5) -> tuple[str, str]:
+    """
+    Answer a user query using YouTube video metadata.
+    Returns a tuple of (answer_text, video_html): the LLM's textual answer
+    and an HTML table of the top videos.
+    """
+    results = retrieve_videos(query, collection, top_k=top_k)
+    if not results:
+        # Return the same (text, html) shape as the success path
+        return "No relevant videos found.", build_video_html([])
+    # Build context lines for the LLM
+    context_lines = []
+    top_videos_list = []
+    for r in results:
+        # Ensure each result is a dict
+        if not isinstance(r, dict):
+            continue
+        vid_id = r.get("video_id", "")
+        title = r.get("video_title") or r.get("title", "")
+        channel = r.get("channel") or r.get("channel_title", "")
+        description = r.get("description", "")
+        context_lines.append(f"- {title} ({channel}) (https://youtube.com/watch?v={vid_id})\n  description: {description}")
+        top_videos_list.append(
+            VideoItem(
+                video_id=vid_id,
+                title=title,
+                channel=channel,
+                description=description
+            )
+        )
+    context_text = "\n".join(context_lines)
+    # Call LLM with structured output
+    client = OpenAI()
+    response = client.chat.completions.parse(
+        model="gpt-4o-mini",
+        messages=[
+            {
+                "role": "system",
+                "content": (
+                    "You are a helpful assistant that answers questions using YouTube video metadata. "
+                    "Return your response strictly as the LLMAnswer class, including 'answer_text' and a list of 'top_videos'."
+                )
+            },
+            {
+                "role": "user",
+                "content": f"Question: {query}\n\nRelevant videos:\n{context_text}\n\nAnswer based only on this."
+            }
+        ],
+        response_format=LLMAnswer
+    )
+    llm_answer = response.choices[0].message.parsed  # already LLMAnswer object
+    answer_text = llm_answer.answer_text
+    video_html = build_video_html(llm_answer.top_videos)
+    return answer_text, video_html
+def build_video_html(videos: list[VideoItem]) -> str:
+    """Build a clean HTML table from top_videos."""
+    if not videos:
+        return "<p>No relevant videos found.</p>"
+    html = """
+    <table border="1" style="border-collapse: collapse; width: 100%;">
+        <tr>
+            <th>Title</th>
+            <th>Channel</th>
+            <th>Description</th>
+            <th>Watch</th>
+        </tr>
+    """
+    for v in videos:
+        html += f"""
+        <tr>
+            <td>{v.title}</td>
+            <td>{v.channel}</td>
+            <td>{v.description}</td>
+            <td><a href="https://youtube.com/watch?v={v.video_id}" target="_blank">▶️ Watch</a></td>
+        </tr>
+        """
+    html += "</table>"
+    return html
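
`build_video_html` above interpolates titles and descriptions into HTML as-is, so text containing `<`, `>` or `&` can break the table or inject markup. A minimal sketch of escaped row building with the standard library follows; `Video` is a hypothetical stand-in for `VideoItem` so the example runs without pydantic, and `build_safe_rows` is not part of this commit.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Video:
    """Hypothetical stand-in for VideoItem (same four fields)."""
    video_id: str
    title: str
    channel: str
    description: str


def build_safe_rows(videos):
    """Build <tr> rows with every user-controlled field HTML-escaped."""
    rows = []
    for v in videos:
        rows.append(
            "<tr>"
            f"<td>{escape(v.title)}</td>"
            f"<td>{escape(v.channel)}</td>"
            f"<td>{escape(v.description)}</td>"
            f'<td><a href="https://youtube.com/watch?v={escape(v.video_id)}" '
            'target="_blank">Watch</a></td>'
            "</tr>"
        )
    return "\n".join(rows)


rows_html = build_safe_rows([Video("abc123", "<b>Hi</b> & bye", "Chan", "desc")])
print(rows_html)
```

The function imports `escape` by name because `build_video_html` uses a local variable called `html`, which would shadow the `html` module.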

modules/collector.py ADDED

	@@ -0,0 +1,69 @@

+# -------------------------------
+# 1. Collector
+# -------------------------------
+from typing import List,Dict
+from googleapiclient.discovery import build
+from modules.youtube_utils import get_channel_id
+def fetch_channel_videos_from_url(api_key: str, channel_url: str, max_results=20):
+    """Resolve a channel URL or handle to its ID, then fetch its latest videos."""
+    youtube = build("youtube", "v3", developerKey=api_key)
+    channel_id = get_channel_id(youtube, channel_url)
+    # Reuse the ID-based fetcher instead of duplicating the search logic
+    return fetch_channel_videos(api_key, channel_id, max_results=max_results)
+def fetch_channel_videos(api_key: str, channel_id: str, max_results=20):
+    youtube = build("youtube", "v3", developerKey=api_key)
+    # Fetch channel title
+    channel_response = youtube.channels().list(
+        part="snippet",
+        id=channel_id
+    ).execute()
+    channel_title = channel_response["items"][0]["snippet"]["title"]
+    request = youtube.search().list(
+        part="snippet",
+        channelId=channel_id,
+        maxResults=max_results,
+        order="date"
+    )
+    response = request.execute()
+    videos = []
+    for item in response.get("items", []):
+        if item["id"]["kind"] == "youtube#video":
+            videos.append({
+                "video_id": item["id"]["videoId"],
+                "title": item["snippet"]["title"],
+                "description": item["snippet"].get("description", ""),
+                "channel_id": channel_id,
+                "channel_title": channel_title,
+            })
+    return videos

modules/db.py ADDED

	@@ -0,0 +1,36 @@

+import chromadb
+def get_collection():
+    client = chromadb.PersistentClient(path="./youtube_db")
+    # Reuse the collection if it exists, otherwise create it
+    collection = client.get_or_create_collection("yt_metadata")
+    # Check dimension mismatch
+    try:
+        # quick test query
+        collection.query(query_embeddings=[[0.0] * 1536], n_results=1)
+    except Exception:
+        # Delete and recreate with fresh schema
+        client.delete_collection("yt_metadata")
+        collection = client.create_collection("yt_metadata")
+    return collection
+# modules/db.py
+def get_indexed_channels(collection):
+    results = collection.get(include=["metadatas"])
+    channels = {}
+    for meta in results["metadatas"]:
+        cid = meta.get("channel_id")  # ✅ safe
+        cname = meta.get("channel_title", "Unknown Channel")
+        if cid:  # only include if we have a channel_id
+            channels[cid] = cname
+    return channels
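
`get_indexed_channels` above returns a plain `channel_id -> channel_title` mapping, while `list_channels` in `app.py` also accepts dict values carrying a `channel_url`. A sketch of a richer summary built from the same metadata is shown below; `summarize_channels` is a hypothetical helper that takes the `metadatas` list directly so it can run without ChromaDB.

```python
def summarize_channels(metadatas):
    """Group per-video metadata dicts into one entry per channel_id."""
    channels = {}
    for meta in metadatas:
        meta = meta or {}  # tolerate missing metadata entries
        cid = meta.get("channel_id")
        if not cid:
            continue
        entry = channels.setdefault(
            cid,
            {
                "channel_title": meta.get("channel_title", "Unknown Channel"),
                "channel_url": meta.get("channel_url"),
                "video_count": 0,
            },
        )
        entry["video_count"] += 1
    return channels


sample_metadatas = [
    {"channel_id": "UC1", "channel_title": "One",
     "channel_url": "https://www.youtube.com/@one", "video_id": "a"},
    {"channel_id": "UC1", "channel_title": "One",
     "channel_url": "https://www.youtube.com/@one", "video_id": "b"},
    {"video_id": "c"},  # no channel_id, skipped
]
channel_summary = summarize_channels(sample_metadatas)
print(channel_summary)
```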

modules/indexer.py ADDED

	@@ -0,0 +1,34 @@

+# modules/indexer.py
+from typing import Dict, List
+from openai import OpenAI
+def index_videos(videos: List[Dict], collection, channel_url: str):
+    client = OpenAI()
+    for vid in videos:
+        text = f"{vid.get('title', '')} - {vid.get('description', '')}"
+        embedding = client.embeddings.create(
+            input=text,
+            model="text-embedding-3-small"
+        ).data[0].embedding
+        # build metadata safely
+        metadata = {
+            "video_id": vid.get("video_id"),
+            "video_title": vid.get("title", ""),
+            "description": vid.get("description", ""),
+            "channel_url": channel_url,
+        }
+        # add channel info if available
+        if "channel_id" in vid:
+            metadata["channel_id"] = vid["channel_id"]
+        if "channel_title" in vid:
+            metadata["channel_title"] = vid["channel_title"]
+        # upsert so that refreshing a channel updates videos that are already indexed
+        collection.upsert(
+            documents=[text],
+            embeddings=[embedding],
+            metadatas=[metadata],
+            ids=[vid.get("video_id")]
+        )
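
`index_videos` above makes one embeddings request per video. The embeddings endpoint also accepts a list of strings as `input`, so requests can be batched. The sketch below shows only the batching logic: `build_records` and the injected `embed_batch` callable are hypothetical, and `counting_embed` is a fake embedder standing in for the OpenAI call.

```python
def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_records(videos, embed_batch, batch_size=2):
    """Embed video texts in batches and pair each vector with its video."""
    records = []
    for chunk in batched(videos, batch_size):
        texts = [f"{v.get('title', '')} - {v.get('description', '')}" for v in chunk]
        vectors = embed_batch(texts)  # one call per batch instead of per video
        for video, text, vector in zip(chunk, texts, vectors):
            records.append({"id": video["video_id"], "document": text, "embedding": vector})
    return records


calls = []


def counting_embed(texts):
    """Fake embedder: records batch sizes and returns one-dimensional vectors."""
    calls.append(len(texts))
    return [[float(len(t))] for t in texts]


records = build_records(
    [
        {"video_id": "a", "title": "T1", "description": "D1"},
        {"video_id": "b", "title": "T2", "description": "D2"},
        {"video_id": "c", "title": "T3", "description": "D3"},
    ],
    counting_embed,
    batch_size=2,
)
print(calls, [r["id"] for r in records])
```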

modules/retriever.py ADDED

	@@ -0,0 +1,36 @@

+# modules/retriever.py
+from typing import List, Dict
+from openai import OpenAI
+def retrieve_videos(query: str, collection, top_k: int = 3) -> List[Dict]:
+    client = OpenAI()
+    # Create embedding for query
+    embedding = client.embeddings.create(
+        input=query,
+        model="text-embedding-3-small"
+    ).data[0].embedding
+    # Query Chroma
+    results = collection.query(
+        query_embeddings=[embedding],
+        n_results=top_k,
+        include=["metadatas", "documents", "distances"]
+    )
+    # Build list of standardized dicts
+    videos = []
+    metadatas_list = results.get("metadatas", [[]])[0]  # list of metadata dicts
+    documents_list = results.get("documents", [[]])[0]  # list of text
+    distances_list = results.get("distances", [[]])[0]  # optional
+    for idx, meta in enumerate(metadatas_list):
+        # The stored document is "title - description"; use it only as a fallback
+        document = documents_list[idx] if idx < len(documents_list) else ""
+        videos.append({
+            "video_id": meta.get("video_id", ""),
+            "video_title": meta.get("video_title", meta.get("title", document)),
+            "channel": meta.get("channel", meta.get("channel_title", "")),
+            "description": meta.get("description", document),
+            "score": distances_list[idx] if idx < len(distances_list) else None
+        })
+    return videos
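
A Chroma query result holds one inner list per query embedding, and fields that were not requested come back as `None`. The sketch below flattens such a result for a single query and orders rows by distance (lower means closer). `flatten_results` is a hypothetical helper and the input is a hand-written dict, not real query output.

```python
def flatten_results(results):
    """Turn a single-query Chroma-style result dict into a list of row dicts."""
    metadatas = (results.get("metadatas") or [[]])[0]
    documents = (results.get("documents") or [[]])[0]
    distances = (results.get("distances") or [[]])[0]
    rows = []
    for idx, meta in enumerate(metadatas):
        meta = meta or {}
        rows.append({
            "video_id": meta.get("video_id", ""),
            "title": meta.get("video_title", ""),
            "document": documents[idx] if idx < len(documents) else "",
            "distance": distances[idx] if idx < len(distances) else None,
        })
    # Closest matches first; rows without a distance go last
    rows.sort(key=lambda r: float("inf") if r["distance"] is None else r["distance"])
    return rows


fake_results = {
    "metadatas": [[{"video_id": "a", "video_title": "A"},
                   {"video_id": "b", "video_title": "B"}]],
    "documents": [["A - x", "B - y"]],
    "distances": [[0.4, 0.1]],
}
flat = flatten_results(fake_results)
print([r["video_id"] for r in flat])
```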

modules/youtube_utils.py ADDED

	@@ -0,0 +1,26 @@

+def get_channel_id(youtube, channel_url: str) -> str:
+    """
+    Extract channel ID from a YouTube URL or handle.
+    Supports:
+    - https://www.youtube.com/channel/UCxxxx
+    - https://www.youtube.com/@handle
+    - @handle
+    """
+    # If already a UC... ID
+    if "channel/" in channel_url:
+        return channel_url.split("channel/")[-1].split("/")[0]
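+    # If it's a handle (@xyz or full URL)
+    if "@" in channel_url:
+        # Keep only the handle itself, dropping any trailing path or query string
+        handle = channel_url.split("@")[-1].split("/")[0].split("?")[0]
+        request = youtube.channels().list(
+            part="id",
+            forHandle=handle
+        )
+        response = request.execute()
+        items = response.get("items", [])
+        if not items:
+            raise ValueError(f"No channel found for handle @{handle}")
+        return items[0]["id"]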
+    if channel_url.startswith("UC"):
+        return channel_url
+    raise ValueError(f"Unsupported channel URL format {channel_url}")
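
The string checks in `get_channel_id` can be separated from the API lookup, which makes the parsing testable offline. The sketch below classifies an input as a channel ID or a handle using `urllib.parse`; `classify_channel_ref` is a hypothetical helper, and resolving a handle to an ID would still require the `channels().list(forHandle=...)` call shown above.

```python
from urllib.parse import urlparse


def classify_channel_ref(ref: str):
    """Return ("id", value) or ("handle", value) for a channel URL, handle, or raw ID."""
    ref = ref.strip()
    # Use only the URL path so query strings and fragments are ignored
    path = urlparse(ref).path if "://" in ref else ref
    segments = [s for s in path.split("/") if s]
    for i, seg in enumerate(segments):
        if seg == "channel" and i + 1 < len(segments):
            return ("id", segments[i + 1])
        if seg.startswith("@") and len(seg) > 1:
            return ("handle", seg[1:])
    if len(segments) == 1 and segments[0].startswith("UC"):
        return ("id", segments[0])
    raise ValueError(f"Unsupported channel URL format: {ref}")


examples = [
    "https://www.youtube.com/channel/UCabc/videos",
    "https://www.youtube.com/@handle/videos?view=0",
    "@handle",
    "UCqa48rNanVRKmG4qxl-YmEQ",
]
classified = [classify_channel_ref(e) for e in examples]
print(classified)
```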

pyproject.toml ADDED

	@@ -0,0 +1,14 @@

+[project]
+name = "youtube-surfer-ai-agent"
+version = "0.1.0"
+description = "Index YouTube channels and answer natural language questions about their video metadata"
+readme = "README.md"
+requires-python = ">=3.13"
+dependencies = [
+    "chromadb>=1.0.20",
+    "dotenv>=0.9.9",
+    "google-api-python-client>=2.179.0",
+    "gradio>=5.44.0",
+    "gradio-modal>=0.0.4",
+    "openai>=1.102.0",
+]

requirements.txt ADDED

	@@ -0,0 +1,369 @@

+# This file was autogenerated by uv via the following command:
+#    uv pip compile pyproject.toml -o requirements.txt
+aiofiles==24.1.0
+    # via gradio
+annotated-types==0.7.0
+    # via pydantic
+anyio==4.10.0
+    # via
+    #   gradio
+    #   httpx
+    #   openai
+    #   starlette
+    #   watchfiles
+attrs==25.3.0
+    # via
+    #   jsonschema
+    #   referencing
+audioop-lts==0.2.2
+    # via gradio
+backoff==2.2.1
+    # via posthog
+bcrypt==4.3.0
+    # via chromadb
+brotli==1.1.0
+    # via gradio
+build==1.3.0
+    # via chromadb
+cachetools==5.5.2
+    # via google-auth
+certifi==2025.8.3
+    # via
+    #   httpcore
+    #   httpx
+    #   kubernetes
+    #   requests
+charset-normalizer==3.4.3
+    # via requests
+chromadb==1.0.20
+    # via youtube-surfer-ai-agent (pyproject.toml)
+click==8.2.1
+    # via
+    #   typer
+    #   uvicorn
+colorama==0.4.6
+    # via
+    #   build
+    #   click
+    #   tqdm
+    #   uvicorn
+coloredlogs==15.0.1
+    # via onnxruntime
+distro==1.9.0
+    # via
+    #   openai
+    #   posthog
+dotenv==0.9.9
+    # via youtube-surfer-ai-agent (pyproject.toml)
+durationpy==0.10
+    # via kubernetes
+fastapi==0.116.1
+    # via gradio
+ffmpy==0.6.1
+    # via gradio
+filelock==3.19.1
+    # via huggingface-hub
+flatbuffers==25.2.10
+    # via onnxruntime
+fsspec==2025.7.0
+    # via
+    #   gradio-client
+    #   huggingface-hub
+google-api-core==2.25.1
+    # via google-api-python-client
+google-api-python-client==2.179.0
+    # via youtube-surfer-ai-agent (pyproject.toml)
+google-auth==2.40.3
+    # via
+    #   google-api-core
+    #   google-api-python-client
+    #   google-auth-httplib2
+    #   kubernetes
+google-auth-httplib2==0.2.0
+    # via google-api-python-client
+googleapis-common-protos==1.70.0
+    # via
+    #   google-api-core
+    #   opentelemetry-exporter-otlp-proto-grpc
+gradio==5.44.0
+    # via
+    #   youtube-surfer-ai-agent (pyproject.toml)
+    #   gradio-modal
+gradio-client==1.12.1
+    # via gradio
+gradio-modal==0.0.4
+    # via youtube-surfer-ai-agent (pyproject.toml)
+groovy==0.1.2
+    # via gradio
+grpcio==1.74.0
+    # via
+    #   chromadb
+    #   opentelemetry-exporter-otlp-proto-grpc
+h11==0.16.0
+    # via
+    #   httpcore
+    #   uvicorn
+httpcore==1.0.9
+    # via httpx
+httplib2==0.22.0
+    # via
+    #   google-api-python-client
+    #   google-auth-httplib2
+httptools==0.6.4
+    # via uvicorn
+httpx==0.28.1
+    # via
+    #   chromadb
+    #   gradio
+    #   gradio-client
+    #   openai
+    #   safehttpx
+huggingface-hub==0.34.4
+    # via
+    #   gradio
+    #   gradio-client
+    #   tokenizers
+humanfriendly==10.0
+    # via coloredlogs
+idna==3.10
+    # via
+    #   anyio
+    #   httpx
+    #   requests
+importlib-metadata==8.7.0
+    # via opentelemetry-api
+importlib-resources==6.5.2
+    # via chromadb
+jinja2==3.1.6
+    # via gradio
+jiter==0.10.0
+    # via openai
+jsonschema==4.25.1
+    # via chromadb
+jsonschema-specifications==2025.4.1
+    # via jsonschema
+kubernetes==33.1.0
+    # via chromadb
+markdown-it-py==4.0.0
+    # via rich
+markupsafe==3.0.2
+    # via
+    #   gradio
+    #   jinja2
+mdurl==0.1.2
+    # via markdown-it-py
+mmh3==5.2.0
+    # via chromadb
+mpmath==1.3.0
+    # via sympy
+numpy==2.3.2
+    # via
+    #   chromadb
+    #   gradio
+    #   onnxruntime
+    #   pandas
+oauthlib==3.3.1
+    # via
+    #   kubernetes
+    #   requests-oauthlib
+onnxruntime==1.22.1
+    # via chromadb
+openai==1.102.0
+    # via youtube-surfer-ai-agent (pyproject.toml)
+opentelemetry-api==1.36.0
+    # via
+    #   chromadb
+    #   opentelemetry-exporter-otlp-proto-grpc
+    #   opentelemetry-sdk
+    #   opentelemetry-semantic-conventions
+opentelemetry-exporter-otlp-proto-common==1.36.0
+    # via opentelemetry-exporter-otlp-proto-grpc
+opentelemetry-exporter-otlp-proto-grpc==1.36.0
+    # via chromadb
+opentelemetry-proto==1.36.0
+    # via
+    #   opentelemetry-exporter-otlp-proto-common
+    #   opentelemetry-exporter-otlp-proto-grpc
+opentelemetry-sdk==1.36.0
+    # via
+    #   chromadb
+    #   opentelemetry-exporter-otlp-proto-grpc
+opentelemetry-semantic-conventions==0.57b0
+    # via opentelemetry-sdk
+orjson==3.11.3
+    # via
+    #   chromadb
+    #   gradio
+overrides==7.7.0
+    # via chromadb
+packaging==25.0
+    # via
+    #   build
+    #   gradio
+    #   gradio-client
+    #   huggingface-hub
+    #   onnxruntime
+pandas==2.3.2
+    # via gradio
+pillow==11.3.0
+    # via gradio
+posthog==5.4.0
+    # via chromadb
+proto-plus==1.26.1
+    # via google-api-core
+protobuf==6.32.0
+    # via
+    #   google-api-core
+    #   googleapis-common-protos
+    #   onnxruntime
+    #   opentelemetry-proto
+    #   proto-plus
+pyasn1==0.6.1
+    # via
+    #   pyasn1-modules
+    #   rsa
+pyasn1-modules==0.4.2
+    # via google-auth
+pybase64==1.4.2
+    # via chromadb
+pydantic==2.11.7
+    # via
+    #   chromadb
+    #   fastapi
+    #   gradio
+    #   openai
+pydantic-core==2.33.2
+    # via pydantic
+pydub==0.25.1
+    # via gradio
+pygments==2.19.2
+    # via rich
+pyparsing==3.2.3
+    # via httplib2
+pypika==0.48.9
+    # via chromadb
+pyproject-hooks==1.2.0
+    # via build
+pyreadline3==3.5.4
+    # via humanfriendly
+python-dateutil==2.9.0.post0
+    # via
+    #   kubernetes
+    #   pandas
+    #   posthog
+python-dotenv==1.1.1
+    # via
+    #   dotenv
+    #   uvicorn
+python-multipart==0.0.20
+    # via gradio
+pytz==2025.2
+    # via pandas
+pyyaml==6.0.2
+    # via
+    #   chromadb
+    #   gradio
+    #   huggingface-hub
+    #   kubernetes
+    #   uvicorn
+referencing==0.36.2
+    # via
+    #   jsonschema
+    #   jsonschema-specifications
+requests==2.32.5
+    # via
+    #   google-api-core
+    #   huggingface-hub
+    #   kubernetes
+    #   posthog
+    #   requests-oauthlib
+requests-oauthlib==2.0.0
+    # via kubernetes
+rich==14.1.0
+    # via
+    #   chromadb
+    #   typer
+rpds-py==0.27.0
+    # via
+    #   jsonschema
+    #   referencing
+rsa==4.9.1
+    # via google-auth
+ruff==0.12.10
+    # via gradio
+safehttpx==0.1.6
+    # via gradio
+semantic-version==2.10.0
+    # via gradio
+shellingham==1.5.4
+    # via typer
+six==1.17.0
+    # via
+    #   kubernetes
+    #   posthog
+    #   python-dateutil
+sniffio==1.3.1
+    # via
+    #   anyio
+    #   openai
+starlette==0.47.3
+    # via
+    #   fastapi
+    #   gradio
+sympy==1.14.0
+    # via onnxruntime
+tenacity==9.1.2
+    # via chromadb
+tokenizers==0.21.4
+    # via chromadb
+tomlkit==0.13.3
+    # via gradio
+tqdm==4.67.1
+    # via
+    #   chromadb
+    #   huggingface-hub
+    #   openai
+typer==0.16.1
+    # via
+    #   chromadb
+    #   gradio
+typing-extensions==4.15.0
+    # via
+    #   chromadb
+    #   fastapi
+    #   gradio
+    #   gradio-client
+    #   huggingface-hub
+    #   openai
+    #   opentelemetry-api
+    #   opentelemetry-exporter-otlp-proto-grpc
+    #   opentelemetry-sdk
+    #   opentelemetry-semantic-conventions
+    #   pydantic
+    #   pydantic-core
+    #   typer
+    #   typing-inspection
+typing-inspection==0.4.1
+    # via pydantic
+tzdata==2025.2
+    # via pandas
+uritemplate==4.2.0
+    # via google-api-python-client
+urllib3==2.5.0
+    # via
+    #   kubernetes
+    #   requests
+uvicorn==0.35.0
+    # via
+    #   chromadb
+    #   gradio
+watchfiles==1.1.0
+    # via uvicorn
+websocket-client==1.8.0
+    # via kubernetes
+websockets==15.0.1
+    # via
+    #   gradio-client
+    #   uvicorn
+zipp==3.23.0
+    # via importlib-metadata

tests/search.py ADDED

	@@ -0,0 +1,14 @@

+from chromadb import PersistentClient
+from modules.db import get_collection
+from modules.retriever import retrieve_videos
+from dotenv import load_dotenv
+load_dotenv()
+collection = get_collection()
+all_metas = collection.get(include=["metadatas"])["metadatas"]
+print("Sample metadatas:", all_metas[:5])
+print("-------")
+print(retrieve_videos("Show me some videos that mention Ranganatha.", collection))

uv.lock ADDED

The diff for this file is too large to render. See raw diff