sequenceDiagram
    participant API as API (FastAPI)
    participant DI as Data Ingestion Service
    participant AM as arXiv Metadata Fetcher
    participant PL as PDF Loader (PyMuPDF)
    participant TS as Text Splitter
    participant EM as Embedding Model (OpenAI)
    participant VDB as Vector Database (Qdrant)
    participant HF as Hugging Face Dataset

    API->>DI: POST /ingest (query, max_results)
    DI->>AM: fetch_arxiv_metadata(query, max_results)
    alt Successful metadata fetch
        AM-->>DI: Return metadata list
        loop For each metadata item
            DI->>PL: process_pdf(pdf_url)
            alt Successful PDF processing
                PL-->>DI: Return PDF text
                DI->>TS: split_text(pdf_text)
                TS-->>DI: Return text chunks
                loop For each chunk
                    DI->>EM: embed_query(chunk)
                    EM-->>DI: Return embedding
                    DI->>VDB: add_texts(chunk, embedding)
                    DI->>HF: Add chunk and metadata
                end
            else PDF processing error
                PL-->>DI: Raise exception
                DI->>DI: Log error and continue
            end
        end
        DI-->>API: Return ingestion result
    else Metadata fetch error
        AM-->>DI: Raise exception
        DI-->>API: Return error message
    end
    Note over API,HF: Logging at each step
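The control flow in the diagram can be sketched in Python as follows. This is a minimal, hypothetical sketch rather than the service's actual code. Each component is passed in as a plain callable, so the sketch runs without the arXiv API, PyMuPDF, OpenAI, Qdrant or Hugging Face. The `pdf_url` metadata key, the `IngestionResult` type and the `add_to_vector_db` / `add_to_dataset` callables are assumptions. It shows the two failure paths from the diagram: a metadata fetch error aborts the request with an error message, while a PDF processing error is logged and that paper is skipped.

```python
# Hypothetical sketch of the ingestion flow; all interfaces are assumptions.
from __future__ import annotations

import logging
from dataclasses import dataclass, field
from typing import Any, Callable

logger = logging.getLogger("ingestion")


@dataclass
class IngestionResult:
    processed: int = 0                                # papers fully ingested
    chunks: int = 0                                   # chunks embedded and stored
    failed: list[str] = field(default_factory=list)   # PDF URLs that failed
    error: str | None = None                          # set if metadata fetch failed


def ingest(
    query: str,
    max_results: int,
    fetch_arxiv_metadata: Callable[[str, int], list[dict[str, Any]]],
    process_pdf: Callable[[str], str],
    split_text: Callable[[str], list[str]],
    embed_query: Callable[[str], list[float]],
    add_to_vector_db: Callable[[str, list[float]], None],  # hypothetical wrapper around add_texts
    add_to_dataset: Callable[[str, dict[str, Any]], None],  # hypothetical Hugging Face dataset writer
) -> IngestionResult:
    result = IngestionResult()

    # Metadata fetch error: log it and return an error message to the API.
    try:
        metadata_list = fetch_arxiv_metadata(query, max_results)
    except Exception as exc:
        logger.error("Metadata fetch failed: %s", exc)
        result.error = f"Metadata fetch failed: {exc}"
        return result

    for metadata in metadata_list:
        pdf_url = metadata["pdf_url"]  # assumed metadata key
        # PDF processing error: log it and continue with the next paper.
        try:
            pdf_text = process_pdf(pdf_url)
        except Exception as exc:
            logger.error("Failed to process %s: %s", pdf_url, exc)
            result.failed.append(pdf_url)
            continue

        # Embed each chunk, then store it in the vector DB and the dataset.
        for chunk in split_text(pdf_text):
            embedding = embed_query(chunk)
            add_to_vector_db(chunk, embedding)
            add_to_dataset(chunk, metadata)
            result.chunks += 1

        result.processed += 1
        logger.info("Ingested %s", pdf_url)

    return result
```

Injecting the components keeps the orchestration testable with stubs. In a real service the callables would wrap the PyMuPDF loader, the text splitter, the OpenAI embedding model and the Qdrant client.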