%% arxiv-rag-mvp / ingestion_flow_service_diagram.mermaid
sequenceDiagram
    participant API as API (FastAPI)
    participant DI as Data Ingestion Service
    participant AM as ArXiv Metadata Fetcher
    participant PL as PDF Loader (PyMuPDF)
    participant TS as Text Splitter
    participant EM as Embedding Model (OpenAI)
    participant VDB as Vector Database (Qdrant)
    participant HF as Hugging Face Dataset

    API->>DI: POST /ingest (query, max_results)
    DI->>AM: fetch_arxiv_metadata(query, max_results)
    alt Successful metadata fetch
        AM-->>DI: Return metadata list
        loop For each metadata item
            DI->>PL: process_pdf(pdf_url)
            alt Successful PDF processing
                PL-->>DI: Return PDF text
                DI->>TS: split_text(pdf_text)
                TS-->>DI: Return text chunks
                loop For each chunk
                    DI->>EM: embed_query(chunk)
                    EM-->>DI: Return embedding
                    DI->>VDB: add_texts(chunk, embedding)
                    DI->>HF: Add chunk and metadata
                end
            else PDF processing error
                PL-->>DI: Raise exception
                DI->>DI: Log error and continue
            end
        end
        DI-->>API: Return ingestion result
    else Metadata fetch error
        AM-->>DI: Raise exception
        DI-->>API: Return error message
    end

    Note over API,HF: Logging at each step