---
title: Arxiv Rag Mvp
emoji: 🦀
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 4.37.2
app_file: app.py
pinned: false
license: cc-by-4.0
---

# arXiv RAG System README

## Key Stakeholder

The primary stakeholder for this system is an agentic "System of Agents". This design choice emphasizes the need for modularity, flexibility, and the ability for the system to evolve and improve its own processes.

## Architectural Vision

- The system is designed with modularity in mind, using a microservices architecture to allow easy replacement of specific components (libraries, applications, LLMs).
- A Hugging Face dataset is used to store metadata and interim results for retrieved documents, crucial for avoiding repetitive and costly processing.
- The system captures and tracks the history of document reviews, summarizations, and evaluations performed by the agents.

## Key Architectural Choices

1. **Document Loading**: PyMuPDF for efficient PDF processing with image extraction.
2. **Text Splitting**: RecursiveCharacterTextSplitter for content chunking.
3. **Embedding Model**: OpenAI's text-embedding-3-small for generating embeddings.
4. **Vector Database**: Qdrant for storing and retrieving embeddings.
5. **Retrieval Mechanism**: Similarity search with cosine similarity threshold of 0.5 and k=5.
6. **Language Model**: Llama3 70B via Groq API for response generation.
7. **Orchestration**: LangChain/LCEL for RAG pipeline orchestration.
8. **User Interface**: Gradio for demonstration purposes (note: primary interface is for the agentic system).
9. **Logging and Monitoring**: LangSmith for comprehensive logging and LLM operations monitoring.
10. **Metadata and Results Storage**: Hugging Face dataset for storing document metadata, interim results, and agent review history.

## Sequence Diagrams

### 1. Ingestion Flow - Mermaid Diagrams

[Ingestion Flow](ingestion-flow-diagram.mermaid)

[Ingestion Flow - Service Diagram](ingestion-flow-service-diagram.mermaid)

### 2. Retrieval Flow - Mermaid Diagrams

[Retrieval Flow](retrieval-flow-diagram.mermaid)
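
The retrieval mechanism listed under Key Architectural Choices (similarity search with a cosine similarity threshold of 0.5 and k=5) can be illustrated with a minimal, dependency-free sketch. This is not the project's code and does not use the Qdrant or LangChain APIs. The function names `cosine_similarity` and `retrieve`, the constants, and the toy 2-D vectors are all assumptions made for illustration. In the real system the vectors would come from the embedding model and the search would run inside the vector database.

```python
import math

SCORE_THRESHOLD = 0.5  # cosine similarity cutoff stated in this README
TOP_K = 5              # maximum number of chunks returned (k=5)


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, chunks, k=TOP_K, threshold=SCORE_THRESHOLD):
    """Return up to k (score, text) pairs scoring at least `threshold`, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:k]


# Hypothetical toy "embeddings"; real ones would be high-dimensional.
chunks = [
    ("chunk A", [1.0, 0.0]),
    ("chunk B", [0.8, 0.6]),
    ("chunk C", [0.0, 1.0]),
    ("chunk D", [-1.0, 0.0]),
]

results = retrieve([1.0, 0.0], chunks)
for score, text in results:
    print(f"{score:.2f} {text}")
```

With these toy vectors only chunks A and B clear the 0.5 threshold. Chunks C (orthogonal to the query) and D (opposite to it) are filtered out before the top-k cut is applied, so a query with few relevant chunks can return fewer than five results.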