---
license: mit
title: 🇫🇷 Assistant RH — RAG Chatbot
sdk: gradio
emoji: 📚
colorFrom: indigo
colorTo: purple
app_file: app.py
pinned: true
short_description: 👉 RAG-powered AI assistant for French Human Resources
tags:
  - gradio
  - rag
  - faiss
  - anthropic
  - claude
  - openai
  - hr
  - human-resources
  - law
  - france
  - french
  - chatbot
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/6668057ef7604601278857f5/JeivLn409aMRCqx6RwO2J.png
---

# 🇫🇷 RAG-powered HR Assistant

👉 **An AI assistant specialised in French Human Resources**

Built with **Retrieval-Augmented Generation (RAG)** on top of **official public datasets**. It retrieves trusted information, generates concise answers, and always cites its sources.

🚀 **Live demo on Hugging Face**: [![Hugging Face Space](https://img.shields.io/badge/🤗-HuggingFace%20Space-blue)](https://huggingface.co/spaces/edouardfoussier/rag-rh-assistant)

![App Screenshot](assets/screenshot2.png)

---

## ✨ What is this?

This project is an **AI assistant** for HR topics covering **French labor law and public-administration HR practices**. It combines **retrieval** over trusted sources with **LLM synthesis**, and always cites the passages it used.
**Key features:**

- 🤖 **Multi-LLM support**: choose between OpenAI or Anthropic (Claude) models
- 📚 **Trusted sources**: built on official French government datasets
- 🔍 **Hybrid retrieval**: semantic + full-text search for precise results
- 📊 **Evaluation-driven**: custom metrics to measure and improve performance

**Tech stack:**

- UI: **Gradio**
- Retrieval: **FAISS** (fallback: NumPy) + PostgreSQL full-text search
- Embeddings: **HF Inference API**
- LLM: **Anthropic** or **OpenAI** (BYO API key)

---

## 📚 Datasets & Attribution

This Space relies on **public HR datasets** curated by [**AgentPublic**](https://huggingface.co/datasets/AgentPublic):

- [Service-Public dataset](https://huggingface.co/datasets/AgentPublic/service-public)
- [Travail-Emploi dataset](https://huggingface.co/datasets/AgentPublic/travail-emploi)

For this project, I built **cleaned and filtered derivatives** hosted under my profile:

- [edouardfoussier/service-public-filtered](https://huggingface.co/datasets/edouardfoussier/service-public-filtered)
- [edouardfoussier/travail-emploi-clean](https://huggingface.co/datasets/edouardfoussier/travail-emploi-clean)

---

## ⚙️ How it works

1. **Question** → the user asks in French (e.g., "DPAE : quelles obligations ?").
2. **Retrieve** → hybrid search (semantic + full-text) finds relevant passages in the datasets.
3. **Synthesize** → the chosen LLM (Anthropic or OpenAI) writes a concise, factual answer with citations `[1], [2], …`.
4. **Explain** → the "Sources" panel shows the original articles used to generate the answer.

---

## 🔑 BYOK (Bring Your Own Key)

The app supports **Anthropic (Claude)** and **OpenAI** models. Your API key is never stored; it is used in-session only, for secure, private inference.
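A minimal sketch of this pattern (hypothetical helper names, not the app's actual code): the key lives only in in-memory session state and is turned into per-request auth headers, which differ between the two providers.

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    """In-memory session state; the key is never written to disk."""
    provider: str = "openai"
    api_key: str = field(default="", repr=False)  # repr=False keeps the key out of logs

def build_headers(session: Session) -> dict:
    """Build per-request auth headers from the session-held key."""
    if session.provider == "anthropic":
        # Anthropic expects the key in `x-api-key`, plus an API version header
        return {"x-api-key": session.api_key, "anthropic-version": "2023-06-01"}
    # OpenAI uses a standard Bearer token
    return {"Authorization": f"Bearer {session.api_key}"}
```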
**Supported models:**

- Anthropic: `claude-sonnet-4-5`, `claude-opus-4-1`, `claude-haiku-4-5`
- OpenAI: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `gpt-3.5-turbo`

---

## 🧩 Configuration notes

- FAISS is used when available; otherwise the app falls back to a NumPy dot-product search.
- The retriever loads vectors from the datasets and keeps a compressed cache at runtime (`/tmp/rag_index.npz`) to speed up cold starts.
- You can change the Top-K slider in the UI; it controls both retrieval and the number of passages given to the LLM.
- The provider and model selectors in the sidebar let you compare different LLMs.

---

## 🚀 Run locally

### 1) Clone & install

```bash
git clone https://huggingface.co/spaces/edouardfoussier/rag-rh-assistant
cd rag-rh-assistant
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

### 2) Configure environment

Key env vars:

**Required:**

- `HF_API_TOKEN` → required for embeddings via the HF Inference API

**Optional (default provider/model):**

- `LLM_PROVIDER` → `openai` or `anthropic` (default: `openai`)
- `LLM_MODEL` → e.g., `gpt-4o-mini` or `claude-sonnet-4-5`
- `ANTHROPIC_API_KEY` → your Anthropic key (or enter it in the UI)
- `OPENAI_API_KEY` → your OpenAI key (or enter it in the UI)

**Other:**

- `HF_EMBEDDINGS_MODEL` → defaults to `BAAI/bge-m3`
- `EMBED_COL` → name of the embedding column (defaults to `embeddings_bge-m3`)
- `LLM_BASE_URL` → defaults to `https://api.openai.com/v1`

### 3) Launch

```bash
python app.py
```

Open http://127.0.0.1:7860 and select your preferred LLM provider in the sidebar.

---

## 📊 Roadmap

- ✅ Multi-LLM backends (Anthropic + OpenAI)
- ✅ Hybrid retrieval (semantic + full-text)
- ✅ Custom evaluation metrics
- 🔜 Reranking (cross-encoder)
- 🔜 Multi-turn conversation memory
- 🔜 More datasets (other ministries, legal codes)
- 🔜 Advanced hallucination detection

---

## 🙌 Credits

- Original data: [**AgentPublic**](https://huggingface.co/datasets/AgentPublic)
- Built with: Hugging Face Spaces, Gradio, FAISS, Anthropic, OpenAI
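---

## 🧮 Appendix: fallback retrieval sketch

The FAISS → NumPy fallback described in the configuration notes amounts to a brute-force dot-product search. A minimal sketch (illustrative names, not the app's actual code; assumes embeddings are L2-normalised, as with `BAAI/bge-m3`, so the dot product equals cosine similarity):

```python
import numpy as np

def topk_numpy(query_vec: np.ndarray, doc_matrix: np.ndarray, k: int = 5):
    """Score every document vector against the query and keep the top-k.

    doc_matrix: (n_docs, dim) array of normalised embeddings
    query_vec:  (dim,) normalised query embedding
    """
    scores = doc_matrix @ query_vec  # (n_docs,) dot-product similarity scores
    idx = np.argsort(-scores)[:k]    # indices of the k highest scores
    return idx, scores[idx]
```

FAISS computes the same neighbours faster through an index structure; the fallback simply trades speed for zero extra dependencies.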