---
license: mit
title: 🇫🇷 Assistant RH - RAG Chatbot
sdk: gradio
emoji: π
colorFrom: indigo
colorTo: purple
app_file: app.py
pinned: true
short_description: RAG-powered AI assistant for French Human Resources
tags:
- gradio
- rag
- faiss
- openai
- hr
- human-resources
- law
- france
- french
- chatbot
thumbnail: >-
https://cdn-uploads.huggingface.co/production/uploads/6668057ef7604601278857f5/JeivLn409aMRCqx6RwO2J.png
---
# 🇫🇷 RAG-powered HR Assistant
**An AI assistant specialised in French Human Resources**
Built with **Retrieval-Augmented Generation (RAG)** on top of **official public datasets**.
It retrieves trusted information, generates concise answers, and always cites its sources.
**Live demo on Hugging Face**: [edouardfoussier/rag-rh-assistant](https://huggingface.co/spaces/edouardfoussier/rag-rh-assistant)

---
## ✨ What is this?
This project is an **AI assistant** for HR topics covering **French labor law and public-administration HR practices**.
It combines **retrieval** over trusted sources with **LLM synthesis**, and cites its sources.
- UI: **Gradio**
- Retrieval: **FAISS** (fallback: NumPy)
- Embeddings: **HF Inference API**
- LLM: **OpenAI** (BYO API Key)
---
## Datasets & Attribution
This space relies on **public HR datasets** curated by [**AgentPublic**](https://huggingface.co/datasets/AgentPublic):
- [Service-Public dataset](https://huggingface.co/datasets/AgentPublic/service-public)
- [Travail-Emploi dataset](https://huggingface.co/datasets/AgentPublic/travail-emploi)
For this project, I built **cleaned and filtered derivatives** hosted under my profile:
- [edouardfoussier/service-public-filtered](https://huggingface.co/datasets/edouardfoussier/service-public-filtered)
- [edouardfoussier/travail-emploi-clean](https://huggingface.co/datasets/edouardfoussier/travail-emploi-clean)
---
## ⚙️ How it works
1. **Question**: The user asks in French (e.g., “DPAE : quelles obligations ?”, i.e. "DPAE: what are the obligations?").
2. **Retrieve**: FAISS searches semantic vectors from the datasets.
3. **Synthesize**: The LLM writes a concise, factual answer with citations `[1], [2], …`.
4. **Explain**: The “Sources” panel shows the original articles used to generate the answer.
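
The synthesize and explain steps can be sketched as follows. This is a minimal illustration, not the app's actual code: the `Passage` class, `build_prompt`, `format_sources`, the prompt wording, and the example URL are all hypothetical. The idea is that numbering the retrieved passages once lets the same numbers appear both in the LLM prompt and in the sources panel.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Passage:
    """One retrieved article chunk (hypothetical structure)."""
    title: str
    url: str
    text: str


def build_prompt(question: str, passages: list[Passage]) -> str:
    """Number each passage so the LLM can cite it as [1], [2], ..."""
    context = "\n\n".join(
        f"[{i}] {p.title}\n{p.text}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources inline as [1], [2], ...\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def format_sources(passages: list[Passage]) -> list[str]:
    """Build the lines of a sources panel, using the same numbering as the prompt."""
    return [f"[{i}] {p.title} ({p.url})" for i, p in enumerate(passages, start=1)]


if __name__ == "__main__":
    hits = [Passage("DPAE", "https://example.org/dpae", "Texte de l'article...")]
    print(build_prompt("DPAE : quelles obligations ?", hits))
    print(format_sources(hits))
```

Because both functions enumerate the same list from 1, a citation `[2]` in the answer always points at the second entry of the sources panel.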
---
## BYOK (bring your own key)
The app never stores your OpenAI key; it's used in-session only.
---
## 🧩 Configuration notes
- FAISS is used when available; otherwise we fall back to NumPy dot-product search.
- The retriever loads vectors from the datasets and keeps a compressed cache at runtime (/tmp/rag_index.npz) to speed up cold starts.
- You can change the Top-K slider in the UI; it controls both retrieval and the number of passages given to the LLM.
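
The notes above can be illustrated with a small sketch. It is written under assumptions and is not the app's actual retriever: the function names `top_k` and `load_or_build`, the `vectors` array key inside the `.npz` file, and the choice of a flat inner-product index are all hypothetical. It shows an optional FAISS import with a NumPy dot-product fallback, and a compressed on-disk cache that avoids rebuilding the vectors on the next start.

```python
import os

import numpy as np

try:
    import faiss  # optional dependency; may be missing
except ImportError:
    faiss = None


def load_or_build(cache_path, build_fn):
    """Load vectors from a compressed .npz cache, or build and cache them."""
    if os.path.exists(cache_path):
        with np.load(cache_path) as data:
            return data["vectors"]
    vectors = np.asarray(build_fn(), dtype=np.float32)
    np.savez_compressed(cache_path, vectors=vectors)
    return vectors


def top_k(query_vec, doc_vecs, k=5):
    """Return (indices, scores) of the k documents with the highest dot product.

    If the vectors are L2-normalised, the dot product equals cosine similarity.
    """
    docs = np.ascontiguousarray(doc_vecs, dtype=np.float32)
    query = np.ascontiguousarray(query_vec, dtype=np.float32).reshape(1, -1)
    k = min(k, docs.shape[0])
    if faiss is not None:
        index = faiss.IndexFlatIP(docs.shape[1])  # exact inner-product search
        index.add(docs)
        scores, idx = index.search(query, k)
        return idx[0], scores[0]
    # NumPy fallback: score every document, then keep the k best.
    scores = docs @ query[0]
    idx = np.argsort(-scores)[:k]
    return idx, scores[idx]
```

A call such as `top_k(query, load_or_build("/tmp/rag_index.npz", build_fn), k=5)` would then mirror the Top-K slider: the same `k` bounds both the search and the number of passages handed to the LLM.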
---
## Run locally
### 1) Clone & install
```bash
git clone https://huggingface.co/spaces/edouardfoussier/rag-rh-assistant
cd rag-rh-assistant
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
### 2) Configure environment
Key env vars:
- HF_API_TOKEN: required for embeddings via HF Inference API
- HF_EMBEDDINGS_MODEL: defaults to BAAI/bge-m3
- EMBED_COL: name of the embedding column in the dataset (defaults to embeddings_bge-m3)
- OPENAI_API_KEY: optional at startup (you can also enter it in the UI)
- LLM_MODEL: e.g. gpt-4o-mini (configurable)
- LLM_BASE_URL: defaults to https://api.openai.com/v1
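
As a rough sketch of how these variables might be read at startup (the `Settings` class and `load_settings` function are hypothetical, not taken from `app.py`, and using `gpt-4o-mini` as the fallback for `LLM_MODEL` is an assumption, since the list above gives it only as an example):

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    hf_api_token: Optional[str]
    hf_embeddings_model: str
    embed_col: str
    openai_api_key: Optional[str]
    llm_model: str
    llm_base_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the documented variables, applying the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        hf_api_token=env.get("HF_API_TOKEN"),
        hf_embeddings_model=env.get("HF_EMBEDDINGS_MODEL", "BAAI/bge-m3"),
        embed_col=env.get("EMBED_COL", "embeddings_bge-m3"),
        openai_api_key=env.get("OPENAI_API_KEY"),  # may also come from the UI
        llm_model=env.get("LLM_MODEL", "gpt-4o-mini"),  # assumed fallback
        llm_base_url=env.get("LLM_BASE_URL", "https://api.openai.com/v1"),
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check without touching the real environment.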
### 3) Launch
```bash
python app.py
```
Open http://127.0.0.1:7860 and enter your OpenAI API key in the sidebar (or set it in .env).
---
## Roadmap
- Reranking (cross-encoder)
- Multi-turn memory
- More datasets (other ministries, codes)
- Hallucination checks & eval (faithfulness)
- Multi-LLM backends
---
## Credits
- Original data: [**AgentPublic**](https://huggingface.co/datasets/AgentPublic)
- Built with: Hugging Face Spaces, Gradio, FAISS, OpenAI