---
license: mit
title: 🇫🇷 Assistant RH – RAG Chatbot
sdk: gradio
emoji: 💻
colorFrom: blue
colorTo: indigo
app_file: app.py
pinned: true
short_description: 🤖 RAG-powered AI assistant for French Human Resources
---

# 🇫🇷 RAG-powered HR Assistant

🤖 **An AI assistant specialised in French Human Resources, powered by Retrieval-Augmented Generation (RAG) and based on official public datasets.**

[Open in Spaces](https://huggingface.co/spaces/edouardfoussier/rag-rh-assistant)

---

## ✨ What is this?

This project is an **AI assistant** specialised in HR topics under **French labor law and public-administration HR practice**.
It combines **retrieval** over trusted sources with **LLM synthesis**, and cites its sources in every answer.

- UI: **Gradio**
- Retrieval: **FAISS** (fallback: NumPy)
- Embeddings: **HF Inference API**
- LLM: **OpenAI** (BYO API key)

---

## 📚 Datasets & Attribution

This Space relies on **public HR datasets** curated by [**AgentPublic**](https://huggingface.co/datasets/AgentPublic):
- [Service-Public dataset](https://huggingface.co/datasets/AgentPublic/service-public)
- [Travail-Emploi dataset](https://huggingface.co/datasets/AgentPublic/travail-emploi)

For this project, I built **cleaned and filtered derivatives** hosted under my profile:
- [edouardfoussier/service-public-filtered](https://huggingface.co/datasets/edouardfoussier/service-public-filtered)
- [edouardfoussier/travail-emploi-clean](https://huggingface.co/datasets/edouardfoussier/travail-emploi-clean)

---

## ⚙️ How it works

1. **Question** → The user asks in French (e.g., “DPAE : quelles obligations ?”).
2. **Retrieve** → FAISS searches the semantic vectors built from the datasets.
3. **Synthesize** → The LLM writes a concise, factual answer with citations `[1], [2], …`.
4. **Explain** → The “Sources” panel shows the original articles used to generate the answer.
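Conceptually, the loop above can be sketched as follows. This is an illustrative toy, not the app's actual code: the corpus, the `embed` stub (the real app calls the HF Inference API), and the function names are all assumptions.

```python
import numpy as np

# Toy corpus standing in for the indexed dataset passages (illustrative only).
passages = [
    "La DPAE doit être adressée à l'Urssaf avant l'embauche.",
    "Le contrat de travail peut être conclu à durée indéterminée.",
]

def embed(texts):
    """Stub embedder: the real app gets vectors from the HF Inference API."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 8))

doc_vecs = embed(passages)
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

def retrieve(question, k=2):
    """Step 2: rank passages by cosine similarity to the question."""
    q = embed([question])[0]
    q /= np.linalg.norm(q)
    scores = doc_vecs @ q
    top = np.argsort(-scores)[:k]
    return [(int(i), passages[i]) for i in top]

def answer(question):
    """Steps 3-4: build a numbered context block; the real app sends it to the LLM."""
    hits = retrieve(question)
    context = "\n".join(f"[{n + 1}] {p}" for n, (_, p) in enumerate(hits))
    return f"(concise answer citing [1], [2])\n\nSources:\n{context}"

print(answer("DPAE : quelles obligations ?"))
```

The numbered `[1], [2], …` labels in the context are what lets the LLM's citations map back to entries in the “Sources” panel.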

---

## 🔐 BYOK (Bring Your Own Key)

The app never stores your OpenAI key; it’s used in-session only.

---

## 🧩 Configuration notes

- FAISS is used when available; otherwise the app falls back to a NumPy dot-product search.
- The retriever loads vectors from the datasets and keeps a compressed cache at runtime (`/tmp/rag_index.npz`) to speed up cold starts.
- The Top-K slider in the UI controls both how many passages are retrieved and how many are passed to the LLM.
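A rough sketch of the NumPy fallback and the runtime cache described above (the cache path comes from the note; the function names and demo data are illustrative assumptions):

```python
import os
import numpy as np

CACHE = "/tmp/rag_index.npz"  # compressed cache mentioned in the notes above

def build_index(vectors, ids):
    """Persist vectors once so later cold starts can skip rebuilding."""
    np.savez_compressed(CACHE, vectors=vectors, ids=ids)

def load_index():
    """Reload the cached vectors and their passage ids."""
    data = np.load(CACHE)
    return data["vectors"], data["ids"]

def search(query_vec, vectors, k=5):
    """NumPy fallback: dot-product scoring (vectors assumed L2-normalised)."""
    scores = vectors @ query_vec
    top = np.argsort(-scores)[:k]
    return top, scores[top]

# Tiny demo with random unit vectors.
rng = np.random.default_rng(42)
vecs = rng.normal(size=(10, 4))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
build_index(vecs, np.arange(10))
vectors, ids = load_index()
idx, scores = search(vectors[3], vectors, k=3)
print(ids[idx])  # passage 3 ranks itself first (self-similarity = 1.0)
```

With normalised vectors the dot product equals cosine similarity, which is why the fallback can stay this simple.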

---

## 🚀 Run locally

### 1) Clone & install
```bash
git clone https://huggingface.co/spaces/edouardfoussier/rag-rh-assistant
cd rag-rh-assistant
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

### 2) Configure environment
Key environment variables:
- `HF_API_TOKEN` – required for embeddings via the HF Inference API
- `HF_EMBEDDINGS_MODEL` – defaults to `BAAI/bge-m3`
- `EMBED_COL` – name of the embedding column in the dataset (defaults to `embeddings_bge-m3`)
- `OPENAI_API_KEY` – optional at startup (you can also enter it in the UI)
- `LLM_MODEL` – e.g. `gpt-4o-mini` (configurable)
- `LLM_BASE_URL` – defaults to `https://api.openai.com/v1`
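Putting the variables above together, a minimal `.env` for local runs might look like this (all values are placeholders; substitute your own tokens):

```shell
# .env: example values only, replace the tokens with your own
HF_API_TOKEN=hf_xxxxxxxxxxxxxxxxxxxx
HF_EMBEDDINGS_MODEL=BAAI/bge-m3
EMBED_COL=embeddings_bge-m3
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxx
LLM_MODEL=gpt-4o-mini
LLM_BASE_URL=https://api.openai.com/v1
```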

### 3) Launch
```bash
python app.py
```

Open http://127.0.0.1:7860 and enter your OpenAI API key in the sidebar (or set it in `.env`).

---

## 🗺️ Roadmap

- Reranking (cross-encoder)
- Multi-turn memory
- More datasets (other ministries, legal codes)
- Hallucination checks & evaluation (faithfulness)
- Multi-LLM backends

---

## 🙏 Credits

- Original data: [**AgentPublic**](https://huggingface.co/datasets/AgentPublic)
- Built with: Hugging Face Spaces, Gradio, FAISS, OpenAI