edouardfoussier committed
Commit d2de949 · verified · Parent: f7653ab

Update README.md

Files changed (1):
  1. README.md +99 -14
README.md CHANGED
@@ -1,27 +1,112 @@
  ---
  license: mit
- title: RAG RH (Gradio)
  sdk: gradio
  emoji: 💻
  colorFrom: blue
  colorTo: indigo
  app_file: app.py
- pinned: false
  ---

- # RAG RH (Gradio)

- - Embeddings via **HF Inference API** (`feature-extraction`) with `HF_EMBEDDINGS_MODEL` (default `BAAI/bge-m3`)
- - Datasets:
-   - `edouardfoussier/travail-emploi-clean`
-   - `edouardfoussier/service-public-filtered`

- ## Space Variables

- Set in **Settings → Variables**:

- - `HF_API_TOKEN` (Write token) — required
- - Optional:
-   - `HF_EMBEDDINGS_MODEL` (default `BAAI/bge-m3`)
-   - `EMBED_COL` (default `embeddings_bge-m3`)
-   - `MAX_ROWS_PER_DATASET` (e.g., `2000` to cap memory during testing)
  ---
  license: mit
+ title: 🇫🇷 Assistant RH — RAG Chatbot
  sdk: gradio
  emoji: 💻
  colorFrom: blue
  colorTo: indigo
  app_file: app.py
+ pinned: true
+ short_description: 👉 RAG-powered AI assistant for French Human Resources
  ---

+ # 🇫🇷 RAG-powered HR Assistant

+ 👉 **An AI assistant specialised in French Human Resources, powered by Retrieval-Augmented Generation (RAG) and based on official public datasets.**

+ [![Hugging Face Space](https://img.shields.io/badge/🤗-HuggingFace%20Space-blue)](https://huggingface.co/spaces/edouardfoussier/rag-rh-assistant)

+ ![App Screenshot](assets/screenshot.png)

+ ---
+
+ ## ✨ What is this?
+
+ This project is an **AI assistant** for HR topics in **French labor law and public administration HR practices**.
+ It combines **retrieval** over trusted sources with **LLM synthesis**, and cites its sources.
+
+ - UI: **Gradio**
+ - Retrieval: **FAISS** (fallback: NumPy)
+ - Embeddings: **HF Inference API**
+ - LLM: **OpenAI** (BYO API Key)
+
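The embeddings bullet above corresponds to the HF Inference API's `feature-extraction` task. A minimal sketch of such a call, using `huggingface_hub.InferenceClient` and the `BAAI/bge-m3` default documented below (the Space's actual client code may differ):

```python
import os

import numpy as np
from huggingface_hub import InferenceClient

# Hypothetical helper: embed a query through the HF Inference API (feature-extraction).
client = InferenceClient(
    model=os.getenv("HF_EMBEDDINGS_MODEL", "BAAI/bge-m3"),
    token=os.environ["HF_API_TOKEN"],
)

def embed(text: str) -> np.ndarray:
    """Return one embedding vector for `text` (mean-pool if token vectors come back)."""
    out = np.asarray(client.feature_extraction(text), dtype="float32")
    return out.mean(axis=0) if out.ndim > 1 else out
```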
+ ---
+
+ ## 📚 Datasets & Attribution
+
+ This space relies on **public HR datasets** curated by [**AgentPublic**](https://huggingface.co/datasets/AgentPublic):
+ - [Service-Public dataset](https://huggingface.co/datasets/AgentPublic/service-public)
+ - [Travail-Emploi dataset](https://huggingface.co/datasets/AgentPublic/travail-emploi)
+
+ For this project, I built **cleaned and filtered derivatives** hosted under my profile:
+ - [edouardfoussier/service-public-filtered](https://huggingface.co/datasets/edouardfoussier/service-public-filtered)
+ - [edouardfoussier/travail-emploi-clean](https://huggingface.co/datasets/edouardfoussier/travail-emploi-clean)
+
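For reference, the derivative datasets listed above can be pulled directly with the `datasets` library; the split name below is an assumption, so check the dataset cards:

```python
from datasets import load_dataset

# Assumed split name; the dataset card is authoritative.
ds = load_dataset("edouardfoussier/travail-emploi-clean", split="train")

print(ds)            # shows the features, including the precomputed embedding column
print(ds[0].keys())  # e.g. the text fields plus "embeddings_bge-m3" (the default EMBED_COL)
```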
+ ---
+
+ ## ⚙️ How it works
+
+ 1. **Question** → The user asks in French (e.g., “DPAE : quelles obligations ?”).
+ 2. **Retrieve** → FAISS searches the semantic vectors loaded from the datasets.
+ 3. **Synthesize** → The LLM writes a concise, factual answer with citations `[1], [2], …`.
+ 4. **Explain** → The “Sources” panel shows the original articles used to generate the answer.
+
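Steps 2 and 3 can be sketched roughly as follows. This is an illustration only, not the Space's code; the `passages` shape, the prompt wording, and the `gpt-4o-mini` default are assumptions:

```python
from openai import OpenAI

def answer(question: str, passages: list[dict], api_key: str, model: str = "gpt-4o-mini") -> str:
    """Compose a concise, cited answer from retrieved passages (hypothetical helper)."""
    context = "\n\n".join(f"[{i + 1}] {p['title']}\n{p['text']}" for i, p in enumerate(passages))
    # French instruction asking for a concise, factual answer with [1], [2] citations.
    prompt = (
        "Réponds de façon concise et factuelle, en citant tes sources [1], [2], …\n\n"
        f"Sources:\n{context}\n\nQuestion : {question}"
    )
    client = OpenAI(api_key=api_key)
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,
    )
    return resp.choices[0].message.content
```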
+ ---
+
+ ## 🔑 BYOK (Bring Your Own Key)
+
+ The app never stores your OpenAI key; it’s used in-session only.
+
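A common Gradio pattern for this behaviour (shown only to illustrate the idea, not the Space's implementation) is to take the key from a password textbox and pass it into each request handler, so it never leaves the user's session:

```python
import gradio as gr

def chat_fn(message, history, openai_key):
    # The key arrives as a per-request argument and is never written to disk.
    if not openai_key:
        return "Please paste your OpenAI API key first."
    return "(here the app would embed, retrieve and call the LLM with the user's key)"

demo = gr.ChatInterface(
    fn=chat_fn,
    additional_inputs=[gr.Textbox(label="OpenAI API key", type="password")],
)

if __name__ == "__main__":
    demo.launch()
```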
+ ---
+
+ ## 🧩 Configuration notes
+
+ - FAISS is used when available; otherwise we fall back to NumPy dot-product search.
+ - The retriever loads vectors from the datasets and keeps a compressed cache at runtime (`/tmp/rag_index.npz`) to speed up cold starts.
+ - You can change the Top-K slider in the UI; it controls both retrieval and the number of passages given to the LLM.
+
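A rough sketch of what the FAISS/NumPy fallback and the `.npz` cache described above can look like; this is an assumption about the implementation, not a copy of it:

```python
import os

import numpy as np

try:
    import faiss  # optional dependency
except ImportError:
    faiss = None

CACHE_PATH = "/tmp/rag_index.npz"

def load_vectors(build_matrix) -> np.ndarray:
    """Load the embedding matrix from the compressed cache, or build and cache it."""
    if os.path.exists(CACHE_PATH):
        return np.load(CACHE_PATH)["vectors"]
    vectors = np.ascontiguousarray(build_matrix(), dtype="float32")
    np.savez_compressed(CACHE_PATH, vectors=vectors)
    return vectors

def top_k(vectors: np.ndarray, query: np.ndarray, k: int = 5):
    """Return (indices, scores) of the k most similar rows by inner product."""
    q = np.ascontiguousarray(query.reshape(1, -1), dtype="float32")
    if faiss is not None:
        index = faiss.IndexFlatIP(vectors.shape[1])
        index.add(vectors)
        scores, ids = index.search(q, k)
        return ids[0], scores[0]
    scores = vectors @ q[0]            # NumPy dot-product fallback
    ids = np.argsort(-scores)[:k]
    return ids, scores[ids]
```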
+ ---
+
+ ## 🚀 Run locally
+
+ ### 1) Clone & install
+ ```bash
+ git clone https://huggingface.co/spaces/edouardfoussier/rag-rh-assistant
+ cd rag-rh-assistant
+ python -m venv .venv
+ source .venv/bin/activate
+ pip install -r requirements.txt
+ ```
+
+ ### 2) Configure environment
+ Key env vars:
+ - `HF_API_TOKEN` → required for embeddings via the HF Inference API
+ - `HF_EMBEDDINGS_MODEL` → defaults to `BAAI/bge-m3`
+ - `EMBED_COL` → name of the embedding column in the dataset (defaults to `embeddings_bge-m3`)
+ - `OPENAI_API_KEY` → optional at startup (you can also enter it in the UI)
+ - `LLM_MODEL` → e.g. `gpt-4o-mini` (configurable)
+ - `LLM_BASE_URL` → defaults to `https://api.openai.com/v1`
+
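As an illustration of how these variables and their documented defaults are typically read at startup (an assumed sketch, not the app's actual code):

```python
import os

HF_API_TOKEN = os.environ["HF_API_TOKEN"]                              # required
HF_EMBEDDINGS_MODEL = os.getenv("HF_EMBEDDINGS_MODEL", "BAAI/bge-m3")
EMBED_COL = os.getenv("EMBED_COL", "embeddings_bge-m3")
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")                           # may also be entered in the UI
LLM_MODEL = os.getenv("LLM_MODEL", "gpt-4o-mini")
LLM_BASE_URL = os.getenv("LLM_BASE_URL", "https://api.openai.com/v1")
```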
+ ### 3) Launch
+ ```bash
+ python app.py
+ ```
+
+ Open http://127.0.0.1:7860 and enter your OpenAI API key in the sidebar (or set it in `.env`).
+
+ ---
+
+ ## 📊 Roadmap
+
+ - Reranking (cross-encoder)
+ - Multi-turn memory
+ - More datasets (other ministries, codes)
+ - Hallucination checks & eval (faithfulness)
+ - Multi-LLM backends
+
+ ---
+
+ ## 🙌 Credits
+
+ - Original data: [**AgentPublic**](https://huggingface.co/datasets/AgentPublic)
+ - Built with: Hugging Face Spaces, Gradio, FAISS, OpenAI