edouardfoussier committed
Commit 8f44b27 · 2 Parent(s): 8853ea0 4e442ed

Merge remote changes, keep local app.py improvements

Files changed (3)
  1. README.md +115 -17
  2. assets/screenshot.png +3 -0
  3. assets/screenshot2.png +3 -0
README.md CHANGED
@@ -1,27 +1,125 @@
  ---
  license: mit
- title: RAG RH (Gradio)
  sdk: gradio
- emoji: 💻
- colorFrom: blue
- colorTo: indigo
  app_file: app.py
- pinned: false
  ---

- # RAG RH (Gradio)

- - Embeddings via **HF Inference API** (`feature-extraction`) with `HF_EMBEDDINGS_MODEL` (default `BAAI/bge-m3`)
- - Datasets:
-   - `edouardfoussier/travail-emploi-clean`
-   - `edouardfoussier/service-public-filtered`

- ## Space Variables

- Set in **Settings → Variables**:

- - `HF_API_TOKEN` (Write token): required
- - Optional:
-   - `HF_EMBEDDINGS_MODEL` (default `BAAI/bge-m3`)
-   - `EMBED_COL` (default `embeddings_bge-m3`)
-   - `MAX_ROWS_PER_DATASET` (e.g., `2000` to cap memory during testing)
  ---
  license: mit
+ title: 🇫🇷 Assistant RH - RAG Chatbot
  sdk: gradio
+ emoji: 📚
+ colorFrom: indigo
+ colorTo: purple
  app_file: app.py
+ pinned: true
+ short_description: 👉 RAG-powered AI assistant for French Human Resources
+ tags:
+ - gradio
+ - rag
+ - faiss
+ - openai
+ - hr
+ - human-resources
+ - law
+ - france
+ - french
+ - chatbot
+ thumbnail: >-
+   https://cdn-uploads.huggingface.co/production/uploads/6668057ef7604601278857f5/JeivLn409aMRCqx6RwO2J.png
  ---

+ # 🇫🇷 RAG-powered HR Assistant

+ 👉 **An AI assistant specialised in French Human Resources, powered by Retrieval-Augmented Generation (RAG) and based on official public datasets.**

+ [![Hugging Face Space](https://img.shields.io/badge/🤗-HuggingFace%20Space-blue)](https://huggingface.co/spaces/edouardfoussier/rag-rh-assistant)

+ ![App Screenshot](assets/screenshot2.png)

+ ---
+
+ ## ✨ What is this?
+
+ This project is an **AI assistant** for HR questions about **French labor law and public-administration HR practices**.
+ It combines **retrieval** over trusted sources with **LLM synthesis**, and cites its sources.
+
+ - UI: **Gradio**
+ - Retrieval: **FAISS** (fallback: NumPy)
+ - Embeddings: **HF Inference API**
+ - LLM: **OpenAI** (BYO API Key)
+
+ ---
+
+ ## 📚 Datasets & Attribution
+
+ This space relies on **public HR datasets** curated by [**AgentPublic**](https://huggingface.co/datasets/AgentPublic):
+ - [Service-Public dataset](https://huggingface.co/datasets/AgentPublic/service-public)
+ - [Travail-Emploi dataset](https://huggingface.co/datasets/AgentPublic/travail-emploi)
+
+ For this project, I built **cleaned and filtered derivatives** hosted under my profile:
+ - [edouardfoussier/service-public-filtered](https://huggingface.co/datasets/edouardfoussier/service-public-filtered)
+ - [edouardfoussier/travail-emploi-clean](https://huggingface.co/datasets/edouardfoussier/travail-emploi-clean)
+
+ ---
+
+ ## ⚙️ How it works
+
+ 1. **Question** → The user asks in French (e.g., “DPAE : quelles obligations ?”, i.e. "DPAE: what are the obligations?").
+ 2. **Retrieve** → FAISS searches the semantic vectors stored in the datasets.
+ 3. **Synthesize** → The LLM writes a concise, factual answer with citations `[1], [2], …`.
+ 4. **Explain** → The “Sources” panel shows the original articles used to generate the answer.
+
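The Synthesize step above can be illustrated with a minimal sketch. This is not taken from `app.py`: the function name `build_prompt` and the passage fields `title` and `text` are hypothetical, and the real prompt wording may differ. It only shows how retrieved passages can be numbered so the LLM is able to cite them as `[1], [2], …`.

```python
def build_prompt(question, passages):
    """Number each retrieved passage so the LLM can cite it as [1], [2], ...

    `passages` is a list of dicts with hypothetical keys "title" and "text".
    """
    sources = "\n".join(
        f"[{i}] {p['title']}: {p['text']}"
        for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite them as [1], [2], ... after each claim.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )
```

Keeping the same numbering for the prompt and for the “Sources” panel is what lets a reader map each citation back to the article it came from.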
+ ---
+
+ ## 🔑 BYOK
+
+ The app never stores your OpenAI key; it's used in-session only.
+
+ ---
+
+ ## 🧩 Configuration notes
+
+ - FAISS is used when available; otherwise we fall back to NumPy dot-product search.
+ - The retriever loads vectors from the datasets and keeps a compressed cache at runtime (`/tmp/rag_index.npz`) to speed up cold starts.
+ - You can change the Top-K slider in the UI; it controls both retrieval and the number of passages given to the LLM.
+
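The first two notes can be sketched roughly as follows. This is an illustration under assumptions, not the code in `app.py`: the helper names `top_k` and `load_or_build` and the array key `vectors` are hypothetical, and an exact inner-product index (`faiss.IndexFlatIP`) is assumed on the FAISS side so that both paths rank passages the same way.

```python
import os

import numpy as np

try:
    import faiss  # optional dependency
except ImportError:
    faiss = None


def top_k(query, vectors, k):
    """Return (indices, scores) of the k rows with the largest inner product."""
    vectors = np.ascontiguousarray(vectors, dtype=np.float32)
    query = np.ascontiguousarray(query, dtype=np.float32).reshape(1, -1)
    k = min(k, vectors.shape[0])
    if faiss is not None:
        index = faiss.IndexFlatIP(vectors.shape[1])  # exact inner-product search
        index.add(vectors)
        scores, ids = index.search(query, k)
        return ids[0].tolist(), scores[0].tolist()
    scores = vectors @ query[0]      # NumPy fallback: one dot product per row
    ids = np.argsort(-scores)[:k]    # highest scores first
    return ids.tolist(), scores[ids].tolist()


def load_or_build(cache_path, build):
    """Load vectors from a compressed .npz cache, or build and cache them."""
    if os.path.exists(cache_path):
        with np.load(cache_path) as data:
            return data["vectors"]
    vectors = np.asarray(build(), dtype=np.float32)
    np.savez_compressed(cache_path, vectors=vectors)
    return vectors
```

With normalized embeddings the inner product equals cosine similarity, which is why a plain dot-product search is a reasonable stand-in when FAISS is not installed.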
+ ---
+
+ ## 🚀 Run locally
+
+ ### 1) Clone & install
+ ```bash
+ git clone https://huggingface.co/spaces/edouardfoussier/rag-rh-assistant
+ cd rag-rh-assistant
+ python -m venv .venv
+ source .venv/bin/activate
+ pip install -r requirements.txt
+ ```
+
+ ### 2) Configure environment
+ Key env vars:
+ - `HF_API_TOKEN` → required for embeddings via HF Inference API
+ - `HF_EMBEDDINGS_MODEL` → defaults to `BAAI/bge-m3`
+ - `EMBED_COL` → name of the embedding column in the dataset (defaults to `embeddings_bge-m3`)
+ - `OPENAI_API_KEY` → optional at startup (you can also enter it in the UI)
+ - `LLM_MODEL` → e.g. `gpt-4o-mini` (configurable)
+ - `LLM_BASE_URL` → default `https://api.openai.com/v1`
+
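As a rough illustration of how these variables might be read at startup, here is a minimal sketch. It is not the loader used by `app.py`: the function name `load_settings` and the returned dictionary keys are hypothetical, and treating `gpt-4o-mini` as the default for `LLM_MODEL` is an assumption, since the README only gives it as an example.

```python
import os


def load_settings(env=None):
    """Read the documented env vars, applying the defaults listed in the README."""
    env = os.environ if env is None else env
    token = env.get("HF_API_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_API_TOKEN is required for embeddings via the HF Inference API"
        )
    return {
        "hf_api_token": token,
        "hf_embeddings_model": env.get("HF_EMBEDDINGS_MODEL", "BAAI/bge-m3"),
        "embed_col": env.get("EMBED_COL", "embeddings_bge-m3"),
        # Optional at startup: the key can also be entered in the UI.
        "openai_api_key": env.get("OPENAI_API_KEY"),
        # Assumed default; the README lists gpt-4o-mini only as an example.
        "llm_model": env.get("LLM_MODEL", "gpt-4o-mini"),
        "llm_base_url": env.get("LLM_BASE_URL", "https://api.openai.com/v1"),
    }
```

Failing fast on the one required variable surfaces a missing token at launch instead of at the first embedding request.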
+ ### 3) Launch
+ ```bash
+ python app.py
+ ```
+
+ Open http://127.0.0.1:7860 and enter your OpenAI API key in the sidebar (or set `OPENAI_API_KEY` in `.env`).
+
+ ---
+
+ ## 📊 Roadmap
+
+ - Reranking (cross-encoder)
+ - Multi-turn memory
+ - More datasets (other ministries, codes)
+ - Hallucination checks & eval (faithfulness)
+ - Multi-LLM backends
+
+ ---
+
+ ## 🙌 Credits
+
+ - Original data: [**AgentPublic**](https://huggingface.co/datasets/AgentPublic)
+ - Built with: Hugging Face Spaces, Gradio, FAISS, OpenAI
assets/screenshot.png ADDED

Git LFS Details

  • SHA256: 120dba952bd6f88bf6741c7bdf5b0530dcc5ba3ad9c412b5249454869bb72c62
  • Pointer size: 131 Bytes
  • Size of remote file: 237 kB
assets/screenshot2.png ADDED

Git LFS Details

  • SHA256: 0aaefe7403bc20eb473b247ab749efd69b4577470f8a2608eb1ac3881ebd445f
  • Pointer size: 131 Bytes
  • Size of remote file: 331 kB