Arash-Alborz committed on
Commit 28c74b5 · verified · 1 Parent(s): b7f6f88

Rename README Kopie.md to README.md

Files changed (2)
  1. README Kopie.md +0 -139
  2. README.md +159 -0
README Kopie.md DELETED
@@ -1,139 +0,0 @@
- # Personality Trait Predictor (AMIV NLP 2025)
- University of Antwerp
-
- This project predicts **Big Five personality traits (OCEAN)** from English text using a combination of:
-
- - DistilBERT embeddings
- - LIWC-style psycholinguistic features
- - An ensemble classifier (Random Forest, XGBoost, MLP, SVM)
-
- The five traits predicted are:
- - **Openness**
- - **Conscientiousness**
- - **Extraversion**
- - **Agreeableness**
- - **Emotional Stability**
-
- Each trait is classified as:
- - `low`
- - `medium`
- - `high`
-
- ---
-
- ## Features
-
- - Accepts raw free-form text (e.g., job interview answers)
- - Extracts both semantic (BERT) and psycholinguistic (LIWC) features
- - Outputs all five personality traits using a custom-trained ensemble
- - Can be used locally or deployed via Gradio (demo available)
-
- ---
-
- ## Quick Usage (Python)
-
- ```python
- from personality_model import PersonalityClassifier
-
- model = PersonalityClassifier()
-
- text = "I love exploring new cultures and trying unusual foods. I often seek out unfamiliar ideas and perspectives."
-
- result = model.predict_all_traits(text)
- print(result)
- ```
-
- Example output:
-
- ```python
- {
-     "Openness": "high",
-     "Conscientiousness": "medium",
-     "Extraversion": "low",
-     "Agreeableness": "high",
-     "Emotional stability": "medium"
- }
- ```
-
- ---
-
- ## Project Structure
-
- ```
- ├── personality_model.py # PersonalityClassifier pipeline
- ├── test_personality_model.py # CLI tester
- ├── feature_extraction/
- │   ├── __init__.py
- │   ├── embedding_from_text.py
- │   ├── liwc_from_text.py
- ├── models/
- │   ├── openness_classifier.pkl
- │   ├── conscientiousness_classifier.pkl
- │   ├── extraversion_classifier.pkl
- │   ├── agreeableness_classifier.pkl
- │   ├── emotional_stability_classifier.pkl
- │   ├── feature_scaler.pkl
- │   ├── output.dic
- ├── requirements.txt
- ├── README.md
- ```
-
- ---
-
- ## Modeling Details
-
- - Ensemble of 4 classifiers (`VotingClassifier`):
-   - `RandomForestClassifier`
-   - `GradientBoostingClassifier`
-   - `MLPClassifier`
-   - `SVC` (linear kernel)
- - Each trait has a separate classifier trained on combined BERT+LIWC features
- - LIWC-style features are computed with the dictionary in `output.dic`
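The README does not say whether the `VotingClassifier` used hard or soft voting. Assuming hard voting (scikit-learn's default, `voting="hard"`), each trait's final label is the majority vote of the four member predictions. The standard-library sketch below illustrates that rule; `hard_vote` is a hypothetical helper, not part of this repository. With four voters a 2-2 tie is possible, so the sketch breaks ties by taking the label that sorts first, which is the tie rule scikit-learn documents for hard voting.

```python
from collections import Counter
from typing import Sequence


def hard_vote(predictions: Sequence[str]) -> str:
    """Return the majority label among member-classifier predictions.

    Hypothetical helper illustrating hard voting. Ties are broken by
    choosing the label that comes first in ascending sort order.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    top = max(counts.values())
    # All labels that share the highest vote count
    tied = [label for label, n in counts.items() if n == top]
    return min(tied)


# Toy member predictions for one trait (illustration only)
members = {
    "RandomForestClassifier": "high",
    "GradientBoostingClassifier": "medium",
    "MLPClassifier": "high",
    "SVC": "low",
}
print(hard_vote(list(members.values())))  # prints "high"
```

Soft voting would instead average the members' `predict_proba` outputs, which requires every member to expose probabilities (for `SVC`, that means `probability=True`).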
-
- ---
-
- ## Preprocessing & Binning (for original experiments)
-
- The original project also included regression models and binning rules:
-
- | Score Range      | Bin Label |
- |------------------|-----------|
- | 0 ≤ score ≤ 32   | Low       |
- | 33 ≤ score ≤ 66  | Medium    |
- | 67 ≤ score ≤ 100 | High      |
-
- These were used to convert continuous personality scores into discrete labels.
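A minimal sketch of that binning rule follows. The table's integer bounds leave fractional scores such as 32.5 unassigned, so this sketch assumes cut points at `< 33` and `< 67` (gap values go to the lower bin); that choice is not stated in the README. `score_to_bin` is a hypothetical name, and it returns lowercase labels to match the classifier outputs.

```python
def score_to_bin(score: float) -> str:
    """Map a 0-100 personality score to a low/medium/high label.

    Assumption: cut points at 33 and 67, so fractional scores that fall
    between the table's integer ranges (e.g. 32.5) go to the lower bin.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range [0, 100]: {score}")
    if score < 33:
        return "low"
    if score < 67:
        return "medium"
    return "high"


for s in (0, 32, 32.5, 33, 66, 67, 100):
    print(s, score_to_bin(s))
```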
-
- ---
-
- ## Evaluation Scripts
-
- - Located in the `evaluation/` folder (not shown here)
- - Used during development to benchmark model performance
- - Final classifiers are saved in `models/`
-
- ---
-
- ## Installation & Environment
-
- - Python: `3.9`
- - Recommended: a `conda` environment
-
- ```bash
- conda create -n amiv_nlp_2025 python=3.9
- conda activate amiv_nlp_2025
- pip install -r requirements.txt
- ```
-
- ---
-
- ## License
-
- For research and non-commercial use. Contact the author for other permissions.
-
- ---
-
- ## Authors
-
- Developed by
- AMIV NLP 2025, University of Antwerp
README.md ADDED
@@ -0,0 +1,159 @@
+ ---
+ license: mit
+ tags:
+ - personality-traits
+ - ensemble-model
+ - liwc
+ - big-five
+ - sklearn
+ - distilbert
+ - psychology
+ inference: false
+ ---
+
+ # Personality Trait Predictor (Big Five)
+
+ This repository provides a machine learning pipeline for predicting the **Big Five personality traits** from free-form **text input**. It combines **DistilBERT embeddings**, **LIWC-style linguistic features**, and a set of **Random Forest classifiers** (one per trait) trained on labeled personality data.
+
+ ### Predicted traits:
+ - **Openness**
+ - **Conscientiousness**
+ - **Extraversion**
+ - **Agreeableness**
+ - **Emotional Stability**
+
+ Each trait is predicted as a **categorical label**: `low`, `medium`, or `high`.
+
+ ---
+
+ ## How It Works
+
+ - Text is converted to an embedding using the hidden state of the CLS token from `DistilBERT`.
+ - LIWC-like features are computed using a custom dictionary (`output.dic`).
+ - Both feature sets are concatenated, scaled with `StandardScaler`, and passed through a **trait-specific Random Forest classifier**.
+ - Predictions are returned as string labels for all five traits.
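The flow above can be sketched in plain Python as follows. This is an illustration under assumptions, not the repository's `PersonalityClassifier`: features are plain lists, `standardize` reproduces the `(x - mean) / scale` transform that a fitted `StandardScaler` applies, and each trait classifier is a hypothetical callable. With the real scikit-learn models loaded from `models/`, the per-trait call would instead be `clf.predict([scaled])[0]`.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for a fitted per-trait classifier: any callable that
# maps one scaled feature vector to a label.
Classifier = Callable[[List[float]], str]


def standardize(x: Sequence[float], mean: Sequence[float], scale: Sequence[float]) -> List[float]:
    """StandardScaler-style transform: (x - mean) / scale for each feature."""
    if not (len(x) == len(mean) == len(scale)):
        raise ValueError("feature, mean and scale lengths must match")
    return [(xi - m) / s for xi, m, s in zip(x, mean, scale)]


def predict_all_traits(
    cls_embedding: Sequence[float],
    liwc_features: Sequence[float],
    mean: Sequence[float],
    scale: Sequence[float],
    classifiers: Dict[str, Classifier],
) -> Dict[str, str]:
    # 1. Concatenate semantic (BERT) and psycholinguistic (LIWC) features.
    features = list(cls_embedding) + list(liwc_features)
    # 2. Scale the combined vector once.
    scaled = standardize(features, mean, scale)
    # 3. Run each trait's own classifier on the same scaled vector.
    return {trait: clf(scaled) for trait, clf in classifiers.items()}


def threshold_classifier(index: int, cut: float) -> Classifier:
    """Toy classifier for illustration only: labels by one scaled feature."""
    def classify(vector: List[float]) -> str:
        if vector[index] < -cut:
            return "low"
        if vector[index] > cut:
            return "high"
        return "medium"
    return classify


toy_classifiers = {
    "Openness": threshold_classifier(0, 0.5),
    "Extraversion": threshold_classifier(2, 0.5),
}
result = predict_all_traits(
    cls_embedding=[1.0, 0.0],  # toy 2-d stand-in for the DistilBERT CLS vector
    liwc_features=[3.0],       # toy single LIWC count
    mean=[0.0, 0.0, 5.0],
    scale=[0.5, 1.0, 2.0],
    classifiers=toy_classifiers,
)
print(result)  # {'Openness': 'high', 'Extraversion': 'low'}
```

Scaling the concatenated vector with one scaler keeps BERT dimensions and raw LIWC counts on comparable ranges before they reach each classifier.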
+
+ ---
+
+ ## Example Usage
+
+ ```python
+ from personality_model import PersonalityClassifier
+
+ model = PersonalityClassifier()
+
+ text = "I enjoy solving challenging problems and thinking about philosophical questions."
+ predictions = model.predict_all_traits(text)
+
+ print(predictions)
+ # Output:
+ # {
+ #     'Openness': 'high',
+ #     'Conscientiousness': 'medium',
+ #     'Extraversion': 'low',
+ #     'Agreeableness': 'medium',
+ #     'Emotional stability': 'low'
+ # }
+ ```
+
+ ---
+
+ ## Installation
+
+ Clone the repository and install dependencies:
+
+ ```bash
+ git clone https://huggingface.co/Arash-Alborz/personality-trait-predictor
+ cd personality-trait-predictor
+
+ # Create a conda environment
+ conda create -n personality_env python=3.9
+ conda activate personality_env
+
+ # Install dependencies
+ pip install -r requirements.txt
+ ```
+
+ ---
+
+ ## Project Structure
+
+ ```
+ personality-trait-predictor/
+ ├── personality_model.py # Main class for prediction
+ ├── requirements.txt # Dependencies
+ ├── README.md # Project description
+ ├── .gitattributes # Git LFS tracking
+ ├── models/
+ │   ├── feature_scaler.pkl # StandardScaler for feature scaling
+ │   ├── output.dic # LIWC-style dictionary
+ │   ├── openness_classifier.pkl # Classifier for Openness
+ │   ├── conscientiousness_classifier.pkl
+ │   ├── extraversion_classifier.pkl
+ │   ├── agreeableness_classifier.pkl
+ │   ├── emotional_stability_classifier.pkl
+ ├── feature_extraction/
+ │   ├── __init__.py
+ │   ├── embedding_from_text.py # BERT embedding code
+ │   ├── liwc_from_text.py # LIWC feature extraction
+ │   ├── pipeline.py # Combined feature pipeline
+ ```
+
+ ---
+
+ ## Model Details
+
+ - **Embeddings**: `DistilBERT` (CLS token from `distilbert-base-cased-distilled-squad`)
+ - **Linguistic Features**: Per-category word counts from a custom LIWC-style dictionary (`output.dic`)
+ - **Classifier**: One `RandomForestClassifier` per trait, with hyperparameters tuned separately for each trait
+ - **Scaling**: Features are scaled using `StandardScaler`
+ - **Labels**: Traits are categorized into `low`, `medium`, or `high`
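The LIWC step can be sketched as follows, assuming `output.dic` uses the common LIWC `.dic` layout: a `%`-delimited block of numbered categories, then one entry per line as a word followed by its category IDs, with a trailing `*` marking a prefix wildcard. The repository's `liwc_from_text.py` is not shown, so `parse_dic` and `liwc_counts` are hypothetical helpers; the real tokenization, precedence between matching entries, and any normalization (LIWC tools often report percentages of total words rather than raw counts) may differ.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def parse_dic(text: str) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Parse LIWC-style .dic text.

    Returns (category id -> category name, word entry -> category names).
    Assumes a category block between two '%' lines, followed by one entry
    per line as '<word> <id> <id> ...'.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    markers = [i for i, ln in enumerate(lines) if ln == "%"]
    if len(markers) < 2:
        raise ValueError("expected a %-delimited category block")
    start, end = markers[0], markers[1]
    categories: Dict[str, str] = {}
    for ln in lines[start + 1:end]:
        cat_id, name = ln.split()[:2]
        categories[cat_id] = name
    lexicon: Dict[str, List[str]] = {}
    for ln in lines[end + 1:]:
        word, *ids = ln.split()
        lexicon[word.lower()] = [categories[i] for i in ids if i in categories]
    return categories, lexicon


def liwc_counts(text: str, categories: Dict[str, str], lexicon: Dict[str, List[str]]) -> Dict[str, int]:
    """Count tokens per category. Entries ending in '*' match by prefix."""
    counts = Counter({name: 0 for name in categories.values()})
    exact = {w: cats for w, cats in lexicon.items() if not w.endswith("*")}
    # Longest wildcard first, so the most specific prefix wins.
    prefixes = sorted(
        ((w[:-1], cats) for w, cats in lexicon.items() if w.endswith("*")),
        key=lambda pc: len(pc[0]),
        reverse=True,
    )
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in exact:
            matched = exact[token]
        else:
            matched = next((cats for p, cats in prefixes if token.startswith(p)), [])
        for name in matched:
            counts[name] += 1
    return dict(counts)


# Tiny made-up dictionary for illustration (not the contents of output.dic)
SAMPLE_DIC = "\n".join([
    "%",
    "1\tposemo",
    "2\tsocial",
    "%",
    "happy\t1",
    "friend*\t1\t2",
    "talk\t2",
])
cats, lex = parse_dic(SAMPLE_DIC)
print(liwc_counts("Happy to talk with friends and a friendly neighbour.", cats, lex))
# {'posemo': 3, 'social': 3}
```

The resulting per-category counts, taken in a fixed category order, form the LIWC part of the feature vector that is concatenated with the DistilBERT embedding.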
+
+ ---
+
+ ## Training & Evaluation
+
+ - Each trait classifier was trained on a labeled dataset using combined BERT+LIWC features.
+ - Validation was performed on a separate set of texts simulating job interview answers.
+ - Random Forest hyperparameters (e.g., `n_estimators`, `max_depth`) were tuned manually per trait to maximize F1-score.
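The README does not say which F1 variant was used. For a three-class problem, macro-averaged F1 (the unweighted mean of per-class F1 scores, as computed by scikit-learn's `f1_score(..., average="macro")`) is one common choice; the standard-library sketch below assumes it, and `macro_f1` is a hypothetical helper. A manual search would score each candidate setting on the validation set and keep the best one per trait.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over labels seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2PR / (P + R) simplifies to 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy validation labels and predictions for two hypothetical settings
# (illustration only, not results from this project)
y_true = ["low", "medium", "high", "high", "medium", "low"]
candidates = {
    "setting_a": ["low", "medium", "high", "medium", "medium", "high"],
    "setting_b": ["low", "medium", "high", "high", "medium", "medium"],
}
best = max(candidates, key=lambda name: macro_f1(y_true, candidates[name]))
print(best)  # setting_b
```

Macro averaging weights `low`, `medium`, and `high` equally, so a model cannot score well by predicting only the most frequent label.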
+
+ ---
+
+ ## Notes
+
+ - The model does **not** use Hugging Face's `pipeline()` interface because it integrates custom feature engineering steps.
+ - You can import `PersonalityClassifier` directly to use the model.
+
+ ---
+
+ ## Requirements
+
+ Install with:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ Dependencies include:
+
+ - numpy
+ - pandas
+ - scikit-learn
+ - torch
+ - transformers
+ - joblib
+ - tqdm
+ - gradio (optional for UI testing)
+
+ ---
+
+ ## Author
+
+ University of Antwerp, AMIV NLP 2025.
+ Project developed as part of the NLP course.
+
+ ---
+
+ ## License
+
+ This project is licensed under the MIT License.