Arash-Alborz/personality-trait-predictor

@@ -1,139 +0,0 @@
-# Personality Trait Predictor — AMIV NLP 2025
-University of Antwerp
-This project predicts **Big Five personality traits (OCEAN)** from English text using a combination of:
-- DistilBERT embeddings
-- LIWC-style psycholinguistic features
-- An ensemble classifier (Random Forest, XGBoost, MLP, SVM)
-The five traits predicted are:
-- **Openness**
-- **Conscientiousness**
-- **Extraversion**
-- **Agreeableness**
-- **Emotional Stability**
-Each trait is classified as:
-- `low`
-- `medium`
-- `high`
----
-## Features
-- Accepts raw free-form text (e.g., job interview answers)
-- Extracts both semantic (BERT) and psycholinguistic (LIWC) features
-- Outputs all 5 personality traits using a custom-trained ensemble
-- Can be used locally or deployed via Gradio (demo available)
----
-## Quick Usage (Python)
-```python
-from personality_model import PersonalityClassifier
-model = PersonalityClassifier()
-text = "I love exploring new cultures and trying unusual foods. I often seek out unfamiliar ideas and perspectives."
-result = model.predict_all_traits(text)
-print(result)
-```
-Expected output:
-```python
-{
-  "Openness": "high",
-  "Conscientiousness": "medium",
-  "Extraversion": "low",
-  "Agreeableness": "high",
-  "Emotional stability": "medium"
-}
-```
----
-## Project Structure
-```
-├── personality_model.py          # PersonalityClassifier pipeline
-├── test_personality_model.py     # CLI tester
-├── feature_extraction/
-│   ├── __init__.py
-│   ├── embedding_from_text.py
-│   ├── liwc_from_text.py
-├── models/
-│   ├── openness_classifier.pkl
-│   ├── conscientiousness_classifier.pkl
-│   ├── extraversion_classifier.pkl
-│   ├── agreeableness_classifier.pkl
-│   ├── emotional_stability_classifier.pkl
-│   ├── feature_scaler.pkl
-│   ├── output.dic
-├── requirements.txt
-├── README.md
-```
----
-## Modeling Details
-- Ensemble of 4 classifiers (VotingClassifier):
-  - `RandomForestClassifier`
-  - `GradientBoostingClassifier`
-  - `MLPClassifier`
-  - `SVC` (linear kernel)
-- Each trait has a separate classifier trained on combined BERT+LIWC features
-- LIWC-style dictionary created from `output.dic`
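The ensemble listed above can be sketched with scikit-learn's `VotingClassifier`. This is a minimal illustration, not the project's training code: the feature matrix is synthetic, the hyperparameters are placeholders, and hard (majority) voting is an assumption, since the README does not say which voting mode was used.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the combined BERT+LIWC feature matrix (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = np.array(["low", "medium", "high"] * 20)

ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=25, random_state=0)),
        ("gb", GradientBoostingClassifier(n_estimators=25, random_state=0)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)),
        ("svm", SVC(kernel="linear")),
    ],
    voting="hard",  # majority vote (assumed); SVC has no probabilities by default
)
ensemble.fit(X, y)
predictions = ensemble.predict(X[:5])  # array of 'low' / 'medium' / 'high' labels
```

With one such ensemble per trait, predicting all five traits amounts to calling `predict` on each fitted ensemble with the same feature vector.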
----
-## Preprocessing & Binning (for original experiments)
-The original project also included regression models and binning rules:
-| Score Range       | Bin Label |
-|-------------------|-----------|
-| 0 ≤ score ≤ 32    | Low       |
-| 33 ≤ score ≤ 66   | Medium    |
-| 67 ≤ score ≤ 100  | High      |
-These were used to convert continuous personality scores into discrete labels.
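The binning table can be expressed as a small function. This sketch assumes integer scores in the range 0 to 100, because the table leaves values such as 32.5 undefined. The function name `bin_score` is hypothetical, and the labels are lowercased to match the classifier outputs.

```python
def bin_score(score: int) -> str:
    """Map an integer score in 0..100 to 'low', 'medium', or 'high'."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be in [0, 100], got {score}")
    if score <= 32:
        return "low"
    if score <= 66:
        return "medium"
    return "high"


# Check the boundaries of each bin from the table.
labels = [bin_score(s) for s in (0, 32, 33, 66, 67, 100)]
print(labels)  # ['low', 'low', 'medium', 'medium', 'high', 'high']
```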
----
-## Evaluation Scripts
-- Located in the `evaluation/` folder (not shown here)
-- Used during development to benchmark model performance
-- Final classifiers are saved in `models/`
----
-## Installation & Environment
-Python: `3.9`
-Recommended: `conda` environment
-```bash
-conda create -n amiv_nlp_2025 python=3.9
-conda activate amiv_nlp_2025
-pip install -r requirements.txt
-```
----
-## License
-For research and non-commercial use. Contact the author for other permissions.
----
-## Authors
-Developed by
-AMIV NLP 2025 — University of Antwerp

README.md ADDED

	@@ -0,0 +1,159 @@

+---
+license: mit
+tags:
+  - personality-traits
+  - ensemble-model
+  - liwc
+  - big-five
+  - sklearn
+  - distilbert
+  - psychology
+inference: false
+---
+# Personality Trait Predictor (Big Five)
+This repository provides a machine learning pipeline for predicting the **Big Five personality traits** from free-form **text input**. It combines **DistilBERT embeddings**, **LIWC-style linguistic features**, and five **Random Forest classifiers**, one per trait, trained on labeled personality data.
+### Predicted traits:
+- **Openness**
+- **Conscientiousness**
+- **Extraversion**
+- **Agreeableness**
+- **Emotional Stability**
+Each trait is predicted as a **categorical label**: `low`, `medium`, or `high`.
+---
+## How It Works
+- Text is encoded with `DistilBERT`, and the hidden state of its `[CLS]` token is used as the embedding.
+- LIWC-style features are computed using a custom dictionary (`output.dic`).
+- The two feature vectors are concatenated and passed to a **trait-specific Random Forest classifier**.
+- Predictions are returned as string labels for all five traits.
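The dictionary-based feature step and the concatenation described above can be sketched in plain Python. The category names, the word lists, and the function `liwc_style_counts` are hypothetical; the real `output.dic` format is not shown in this README, and the embedding here is a three-number placeholder rather than an actual DistilBERT vector.

```python
import re

# Toy category -> word-set mapping; a hypothetical stand-in for `output.dic`.
TOY_DICTIONARY = {
    "posemo": {"love", "enjoy", "happy"},
    "cogproc": {"think", "thinking", "solve", "solving"},
    "social": {"we", "friend", "friends"},
}


def liwc_style_counts(text: str) -> list[int]:
    """Count tokens per category, in the dictionary's insertion order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [sum(t in words for t in tokens) for words in TOY_DICTIONARY.values()]


embedding = [0.12, -0.40, 0.88]  # placeholder for a DistilBERT CLS vector
liwc = liwc_style_counts("I enjoy solving problems and thinking with friends.")
features = embedding + liwc  # concatenation of both feature sets
print(features)  # [0.12, -0.4, 0.88, 1, 2, 1]
```

Keeping the category order fixed matters: the classifier expects each position in the feature vector to mean the same thing at training and prediction time.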
+---
+## Example Usage
+```python
+from personality_model import PersonalityClassifier
+model = PersonalityClassifier()
+text = "I enjoy solving challenging problems and thinking about philosophical questions."
+predictions = model.predict_all_traits(text)
+print(predictions)
+# Output:
+# {
+#   'Openness': 'high',
+#   'Conscientiousness': 'medium',
+#   'Extraversion': 'low',
+#   'Agreeableness': 'medium',
+#   'Emotional stability': 'low'
+# }
+```
+---
+## Installation
+Clone the repository and install dependencies:
+```bash
+git clone https://huggingface.co/Arash-Alborz/personality-trait-predictor
+cd personality-trait-predictor
+# Create a conda environment
+conda create -n personality_env python=3.9
+conda activate personality_env
+# Install dependencies
+pip install -r requirements.txt
+```
+---
+## Project Structure
+```
+personality-trait-predictor/
+├── personality_model.py              # Main class for prediction
+├── requirements.txt                  # Dependencies
+├── README.md                         # Project description
+├── .gitattributes                    # Git LFS tracking
+├── models/
+│   ├── feature_scaler.pkl            # StandardScaler for feature scaling
+│   ├── output.dic                    # LIWC-style dictionary
+│   ├── openness_classifier.pkl       # Classifier for Openness
+│   ├── conscientiousness_classifier.pkl
+│   ├── extraversion_classifier.pkl
+│   ├── agreeableness_classifier.pkl
+│   ├── emotional_stability_classifier.pkl
+├── feature_extraction/
+│   ├── __init__.py
+│   ├── embedding_from_text.py        # BERT embedding code
+│   ├── liwc_from_text.py             # LIWC feature extraction
+│   ├── pipeline.py                   # Combined feature pipeline
+```
+---
+## Model Details
+- **Embeddings**: `DistilBERT` (CLS token from `distilbert-base-cased-distilled-squad`)
+- **Linguistic Features**: Word count vectors from a custom LIWC-style dictionary (`output.dic`)
+- **Classifier**: One `RandomForestClassifier` per trait, with hyperparameters tuned per trait
+- **Scaling**: Features are scaled using `StandardScaler`
+- **Labels**: Traits are categorized into `low`, `medium`, or `high`
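As a rough illustration of how the components listed above fit together, the sketch below scales a feature matrix with `StandardScaler` and fits one `RandomForestClassifier` per trait. The data and labels are random, the feature dimension and hyperparameters are placeholders, and sharing one scaler across all traits is an assumption based on the single `feature_scaler.pkl` file.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

TRAITS = ["Openness", "Conscientiousness", "Extraversion",
          "Agreeableness", "Emotional stability"]

rng = np.random.default_rng(0)
X_train = rng.normal(size=(90, 10))  # synthetic combined features (assumption)
scaler = StandardScaler().fit(X_train)
X_scaled = scaler.transform(X_train)

# One independent classifier per trait, each with its own (here random) labels.
classifiers = {}
for trait in TRAITS:
    y = rng.choice(["low", "medium", "high"], size=len(X_train))
    classifiers[trait] = RandomForestClassifier(
        n_estimators=50, random_state=0
    ).fit(X_scaled, y)

# Predict all traits for one new sample, reusing the fitted scaler.
x_new = scaler.transform(rng.normal(size=(1, 10)))
result = {trait: str(clf.predict(x_new)[0]) for trait, clf in classifiers.items()}
print(result)
```

The returned dictionary has the same shape as the output of `predict_all_traits` shown in the usage example, though the labels here carry no meaning because the training labels are random.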
+---
+## Training & Evaluation
+- Each trait classifier was trained on a labeled dataset using combined BERT+LIWC features.
+- Validation was performed on a separate set simulating job interview answers.
+- Random Forest hyperparameters (e.g., `n_estimators`, `max_depth`) were tuned manually for each trait to maximize F1-score.
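A per-trait comparison of hyperparameter settings by F1-score could look like the following sketch. It is not the authors' procedure: the data is synthetic, the candidate values are arbitrary, and macro-averaged F1 is an assumption, since the README does not say which averaging was used.

```python
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 6))
# Synthetic labels derived from the first feature so there is signal to learn.
y = np.digitize(X[:, 0], bins=[-0.5, 0.5])  # 0=low, 1=medium, 2=high

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Score every candidate setting on the held-out validation split.
scores = {}
for n_estimators, max_depth in product([25, 100], [3, None]):
    clf = RandomForestClassifier(
        n_estimators=n_estimators, max_depth=max_depth, random_state=0
    ).fit(X_train, y_train)
    scores[(n_estimators, max_depth)] = f1_score(
        y_val, clf.predict(X_val), average="macro"
    )

best_params = max(scores, key=scores.get)
print(best_params, round(scores[best_params], 3))
```

Repeating this loop once per trait, each with its own labels, yields a separate best setting for every classifier.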
+---
+## Notes
+- The model does **not** use Hugging Face’s `pipeline()` interface because it integrates custom feature engineering steps.
+- You can import `PersonalityClassifier` directly to use the model.
+---
+## Requirements
+Install with:
+```bash
+pip install -r requirements.txt
+```
+Dependencies include:
+- numpy
+- pandas
+- scikit-learn
+- torch
+- transformers
+- joblib
+- tqdm
+- gradio (optional, for UI testing)
+---
+## Author
+University of Antwerp – AMIV NLP 2025
+Project developed as part of the NLP course.
+---
+## License
+This project is licensed under the MIT License.