# Email Classifier
This project implements an email classification model that assigns each email to one of five categories, using SBERT (all-MiniLM-L6-v2) for text embeddings followed by a small sequential neural network for the final classification.
## Model Description
- Architecture: SBERT (384‑d) → Dense(256, ReLU) → Dropout(0.4) → Dense(128, ReLU) → Dropout(0.4) → Softmax(5) (see the Keras sketch below)
- Frameworks: TensorFlow 2.17, sentence-transformers
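A minimal Keras sketch of this classifier head, for reference. The layer sizes come from the architecture line above; the optimizer and loss are assumptions, since the training configuration is not stated in this card.

```python
import tensorflow as tf

# Classifier head matching the architecture above (input = 384-d SBERT embedding).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(384,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Dense(5, activation="softmax"),  # one output per category
])

# Assumed training setup; the card does not specify optimizer or loss.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```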
## Training Data & Preprocessing
- Emails: 4954 college emails, manually labeled into [Academics, Clubs, Internships, Others, Talks]
- Split: 80% train / 20% test
- Embedding & Labeling (see the sketch after this list):
  - Each email was embedded with all-MiniLM-L6-v2 (SBERT).
  - We created a small "prototype" set of example sentences for each category.
  - For every email, we computed cosine similarities between its SBERT embedding and each prototype embedding.
  - The email was assigned to the category whose prototype had the highest cosine score (threshold ≥ 0.4).
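A minimal sketch of this prototype-based labeling step. The prototype sentences below are hypothetical (the actual set contained multiple example sentences per category and is not published), and the fallback to Others when no score clears the 0.4 threshold is an assumption.

```python
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical one-sentence prototypes, one per category, for illustration only.
prototypes = {
    "Academics":   "Information about courses, exams, and grades.",
    "Clubs":       "Join our student club's event this weekend.",
    "Internships": "Apply now for a summer internship opportunity.",
    "Others":      "General administrative announcement.",
    "Talks":       "Invited speaker seminar happening on campus.",
}
labels = list(prototypes.keys())
proto_emb = embedder.encode(list(prototypes.values()), convert_to_tensor=True)

def weak_label(email_text: str, threshold: float = 0.4) -> str:
    emb = embedder.encode(email_text, convert_to_tensor=True)
    scores = util.cos_sim(emb, proto_emb)[0]  # cosine similarity to each prototype
    best = int(scores.argmax())
    # Assumed fallback: default to "Others" when nothing clears the threshold.
    return labels[best] if float(scores[best]) >= threshold else "Others"
```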
## Evaluation
The model was tested on 991 college‑email samples. Below are the per‑class precision, recall, F1‑score and support:
| Class | Label | Support | Precision | Recall | F1‑Score |
|---|---|---|---|---|---|
| 0 | Academics | 200 | 0.92 | 0.97 | 0.94 |
| 1 | Clubs | 236 | 0.94 | 0.96 | 0.95 |
| 2 | Internships | 143 | 0.95 | 0.98 | 0.97 |
| 3 | Others | 200 | 0.95 | 0.83 | 0.89 |
| 4 | Talks | 212 | 0.93 | 0.94 | 0.93 |
### Aggregate metrics

Overall accuracy: 0.94

| Metric | Precision | Recall | F1‑Score |
|---|---|---|---|
| Macro avg | 0.94 | 0.94 | 0.94 |
| Weighted avg | 0.94 | 0.94 | 0.93 |
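Tables like these can be regenerated with scikit-learn's `classification_report`; a sketch below, with tiny dummy arrays standing in for the real predictions on the 991 test emails.

```python
from sklearn.metrics import classification_report

label_names = ["Academics", "Clubs", "Internships", "Others", "Talks"]

# Dummy stand-ins for the real test labels/predictions, just to keep the snippet runnable.
y_true = [0, 1, 2, 3, 4, 3]
y_pred = [0, 1, 2, 3, 4, 0]

print(classification_report(y_true, y_pred, target_names=label_names))
```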
### Confusion Matrix
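The confusion-matrix figure itself is not reproduced here; one can be regenerated with scikit-learn, reusing the `y_true`/`y_pred` arrays from the snippet above.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Plot a confusion matrix from true/predicted class ids.
ConfusionMatrixDisplay.from_predictions(
    y_true, y_pred,
    display_labels=["Academics", "Clubs", "Internships", "Others", "Talks"],
    cmap="Blues",
)
plt.show()
```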
## Usage
1. Install dependencies

```bash
pip install tensorflow sentence-transformers huggingface_hub
```
2. Load the model & embedder

```python
from sentence_transformers import SentenceTransformer
import tensorflow as tf
from huggingface_hub import hf_hub_download

# 1) Load the SBERT embedder
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# 2) Load the fine-tuned classifier
model_file = hf_hub_download(
    repo_id="skgezhil2005/email_classifier",
    filename="model_v2.keras",  # replace with your model file
)
model = tf.keras.models.load_model(model_file)

# 3) Define label names (in the same order used during training)
labels = ["Academics", "Clubs", "Internships", "Others", "Talks"]
```
3. Inference helper

```python
import numpy as np

def classify_email(text: str) -> str:
    # Compute a 1×384 SBERT embedding for the email text
    emb = embedder.encode(text, convert_to_tensor=False)
    emb = emb.reshape(1, -1)
    # Predict class probabilities and pick the highest-scoring class
    prediction = model.predict(emb)
    pred_idx = int(np.argmax(prediction[0]))
    return labels[pred_idx]
```
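For example (a hypothetical email; the prediction depends on the trained weights):

```python
sample = "Reminder: the robotics club's weekly meeting is tomorrow at 6 PM in Room 204."
print(classify_email(sample))  # e.g. "Clubs"
```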