Email Classifier

This project implements an email classification model that assigns each email to one of five categories, using SBERT all-MiniLM-L6-v2 for text embeddings followed by a small sequential neural network for the final classification.

Model Description

  • Architecture: SBERT (384‑d) → Dense(256, ReLU) → Dropout(0.4) → Dense(128, ReLU) → Dropout(0.4) → Softmax(5) (see the Keras sketch below)
  • Frameworks: TensorFlow 2.17, sentence-transformers
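
A minimal Keras sketch of this classification head; the optimizer, loss, and other compile settings are assumptions, not taken from this card:

import tensorflow as tf

# Classification head over 384-d SBERT embeddings, as described above.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(384,)),                    # all-MiniLM-L6-v2 embedding
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.4),
    tf.keras.layers.Dense(5, activation="softmax"),  # one unit per category
])

# Assumed training configuration (not specified in the card).
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])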

Training Data & Preprocessing

  • Emails: 4,954 college emails, labeled into [Academics, Clubs, Internships, Others, Talks] via the prototype‑based procedure described below
  • Split: 80% train / 20% test
  • Embedding & Labeling:
    1. Each email was embedded with all‑MiniLM‑L6‑v2 (SBERT).
    2. We created a small “prototype” set of example sentences for each category.
    3. For every email, we computed cosine similarities between its SBERT embedding and each prototype embedding.
    4. The email was assigned to the category whose prototype had the highest cosine score (subject to a threshold of ≥ 0.4); a code sketch of this procedure follows this list.
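
A sketch of this prototype‑based labeling step. The prototype sentences below are hypothetical placeholders; the actual prototype set is not published in this card:

from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical prototype sentences, one per category.
prototypes = {
    "Academics":   "Assignment deadlines and exam schedule for the course.",
    "Clubs":       "Join our student club's general body meeting this weekend.",
    "Internships": "Summer internship opening: apply with your resume.",
    "Others":      "General campus facilities announcement.",
    "Talks":       "Invited speaker seminar on Friday afternoon.",
}
proto_names = list(prototypes.keys())
proto_embs = embedder.encode(list(prototypes.values()), convert_to_tensor=True)

def prototype_label(email_text: str, threshold: float = 0.4):
    """Assign the category whose prototype is most cosine-similar to the email."""
    emb = embedder.encode(email_text, convert_to_tensor=True)
    scores = util.cos_sim(emb, proto_embs)[0]  # cosine similarity to each prototype
    best = int(scores.argmax())
    if float(scores[best]) >= threshold:
        return proto_names[best]
    return None  # below threshold; the card does not specify handling for this case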

Evaluation

The model was tested on 991 college‑email samples. Below are the per‑class precision, recall, F1‑score and support:

Class  Label        Support  Precision  Recall  F1‑Score
0      Academics    200      0.92       0.97    0.94
1      Clubs        236      0.94       0.96    0.95
2      Internships  143      0.95       0.98    0.97
3      Others       200      0.95       0.83    0.89
4      Talks        212      0.93       0.94    0.93


Aggregate metrics

Metric        Accuracy  Precision  Recall  F1‑Score
Overall       0.94
Macro avg               0.94       0.94    0.94
Weighted avg            0.94       0.94    0.93

Confusion Matrix

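The per‑class table above and the confusion matrix are the standard scikit‑learn classification‑report quantities and can be regenerated as follows; y_true and y_pred are hypothetical stand‑ins for the real test‑set arrays:

import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

labels = ["Academics", "Clubs", "Internships", "Others", "Talks"]

# Hypothetical stand-ins; substitute the model's real test-set labels/predictions.
y_true = np.array([0, 1, 2, 3, 4, 3, 1, 0])
y_pred = np.array([0, 1, 2, 3, 4, 0, 1, 0])

print(classification_report(y_true, y_pred, target_names=labels))
print(confusion_matrix(y_true, y_pred))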

Usage

1. Install dependencies

pip install tensorflow sentence-transformers huggingface_hub

2. Load the model & embedder

from sentence_transformers import SentenceTransformer
import tensorflow as tf
import numpy as np
from huggingface_hub import hf_hub_download

# 1) Load SBERT embedder
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# 2) Load your fine‑tuned classifier
model_file = hf_hub_download(
    repo_id="skgezhil2005/email_classifier", 
    filename="model_v2.keras"  # replace with your model file
)
model = tf.keras.models.load_model(model_file)

# 3) Define label names (in the same order used during training)
labels = ["Academics", "Clubs", "Internships", "Others", "Talks"]

3. Inference Helper

def classify_email(text: str) -> str:
    # Compute a 384-d SBERT embedding and reshape it to a batch of one (1, 384)
    emb = embedder.encode(text, convert_to_tensor=False)
    emb = emb.reshape(1, -1)
    # Predict class probabilities and pick the highest-scoring class
    prediction = model.predict(emb, verbose=0)
    pred_idx = int(np.argmax(prediction[0]))

    return labels[pred_idx]
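
Example call (the sample email and the predicted label are illustrative):

print(classify_email("Reminder: the robotics club meets this Friday at 6 pm in room 204."))
# e.g. "Clubs"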