🧠 Spam Classifier (Scikit-learn + spaCy)
This model classifies messages as spam or ham using traditional NLP techniques.
🔧 Model Details
- Preprocessing: Tokenization + Lemmatization using spaCy
- Vectorization: TF-IDF (1-2 grams)
- Feature Selection: Chi2 with top 1000 features
- Model: Logistic Regression (class_weight="balanced", max_iter=1000)
- Performance: ~87% accuracy on a balanced test set (800 spam, 800 ham)
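The steps listed above can be sketched roughly as follows. This is a minimal illustration, not the author's training script: the function name `train_bundle`, the toy messages, and the cap on `k` are assumptions, and the spaCy lemmatization step is left out (the texts are assumed to be preprocessed already). The dictionary keys mirror the ones used in the loading example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder


def train_bundle(texts, labels, k=1000):
    """Fit encoder, vectorizer, selector and classifier; return them as a dict."""
    # Map string labels ("ham"/"spam") to integers.
    label_encoder = LabelEncoder()
    y = label_encoder.fit_transform(labels)

    # TF-IDF over unigrams and bigrams.
    vectorizer = TfidfVectorizer(ngram_range=(1, 2))
    X = vectorizer.fit_transform(texts)

    # Chi-squared feature selection. k is capped (an assumption of this sketch)
    # so the code also runs on tiny corpora with fewer than 1000 n-grams.
    selector = SelectKBest(chi2, k=min(k, X.shape[1]))
    X_selected = selector.fit_transform(X, y)

    # Logistic regression with the hyperparameters listed in Model Details.
    model = LogisticRegression(class_weight="balanced", max_iter=1000)
    model.fit(X_selected, y)

    return {
        "model": model,
        "vectorizer": vectorizer,
        "selector": selector,
        "label_encoder": label_encoder,
    }


# Made-up messages purely for illustration (not the model's training data).
texts = [
    "win a free prize now",
    "claim your free cash reward",
    "urgent offer click the link",
    "free entry to win cash",
    "see you at lunch tomorrow",
    "can you send me the report",
    "running late be there soon",
    "thanks for the help yesterday",
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

bundle = train_bundle(texts, labels)
```

Saving the returned dictionary with `joblib.dump(bundle, path)` would produce a single-file bundle of the kind described under Files.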
📦 Files
- spam_classifier_bundle.joblib: Includes the trained model, TF-IDF vectorizer, label encoder, and feature selector
📥 Load Model (Example)
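from huggingface_hub import hf_hub_download
import joblib

# Download the bundle from the Hub (cached locally after the first call).
# NOTE: the Files section lists spam_classifier_bundle.joblib; pass whichever filename exists in the repo.
bundle_path = hf_hub_download(repo_id="mageshcruz/spam-classifier-scikit", filename="spam_classifier.joblib")
bundle = joblib.load(bundle_path)

# Unpack the fitted components stored in the bundle.
model = bundle["model"]
vector = bundle["vectorizer"]
selector = bundle["selector"]
le = bundle["label_encoder"]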
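Once the components are loaded, a prediction would chain them in the same order as training: vectorize, select features, classify, then decode the label. The sketch below is a hedged illustration. `predict_messages` and its `preprocess` parameter are hypothetical names, the "spam"/"ham" label strings are assumed, and a small stand-in bundle is fitted on made-up messages so the code runs offline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder


def predict_messages(bundle, messages, preprocess=lambda text: text):
    """Return a decoded label for each message using the bundle's components."""
    cleaned = [preprocess(message) for message in messages]
    X = bundle["vectorizer"].transform(cleaned)        # TF-IDF features
    X_selected = bundle["selector"].transform(X)       # keep the selected columns
    encoded = bundle["model"].predict(X_selected)      # integer class ids
    return list(bundle["label_encoder"].inverse_transform(encoded))


# Stand-in bundle fitted on made-up messages (an assumption of this sketch);
# in practice, pass the dictionary returned by joblib.load instead.
toy_texts = [
    "win a free prize now",
    "claim your free cash reward",
    "urgent offer click the link",
    "see you at lunch tomorrow",
    "can you send me the report",
    "running late be there soon",
]
toy_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

toy_encoder = LabelEncoder()
y = toy_encoder.fit_transform(toy_labels)
toy_vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train = toy_vectorizer.fit_transform(toy_texts)
toy_selector = SelectKBest(chi2, k=min(1000, X_train.shape[1])).fit(X_train, y)
toy_model = LogisticRegression(class_weight="balanced", max_iter=1000)
toy_model.fit(toy_selector.transform(X_train), y)

toy_bundle = {
    "model": toy_model,
    "vectorizer": toy_vectorizer,
    "selector": toy_selector,
    "label_encoder": toy_encoder,
}

predictions = predict_messages(
    toy_bundle, ["free prize waiting, claim now", "lunch tomorrow?"]
)
print(predictions)
```

Because the model was trained on spaCy-lemmatized text, inputs should presumably go through the same preprocessing before vectorization. One possible `preprocess` would join `token.lemma_` over a spaCy `Doc`, though the exact spaCy pipeline used for training is not stated here.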