metadata
license: mit
tags:
  - spam
  - text-classification
  - scikit-learn
  - tfidf
  - spaCy
  - logistic-regression
language: en
datasets: custom
model-index:
  - name: Spam Classifier (Scikit-learn + spaCy)
    results: []

📧 Spam Classifier (Scikit-learn + spaCy)

This model classifies English messages as spam or ham (not spam) using a traditional NLP pipeline: spaCy preprocessing, TF-IDF features, and logistic regression.

🧠 Model Details

  • Preprocessing: Tokenization + Lemmatization using spaCy
  • Vectorization: TF-IDF (1-2 grams)
  • Feature Selection: Chi-squared (chi2) test, keeping the top 1000 features
  • Model: Logistic Regression (class_weight="balanced", max_iter=1000)
  • Performance: ~87% accuracy on a balanced test set (800 spam, 800 ham)
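
The components listed above could be assembled roughly as follows. This is a minimal sketch, not the author's training script: the `build_pipeline` helper, the step names, and the toy messages are hypothetical, and the spaCy tokenization and lemmatization step is omitted so the example runs without a spaCy model installed. Only the hyperparameters (1-2 grams, chi2 top-k, `class_weight="balanced"`, `max_iter=1000`) come from the list above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def build_pipeline(k=1000):
    """Hypothetical helper: TF-IDF (1-2 grams) -> chi2 top-k -> logistic regression."""
    return Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ("chi2", SelectKBest(chi2, k=k)),
        ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
    ])


# Made-up messages purely for illustration (not the model's training data).
texts = [
    "win a free prize now",
    "claim your free cash reward",
    "urgent offer, click to win",
    "free entry, text now to claim",
    "are we still meeting for lunch",
    "see you at the office tomorrow",
    "can you send me the report",
    "thanks, talk to you later",
]
labels = ["spam"] * 4 + ["ham"] * 4

# k="all" because this toy vocabulary has far fewer than 1000 features.
pipe = build_pipeline(k="all")
pipe.fit(texts, labels)
predictions = pipe.predict(["free prize, claim now"])
```

In a real run, the texts would first be lemmatized with spaCy and `k` would stay at 1000, which requires a vocabulary of at least that many n-grams.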

📦 Files

  • spam_classifier_bundle.joblib: a single joblib file containing the trained model, TF-IDF vectorizer, label encoder, and feature selector

📥 Load Model (Example)

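from huggingface_hub import hf_hub_download
import joblib

# Download the bundle listed under Files (cached locally after the first call).
bundle_path = hf_hub_download("mageshcruz/spam-classifier-scikit", "spam_classifier_bundle.joblib")
bundle = joblib.load(bundle_path)

# Unpack the individual components of the bundle.
model = bundle["model"]
vector = bundle["vectorizer"]
selector = bundle["selector"]
le = bundle["label_encoder"]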
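
Once the bundle is loaded, prediction would presumably chain the four components in the order they were fitted. The sketch below assumes that order (vectorizer, then selector, then model, then label decoding) and assumes the model was trained on encoded labels. The `classify` helper is hypothetical, and a small stand-in bundle is trained on made-up messages so the example runs offline. Input text should be preprocessed the same way as the training data (spaCy tokenization and lemmatization), which is not shown here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder


def classify(texts, bundle):
    """Hypothetical helper: vectorize, select features, predict, decode labels."""
    features = bundle["vectorizer"].transform(texts)
    selected = bundle["selector"].transform(features)
    encoded = bundle["model"].predict(selected)
    return list(bundle["label_encoder"].inverse_transform(encoded))


# Stand-in bundle built from made-up messages (assumption: same keys as the
# real bundle). With the real model, pass the dict returned by joblib.load.
train_texts = [
    "win a free prize now",
    "claim your free cash reward",
    "urgent offer, click to win",
    "free entry, text now to claim",
    "are we still meeting for lunch",
    "see you at the office tomorrow",
    "can you send me the report",
    "thanks, talk to you later",
]
train_labels = ["spam"] * 4 + ["ham"] * 4

le = LabelEncoder()
y = le.fit_transform(train_labels)
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(train_texts)
selector = SelectKBest(chi2, k="all").fit(X, y)  # k="all": toy vocabulary is tiny
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(selector.transform(X), y)

toy_bundle = {
    "model": model,
    "vectorizer": vectorizer,
    "selector": selector,
    "label_encoder": le,
}

result = classify(["claim your free prize"], toy_bundle)
```

With the downloaded model, the call would be `classify(messages, bundle)`, where `messages` is a list of preprocessed strings.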