---
language: en
license: mit
tags:
  - keras
  - lstm
  - spam-classification
  - text-classification
  - binary-classification
  - email
  - deep-learning
library_name: keras
pipeline_tag: text-classification
model_name: Spam Email Classifier (BiLSTM)
datasets:
  - SetFit/enron_spam
---

# 📧 Spam Email Classifier using BiLSTM

This model uses a **Bidirectional LSTM (BiLSTM)** architecture built with **Keras** to classify email messages as **Spam** or **Ham** (legitimate, non-spam email). It was trained on the [Enron Spam Dataset](https://huggingface.co/datasets/SetFit/enron_spam) using pretrained GloVe word embeddings.

---
## 🧠 Model Architecture

- **Tokenizer**: Keras `Tokenizer` fitted on the Enron dataset
- **Embedding**: pretrained [GloVe](https://nlp.stanford.edu/projects/glove/) vectors (`glove.6B.100d`, 100 dimensions)
- **Model**: `Embedding → BiLSTM → Dropout → Dense(sigmoid)`
- **Input**: English email/message text
- **Output**: spam probability from the sigmoid unit, thresholded at 0.5 to give `0 = Ham`, `1 = Spam`

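This card does not show the training code, so the snippet below is only a rough sketch of how pretrained GloVe vectors are commonly wired to a fitted Keras `Tokenizer`: it builds an embedding matrix whose row *i* holds the vector of the word with tokenizer index *i*. It assumes the full `tokenizer.word_index` vocabulary was used (no `num_words` cap) and the standard whitespace-separated `glove.6B.100d.txt` format; `load_glove` and `build_embedding_matrix` are hypothetical helper names, not files or functions from this repository. Row 0 stays zero because Keras reserves index 0 for padding, and words without a GloVe vector also keep a zero row. Such a matrix is typically passed to the `Embedding` layer via `embeddings_initializer=tf.keras.initializers.Constant(matrix)`.

```python
from typing import Dict, Iterable

import numpy as np


def load_glove(lines: Iterable[str]) -> Dict[str, np.ndarray]:
    """Parse GloVe text lines of the form 'word v1 v2 ... vN' into a dict (hypothetical helper)."""
    vectors: Dict[str, np.ndarray] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines defensively
        word, *values = line.split()
        vectors[word] = np.asarray(values, dtype="float32")
    return vectors


def build_embedding_matrix(
    word_index: Dict[str, int],
    glove: Dict[str, np.ndarray],
    dim: int = 100,
) -> np.ndarray:
    """Return a (len(word_index) + 1, dim) matrix aligned with tokenizer ids.

    Row 0 is left as zeros for the padding index; words that have no
    GloVe vector also keep an all-zero row.
    """
    matrix = np.zeros((len(word_index) + 1, dim), dtype="float32")
    for word, idx in word_index.items():
        vector = glove.get(word)
        if vector is not None:
            matrix[idx] = vector
    return matrix


# Hypothetical usage with a local GloVe file and the fitted tokenizer:
# with open("glove.6B.100d.txt", encoding="utf-8") as f:
#     glove = load_glove(f)
# embedding_matrix = build_embedding_matrix(tokenizer.word_index, glove, dim=100)
```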
---

## 🧪 Example Usage

```python
import pickle

from huggingface_hub import hf_hub_download
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences

REPO_ID = "lokas/spam-emails-classifier"
MAXLEN = 50  # must match the maxlen used during training

# Download the model and the fitted tokenizer from the Hugging Face Hub
model_path = hf_hub_download(REPO_ID, "model.h5")
tokenizer_path = hf_hub_download(REPO_ID, "tokenizer.pkl")

# Load model and tokenizer
model = load_model(model_path)
with open(tokenizer_path, "rb") as f:
    tokenizer = pickle.load(f)


def predict_spam(text):
    """Classify a single email/message string as spam or not spam."""
    seq = tokenizer.texts_to_sequences([text])  # words -> integer ids
    padded = pad_sequences(seq, maxlen=MAXLEN)  # same padding as at training time
    prob = model.predict(padded, verbose=0)[0][0]  # sigmoid output: P(spam)
    return "🚫 Spam" if prob > 0.5 else "✅ Not Spam"


# Example
print(predict_spam("Win a free iPhone now!"))
```
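The `maxlen` comment in the example is worth taking literally: Keras `pad_sequences` defaults to `padding="pre"` and `truncating="pre"`, so short messages are left-padded with zeros and long ones keep only their **last** `maxlen` token ids. The pure-Python sketch below mirrors those defaults so you can inspect what the model actually receives; `pad_pre` is a hypothetical helper (not the Keras implementation), and it assumes training used the same defaults with `maxlen=50`. It also illustrates an edge case: if the tokenizer was fitted without an `oov_token`, `texts_to_sequences` drops words it has never seen, so a message made entirely of such words reaches the model as all zeros.

```python
from typing import List, Sequence


def pad_pre(sequences: Sequence[Sequence[int]], maxlen: int, value: int = 0) -> List[List[int]]:
    """Mimic keras pad_sequences(seqs, maxlen=maxlen) with its default
    padding="pre" and truncating="pre" settings (hypothetical helper)."""
    padded = []
    for seq in sequences:
        tail = list(seq)[-maxlen:] if maxlen > 0 else []  # keep the last maxlen ids
        padded.append([value] * (maxlen - len(tail)) + tail)  # left-pad to maxlen
    return padded


print(pad_pre([[5, 8, 2]], maxlen=5))           # [[0, 0, 5, 8, 2]]
print(pad_pre([[1, 2, 3, 4, 5, 6]], maxlen=4))  # [[3, 4, 5, 6]]
print(pad_pre([[]], maxlen=3))                  # [[0, 0, 0]]
```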