🔐 Encrypted Text Classifier – 20 Newsgroups Cipher Challenge
This project is built for the Kaggle Ciphertext Challenge, where the goal is to classify encrypted text documents into 20 different newsgroup categories.
🎯 Even without decrypting the text, we trained a character-level machine learning model that achieves over 63% accuracy.
📂 Project Structure
cipher-classifier/
├── app.py                  # Streamlit app
├── cipher_classifier.pkl   # Pickled model + vectorizer
├── train.csv               # Kaggle training data
├── requirements.txt        # Libraries for deployment
└── README.md
🧠 Model Overview
- Input: Ciphertext strings (unreadable encrypted text)
- Vectorization: `CountVectorizer` with character-level n-grams (1 to 3)
- Model: Logistic Regression (sklearn)
- Accuracy: ~63% (without decryption)
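The overview above can be sketched as a small scikit-learn pipeline. This is a minimal illustration, not the project's actual training script: the toy strings stand in for `train.csv`, and `build_classifier`, `lowercase=False`, and `max_iter=1000` are assumptions that the README does not state.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_classifier(max_iter=1000):
    """Character n-gram counts (1 to 3) feeding a logistic regression.

    build_classifier is a hypothetical helper name used for this sketch.
    """
    vectorizer = CountVectorizer(
        analyzer="char",     # count characters instead of words
        ngram_range=(1, 3),  # single characters, pairs, and triples
        lowercase=False,     # assumption: letter case may carry signal in ciphertext
    )
    # max_iter is an assumed setting chosen so the solver converges.
    model = LogisticRegression(max_iter=max_iter)
    return make_pipeline(vectorizer, model)


# Toy stand-in data (hypothetical); the real project trains on train.csv.
toy_texts = ["aaaa aaab", "abab aaaa", "zzzz zzzy", "zyzy zzzz"]
toy_labels = [0, 0, 1, 1]

clf = build_classifier()
clf.fit(toy_texts, toy_labels)

# Classify unseen strings without any decryption step.
predictions = clf.predict(["aaab aaaa", "zzzy zzzz"])
```

Counting raw character n-grams lets the model pick up statistical regularities that can survive encryption, which is why no decryption step appears anywhere in the pipeline.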
Example Output

| Input (Ciphertext) | Predicted Label |
| --- | --- |
| `['W')(7x1zay7Hb3...` | 15 |
| `Tx4a8M\HNsyp;HM...` | 8 |
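A prediction like the ones above could be produced by loading the pickled model and vectorizer and calling them on new ciphertext. The sketch below assumes the pickle holds a dict with `"vectorizer"` and `"model"` keys. That layout, the `predict_labels` helper, and the toy training strings are all hypothetical, and the real `cipher_classifier.pkl` may be organized differently.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-in data (hypothetical); labels reuse 15 and 8 only for illustration.
texts = ["aaaa aaab", "abab aaaa", "zzzz zzzy", "zyzy zzzz"]
labels = [15, 15, 8, 8]

vectorizer = CountVectorizer(analyzer="char", ngram_range=(1, 3))
model = LogisticRegression(max_iter=1000)
model.fit(vectorizer.fit_transform(texts), labels)

# Assumed layout: one dict that bundles both fitted objects.
path = os.path.join(tempfile.mkdtemp(), "cipher_classifier.pkl")
with open(path, "wb") as f:
    pickle.dump({"vectorizer": vectorizer, "model": model}, f)


def predict_labels(pickle_path, ciphertexts):
    """Load the bundle and return one integer label per ciphertext string."""
    with open(pickle_path, "rb") as f:
        bundle = pickle.load(f)
    features = bundle["vectorizer"].transform(ciphertexts)
    return [int(label) for label in bundle["model"].predict(features)]


predicted = predict_labels(path, ["aaab aaaa", "zzzy zzzz"])
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code.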
📦 Deployment
This app is designed to run on:
🟢 Hugging Face Spaces
🟢 Streamlit Cloud
🔵 GitHub
📌 Kaggle Link
You can download the dataset from the official competition page: 👉 Kaggle – 20 Newsgroups Ciphertext Challenge