title: Fake News Detection MLOs Web App
emoji: π
colorFrom: blue
colorTo: blue
sdk: docker
pinned: false
short_description: An end-to-end web app for detecting fake news
license: mit
Fake News Detector
A full-stack, AI-powered fake news detection system built with Python, FastAPI, Streamlit, and traditional ML. This project demonstrates end-to-end skills in AI, machine learning, MLOps, and cloud deployment. It includes real-time inference, automatic retraining on new data, drift detection, live monitoring, and user interactivity, all hosted for free on Hugging Face Spaces.
Live App: https://huggingface.co/spaces/Ahmedik95316/Fake-News-Detection-MLOs-Web-App
Overview: What This Project Demonstrates
This system showcases:
- Real-world problem framing and supervised learning pipeline
- Logistic Regression + TF-IDF for binary text classification (Fake vs Real news)
- FastAPI as a backend service for live model inference
- Streamlit frontend with confidence feedback and visualization
- Hourly scraping of real articles from news websites
- Generation of fake news headlines using prompt-style templates
- Automated retraining when new data is added
- Promotion strategy for candidate model vs production
- Jensen-Shannon divergence for detecting data drift
- JSON-based metadata and model versioning
- Activity and monitoring logs for training, drift, and promotion
- Custom CSV upload and live training inside the UI
- Entirely deployed and hosted on Hugging Face Spaces, with no setup required for end users
This project demonstrates the ability to bridge machine learning, DevOps, and user experience design, all critical MLOps competencies.
What the System Does
This project automatically:
- Scrapes real news articles every hour from Reuters, BBC, and NPR
- Generates fake news headlines using programmatic templates
- Appends new data to the dataset
- Triggers retraining if data is added
- Compares the candidate model's accuracy to that of the existing production model
- Promotes candidate model if it performs better
- Logs drift scores and training events
- Allows users to manually upload datasets and monitor training
- Predicts Fake or Real for any given text through the Streamlit interface
Directory Breakdown
/app/
- fastapi_server.py: FastAPI backend serving the /predict endpoint. Used by Streamlit to perform live model inference.
- streamlit_app.py: The main UI for users. Handles input, prediction, training visualization, metadata, drift monitoring, and upload support.
/data/
- prepare_datasets.py: Merges the Kaggle and LIAR datasets, standardizes formats, and outputs a unified training file.
- scrape_real_news.py: Scrapes the latest articles from Reuters, BBC, and NPR using newspaper3k.
- generate_fake_news.py: Uses predefined templates to generate believable fake headlines and articles.
- combined_dataset.csv: The master dataset combining real and fake news.
- scraped_real.csv: Output from the scraper.
- generated_fake.csv: Output from the fake news generator.
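Template-based generation of the kind generate_fake_news.py performs could look roughly like the sketch below. The templates, the filler vocabulary, the "FAKE" label value, and the function name generate_fake_headlines are all hypothetical and are not taken from the project.

```python
import random

# Hypothetical templates and vocabulary, invented for illustration only.
TEMPLATES = [
    "{who} secretly admits {claim}",
    "Scientists stunned as {claim}",
    "BREAKING: {who} confirms {claim}",
]
WHO = ["Government insider", "Anonymous doctor", "Former official"]
CLAIMS = [
    "the moon landing was staged",
    "tap water controls your mind",
    "a miracle fruit cures every disease",
]


def generate_fake_headlines(n, seed=None):
    """Return n rows of {"text", "label"} built from random template fills."""
    rng = random.Random(seed)  # local RNG, so a fixed seed is reproducible
    rows = []
    for _ in range(n):
        template = rng.choice(TEMPLATES)
        # str.format ignores keyword arguments a template does not use.
        text = template.format(who=rng.choice(WHO), claim=rng.choice(CLAIMS))
        rows.append({"text": text, "label": "FAKE"})  # label value is assumed
    return rows
```

Calling generate_fake_headlines(20) would produce one batch of the size the automation cycle describes.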
/model/
- train.py: Trains the ML model (Logistic Regression + TF-IDF) on the dataset provided.
- retrain.py: Trains a candidate model on newly added data, compares it to the production model, and promotes it if accuracy improves.
- model.pkl: Current production model.
- vectorizer.pkl: TF-IDF encoder used with the production model.
- model_candidate.pkl: Temporarily trained candidate model.
- vectorizer_candidate.pkl: TF-IDF encoder for the candidate model.
- metadata.json: Tracks model version, training accuracy, and timestamp of last promotion.
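As a rough illustration of the Logistic Regression + TF-IDF approach, here is a minimal scikit-learn sketch. The toy texts, the 0 = Real / 1 = Fake encoding, the hyperparameters, and the predict helper are assumptions made for this example; the real train.py works on the combined dataset and saves its results as model.pkl and vectorizer.pkl, which is omitted here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy examples invented for illustration; 0 = Real, 1 = Fake (assumed encoding).
texts = [
    "Central bank raises interest rates by a quarter point",
    "City council approves budget for public schools",
    "Aliens secretly run the national government, insider claims",
    "Miracle fruit cures every disease overnight",
]
labels = [0, 0, 1, 1]

# Turn raw text into TF-IDF features, then fit a linear classifier on them.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(texts)
model = LogisticRegression(max_iter=1000)
model.fit(X, labels)


def predict(text):
    """Return (label, confidence) for a single piece of text."""
    proba = model.predict_proba(vectorizer.transform([text]))[0]
    idx = int(proba.argmax())  # index of the more probable class
    label = "Fake" if model.classes_[idx] == 1 else "Real"
    return label, float(proba[idx])
```

The confidence returned here is the probability of the predicted class, which is the kind of value a UI could show alongside the Fake/Real verdict.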
/monitor/
- monitor_drift.py: Calculates Jensen-Shannon divergence between real-time data and training data to identify distributional shift.
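For reference, Jensen-Shannon divergence between two discrete distributions can be computed as in the sketch below. How monitor_drift.py turns text into distributions is not documented here, so the word-frequency approach in drift_score is an assumption, as are both function names.

```python
import math
from collections import Counter


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the result is in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def drift_score(train_texts, live_texts):
    """Compare word-frequency distributions of two non-empty text collections."""
    train = Counter(w for t in train_texts for w in t.lower().split())
    live = Counter(w for t in live_texts for w in t.lower().split())
    vocab = sorted(set(train) | set(live))  # shared axis for both distributions
    train_total = sum(train.values())
    live_total = sum(live.values())
    p = [train[w] / train_total for w in vocab]
    q = [live[w] / live_total for w in vocab]
    return js_divergence(p, q)
```

A score of 0 means the two word distributions are identical, and 1 means they share no words at all. Any alerting threshold in between would be a project-specific choice.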
/scheduler/
- schedule_tasks.py: Central scheduler that triggers scraping, fake generation, retraining, drift monitoring, and logging every hour.
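A central scheduler of this kind can be sketched with the standard library alone. The function names, the (name, callable) step format, and the choice to continue after a failed step are assumptions for illustration; schedule_tasks.py may be organized differently or use a scheduling library.

```python
import time

HOUR_SECONDS = 3600
TEST_MODE_SECONDS = 60  # the README mentions a one-minute test mode


def run_cycle(steps):
    """Run each (name, callable) step in order and record its outcome."""
    results = []
    for name, step in steps:
        try:
            step()
            results.append((name, "ok"))
        except Exception as exc:
            # Assumed policy: a failing step is recorded, later steps still run.
            results.append((name, f"error: {exc}"))
    return results


def run_forever(steps, test_mode=False):
    """Repeat the cycle every hour, or every minute in test mode."""
    interval = TEST_MODE_SECONDS if test_mode else HOUR_SECONDS
    while True:
        run_cycle(steps)
        time.sleep(interval)
```

The steps list would hold entries such as ("scrape", scrape_fn) and ("retrain", retrain_fn), where the callables are placeholders for the project's own functions.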
/logs/
- activity_log.json: Timestamped logs of scraping, generation, and retraining activities.
- monitoring_log.json: Drift scores and evaluation logs of candidate vs production model.
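One simple way to keep timestamped JSON logs like these is to store a JSON array and append an entry per event. The sketch below assumes that layout; the field names "timestamp" and "event" and the helper name append_log are hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_log(path, event, **details):
    """Append one timestamped entry to a JSON-array log file and return it."""
    path = Path(path)
    # Start a new list when the log file does not exist yet.
    entries = json.loads(path.read_text()) if path.exists() else []
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": event,
        **details,
    }
    entries.append(entry)
    path.write_text(json.dumps(entries, indent=2))
    return entry
```

For example, append_log("logs/activity_log.json", "scrape", articles=15) would record one scraping run. This read-modify-write approach is not safe for concurrent writers, which is acceptable only if a single scheduler process owns the file.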
Root Files
- requirements.txt: All project dependencies
- render.yaml: Configuration file used to deploy both backend and frontend on Render.com
- README.md: The file you're reading
Automation Logic
Every hour (or every minute in test mode), the system:
- Scrapes 15 new real news articles
- Generates 20 new fake news articles
- Appends them to the existing dataset
- Triggers retraining of a candidate model
- Compares it to the existing production model
- If better, promotes the candidate to production
- Logs drift score using Jensen-Shannon divergence
- Updates visual logs and accuracy charts in the UI
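The compare-and-promote step in the cycle above can be sketched as follows. The file names come from the /model/ directory listing; the metadata field names ("version", "accuracy", "promoted_at") and the function promote_if_better are assumptions, and the actual logic in retrain.py may differ.

```python
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def promote_if_better(model_dir, candidate_accuracy):
    """Copy candidate files over production if accuracy strictly improves."""
    model_dir = Path(model_dir)
    meta_path = model_dir / "metadata.json"
    meta = json.loads(meta_path.read_text()) if meta_path.exists() else {}

    # Keep the production model on ties or regressions.
    if candidate_accuracy <= meta.get("accuracy", 0.0):
        return False

    for name in ("model", "vectorizer"):
        shutil.copyfile(model_dir / f"{name}_candidate.pkl", model_dir / f"{name}.pkl")

    new_meta = {
        "version": meta.get("version", 0) + 1,
        "accuracy": candidate_accuracy,
        "promoted_at": datetime.now(timezone.utc).isoformat(),
    }
    meta_path.write_text(json.dumps(new_meta, indent=2))
    return True
```

For the comparison to be meaningful, both accuracies should be measured on the same held-out data. This sketch simply trusts the numbers it is given.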
Live Deployment: Hugging Face Spaces
This project is fully deployed on Hugging Face Spaces using a Dockerized setup that includes both the Streamlit UI and FastAPI backend in a single container.
Launch the App on Hugging Face Spaces
The app runs entirely within a Hugging Face-hosted Docker container.
Both the FastAPI inference server and Streamlit web interface are packaged together, ensuring fast internal communication.
The API_URL in streamlit_app.py is set to http://localhost:8000/predict to support intra-container requests.
The container uses Python 3.11.6, aligned with the local development environment for consistency and reproducibility.
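To illustrate the intra-container call, here is a minimal client sketch using only the standard library. The API_URL value is the one stated above; the {"text": ...} payload shape, the JSON response, and both function names are assumptions, and streamlit_app.py may use a different HTTP client.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/predict"  # value stated in this README


def build_predict_request(text):
    """Build a POST request for /predict (the payload shape is assumed)."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, timeout=10):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Because both services share one container, localhost resolves to the FastAPI process directly and no external network hop is involved.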
Deployment Infrastructure
The deployment uses a custom Dockerfile tailored to match the exact development environment.
All dependencies are pinned to specific versions in the requirements.txt file to avoid incompatibilities.
The container runs both services concurrently using a process supervisor (if needed), ensuring a single deployment handles the complete user workflow.
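Running both services from one entry point could be done with a small launcher like the one below. This is a sketch under assumptions: the module path app.fastapi_server:app, the script path app/streamlit_app.py, and the Streamlit port 7860 are guesses based on the directory layout and common Spaces defaults, not confirmed by the project's Dockerfile.

```python
import subprocess
import sys

# Assumed commands; the module path, script path, and port 7860 are hypothetical.
COMMANDS = [
    [sys.executable, "-m", "uvicorn", "app.fastapi_server:app",
     "--host", "0.0.0.0", "--port", "8000"],
    [sys.executable, "-m", "streamlit", "run", "app/streamlit_app.py",
     "--server.port", "7860", "--server.address", "0.0.0.0"],
]


def run_all(commands):
    """Start every command, wait for all of them, and return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]  # all start immediately
    return [p.wait() for p in procs]
```

Calling run_all(COMMANDS) as the container's entry point would keep both processes alive. A fuller supervisor would also restart or shut down the other service when one exits, which this sketch does not attempt.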
Skills Demonstrated
- AI/ML: Logistic Regression, TF-IDF, binary text classification
- MLOps: scheduled retraining, model promotion, version tracking
- Drift Detection: Jensen-Shannon divergence implementation
- Cloud DevOps: deploying two services via render.yaml
- UI/UX: live model prediction, upload, progress bars, logging
- Data Engineering: merging datasets, web scraping, labeling
Credits
- LIAR Dataset (Politifact)
- Fake and Real News Dataset (Kaggle)
- newspaper3k
- FastAPI, Streamlit, scikit-learn, Render
License
MIT