Spaces: ihaveaplan66/news-analyzer

ihaveaplan66 committed 10 days ago

Commit 3eed4de (verified) · 1 Parent(s): 4a47976

Upload 10 files

Files changed (10)

.gitignore +3 -0
LICENSE +21 -0
README.md +68 -11
app.py +122 -0
main.py +95 -0
requirements.txt +8 -0
static/script.js +32 -0
static/style.css +117 -0
templates/about.html +33 -0
templates/index.html +57 -0

.gitignore ADDED

	@@ -0,0 +1,3 @@

+__pycache__/
+.venv/
+.idea/

LICENSE ADDED

	@@ -0,0 +1,21 @@

+MIT License
+Copyright (c) 2025 ihaveaplan66
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md CHANGED

@@ -1,11 +1,68 @@
----
-title: News Analyzer
-emoji: 🔥
-colorFrom: purple
-colorTo: purple
-sdk: docker
-pinned: false
-license: mit
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# NewsAnalyzer
+[Live Demo](https://huggingface.co/spaces/ihaveaplan66/news-analyzer)
+NewsAnalyzer is a web application that analyzes news articles using natural language processing (NLP) techniques. It helps users quickly understand the sentiment, key topics, and overall trends in the latest news based on any search query.
+## Features
+- Search for recent news articles using NewsAPI
+- Perform sentiment analysis to determine whether each article is positive or negative
+- Automatically categorize articles into topics such as business, technology, politics, and more
+- Generate concise summaries for each article
+- Extract the most frequently mentioned words and display them in a word cloud
+- Visualize sentiment distribution and trending words in clear charts
+- Cache recent searches for improved performance
+## Technologies
+- **Flask** for backend logic and routing
+- **Hugging Face Transformers** for NLP tasks:
+    - Sentiment analysis: [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
+    - Topic classification: [cardiffnlp/tweet-topic-21-multi](https://huggingface.co/cardiffnlp/tweet-topic-21-multi)
+    - Summarization: [facebook/bart-large-cnn](https://huggingface.co/facebook/bart-large-cnn)
+- **NLTK** for tokenization and text preprocessing
+- **Matplotlib** and **WordCloud** for visualizations
+- **NewsAPI** for retrieving real-time news articles
+## Installation
+1. Clone the repository:
+    ```bash
+    git clone https://github.com/ihaveaplan66/news-analyzer.git
+    cd news-analyzer
+    ```
+2. Create a virtual environment and activate it:
+    ```bash
+    python -m venv venv
+    source venv/bin/activate   # Windows: venv\Scripts\activate
+    ```
+3. Install dependencies:
+    ```bash
+    pip install -r requirements.txt
+    ```
+4. Set your NewsAPI key in `app.py`:
+    ```python
+    api_key = "your_newsapi_key_here"
+    ```
+5. Run the application:
+    ```bash
+    python app.py
+    ```
+6. Open the application in your browser:
+    ```
+    http://localhost:7860
+    ```
+## License
+This project is part of my personal portfolio and is provided under the MIT License.
+## Author
+Created by Volodymyr Shereperov
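
Review note on README step 4: the key is set directly in the `app.py` source, which means it travels with every commit. One common alternative is to read it from an environment variable (Hugging Face Spaces exposes configured secrets to the app as environment variables). The sketch below is not part of this commit; the variable name `NEWSAPI_KEY` and the helper `load_api_key` are hypothetical.

```python
import os


def load_api_key(env_var: str = "NEWSAPI_KEY") -> str:
    """Return the NewsAPI key from the environment, or raise if it is missing.

    `NEWSAPI_KEY` is an assumed variable name, not something the project defines.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your NewsAPI key")
    return key
```

Under that assumption, `index()` could call `load_api_key()` instead of assigning a literal, and a missing key would fail with a clear message rather than an empty result list.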

app.py ADDED

	@@ -0,0 +1,122 @@

+from flask import Flask, render_template, request, Response, url_for
+from main import analyze_news, extract_trending_words
+import io
+import time
+import matplotlib.pyplot as plt
+from wordcloud import WordCloud
+app = Flask(__name__)
+cache = {}
+plt.switch_backend("Agg")
+def get_cached_data(cache_key):
+    cached = cache.get(cache_key)
+    if cached and time.time() - cached["timestamp"] < 600:
+        print(f"Cache hit for {cache_key}")
+        return cached["results"], cached["trending_words"]
+    return None, None
+@app.route("/", methods=["GET", "POST"])
+def index():
+    results, trending_words = [], []
+    sentiment_chart = None
+    query = ""
+    if request.method == "POST":
+        query = request.form["query"]
+        num_articles = int(request.form["num_articles"])
+        api_key = "your_newsapi_key_here"  # placeholder; do not commit a real NewsAPI key
+        cache_key = f"{query}_{num_articles}"
+        results, trending_words = get_cached_data(cache_key)
+        if results is None:
+            print(f"No cache for {cache_key}, fetching new data...")
+            results = analyze_news(query, api_key, num_articles)
+            texts = [article["title"] + " " + article.get("summary", "") for article in results]
+            trending_words = extract_trending_words(texts)
+            cache[cache_key] = {
+                "results": results,
+                "trending_words": trending_words,
+                "timestamp": time.time()
+            }
+        if results:
+            sentiment_chart = url_for('sentiment_chart_route')
+    return render_template("index.html", results=results, sentiment_chart=sentiment_chart, query=query, trending_words=trending_words)
+@app.route("/sentiment_chart")
+def sentiment_chart_route():
+    if not cache:
+        return "No sentiment data", 404
+    last_query = list(cache.keys())[-1]
+    cached = cache.get(last_query)
+    if not cached:
+        return "No sentiment data", 404
+    results = cached["results"]
+    sentiments = [article["sentiment"] for article in results]
+    sentiment_counts = dict((x, sentiments.count(x)) for x in set(sentiments))
+    labels = list(sentiment_counts.keys())
+    values = list(sentiment_counts.values())
+    color_map = {
+        "POSITIVE": "#28a745",
+        "NEGATIVE": "#c82333"
+    }
+    colors = [color_map[label] for label in labels]
+    total = sum(values)
+    plt.figure(figsize=(3, 3))
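+    # round() rather than int(): pct * total / 100 can land just below a whole number and be truncated
+    plt.pie(values, autopct=lambda pct: f'{round(pct * total / 100)} ({pct:.1f}%)', startangle=140, colors=colors)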
+    plt.axis('equal')
+    img = io.BytesIO()
+    plt.savefig(img, format="png", bbox_inches="tight", transparent=True)
+    plt.close()
+    img.seek(0)
+    return Response(img.getvalue(), mimetype="image/png")
+@app.route("/wordcloud_chart")
+def wordcloud_chart():
+    if not cache:
+        return "No word data", 404
+    last_query = list(cache.keys())[-1]
+    cached = cache.get(last_query)
+    if not cached:
+        return "No word data", 404
+    results = cached["results"]
+    texts = [article["title"] + " " + article.get("summary", "") for article in results]
+    text = " ".join(texts)
+    plt.figure(figsize=(16, 8), dpi=150)
+    wordcloud = WordCloud(width=1600, height=800, colormap="Blues", background_color='#222').generate(text)
+    plt.imshow(wordcloud, interpolation="bilinear")
+    plt.axis("off")
+    img = io.BytesIO()
+    plt.savefig(img, format="png", bbox_inches="tight", pad_inches=0, dpi=150)
+    plt.close()
+    img.seek(0)
+    return Response(img.getvalue(), mimetype="image/png")
+@app.route("/about")
+def about():
+    return render_template("about.html")
+if __name__ == "__main__":
+    # debug=True on 0.0.0.0 would expose the interactive debugger to anyone who can reach the port
+    app.run(host="0.0.0.0", port=7860, debug=False)
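
Review note on the cache in `app.py`: `get_cached_data` treats an entry as fresh for 600 seconds, but stale entries are never removed from the `cache` dict. A minimal sketch of the same time-to-live idea with eviction on read follows. It is an illustration under stated assumptions, not the code in this commit; the class name `TTLCache` and the injectable `clock` parameter are hypothetical and exist mainly to make the behaviour testable.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float = 600,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        # Store the value together with the time it was written.
        self._entries[key] = (self._clock(), value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self._ttl:
            # Expired: drop the entry so the dict does not grow without bound.
            del self._entries[key]
            return None
        return value
```

Usage would mirror the existing flow: `cache.get(f"{query}_{num_articles}")` first, then `cache.set(...)` after a fresh call to `analyze_news`.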

main.py ADDED

	@@ -0,0 +1,95 @@

+import requests
+from collections import Counter
+from transformers import pipeline
+import nltk
+from nltk.tokenize import word_tokenize
+from nltk.corpus import stopwords
+import string
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+import torch
+nltk.download('punkt')
+nltk.download('stopwords')
+nltk.download('averaged_perceptron_tagger')
+nltk.download('punkt_tab')
+# 1. Function for getting news via NewsAPI
+def get_news(query, api_key, num_articles=5):
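+    url = 'https://newsapi.org/v2/everything'
+    # Pass the query via params so spaces, '&', and other special characters are URL-encoded
+    params = {'q': query, 'apiKey': api_key, 'language': 'en', 'pageSize': num_articles}
+    response = requests.get(url, params=params, timeout=10)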
+    if response.status_code == 200:
+        return response.json()['articles']
+    return []
+# 2. Analyzing tone with Hugging Face
+tone_analyzer = pipeline("sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english", revision="714eb0f")
+def analyze_sentiment(text):
+    return tone_analyzer(text)[0]
+# 3. Define category
+category_model = AutoModelForSequenceClassification.from_pretrained("cardiffnlp/tweet-topic-21-multi")
+category_tokenizer = AutoTokenizer.from_pretrained("cardiffnlp/tweet-topic-21-multi")
+# Label names are read from the model's own config so they always match its output indices
+def classify_category(text):
+    inputs = category_tokenizer(text, return_tensors="pt", truncation=True, padding=True)
+    with torch.no_grad():  # inference only, no gradients needed
+        outputs = category_model(**inputs)
+    predicted_class = torch.argmax(outputs.logits, dim=1).item()
+    return category_model.config.id2label[predicted_class]
+# 4. Summarization
+summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
+def split_text(text, max_tokens=512):
+    words = text.split()
+    return [' '.join(words[i:i+max_tokens]) for i in range(0, len(words), max_tokens)]
+def summarize_text(text):
+    chunks = split_text(text)
+    summaries = [summarizer(chunk, max_length=100, min_length=30, do_sample=False)[0]['summary_text'] for chunk in chunks]
+    return ' '.join(summaries)
+# 5. Search for trending words
+def extract_trending_words(texts):
+    text = ' '.join(texts).lower()
+    words = word_tokenize(text)
+    stop_words = set(stopwords.words('english'))  # build the set once instead of reloading the list per word
+    words = [word for word in words if word not in stop_words and word not in string.punctuation and len(word) > 1]
+    word_freq = Counter(words)
+    return word_freq.most_common(10)
+# 6. The main process of analyzing news
+def analyze_news(query, api_key, num_articles=5):
+    articles = get_news(query, api_key, num_articles)
+    if not articles:
+        return []
+    news_results = []
+    for article in articles:
+        title = article.get('title', 'No Title')
+        description = article.get('description', '') or ''
+        url = article.get('url', '#')
+        sentiment = analyze_sentiment(title + " " + description)['label']
+        category = classify_category(title + " " + description)
+        summary = summarize_text(title + " " + description)
+        news_results.append({
+            "title": title,
+            "url": url,
+            "sentiment": sentiment,
+            "category": category,
+            "summary": summary
+        })
+    return news_results
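
Review note on `extract_trending_words`: the function lowercases the text, tokenizes it, drops stop words, punctuation, and one-character tokens, then returns the ten most frequent words. The same idea can be sketched with only the standard library, which is convenient for quick checks without downloading NLTK data. This is an approximation, not the commit's implementation: the regex tokenizer and the short `STOP_WORDS` set are stand-ins chosen for illustration and will not match NLTK's output exactly.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Tiny illustrative stop-word list; NLTK's English list is much longer.
STOP_WORDS = frozenset({
    "the", "a", "an", "and", "or", "of", "to", "in", "on",
    "for", "is", "are", "as", "with", "at", "by",
})


def trending_words(texts: Iterable[str], top_n: int = 10) -> List[Tuple[str, int]]:
    """Return the `top_n` most frequent non-stop-word tokens with their counts."""
    # Lowercase everything and keep runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", " ".join(texts).lower())
    counts = Counter(t for t in tokens if len(t) > 1 and t not in STOP_WORDS)
    return counts.most_common(top_n)
```

The returned list of `(word, count)` pairs has the same shape that `index.html` iterates over in its `trending_words` loop.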

requirements.txt ADDED

	@@ -0,0 +1,8 @@

+Flask==3.0.0
+transformers==4.39.1
+torch==2.2.1
+nltk==3.8.1
+matplotlib==3.8.2
+wordcloud==1.9.3
+requests==2.31.0
+pillow==10.2.0

static/script.js ADDED

	@@ -0,0 +1,32 @@

+let chartsLoaded = { sentiment: false, wordcloud: false };
+let chartInterval = setInterval(checkCharts, 2000);
+function checkCharts() {
+    fetch('/sentiment_chart')
+        .then(response => {
+            if (response.ok) {
+                document.getElementById('sentimentChart').src = '/sentiment_chart';
+                document.getElementById('sentimentChart').style.display = 'block';
+                document.getElementById('sentimentChartMessage').style.display = 'none';
+                chartsLoaded.sentiment = true;
+                stopIfChartsLoaded();
+            }
+        });
+    fetch('/wordcloud_chart')
+        .then(response => {
+            if (response.ok) {
+                document.getElementById('wordcloudChart').src = '/wordcloud_chart';
+                document.getElementById('wordcloudChart').style.display = 'block';
+                document.getElementById('wordcloudChartMessage').style.display = 'none';
+                chartsLoaded.wordcloud = true;
+                stopIfChartsLoaded();
+            }
+        });
+}
+function stopIfChartsLoaded() {
+    if (chartsLoaded.sentiment && chartsLoaded.wordcloud) {
+        clearInterval(chartInterval);
+    }
+}

static/style.css ADDED

	@@ -0,0 +1,117 @@

+body {
+    font-family: Arial, sans-serif;
+    background-color: #181818;
+    color: #e0e0e0;
+    display: flex;
+    justify-content: center;
+    align-items: center;
+    flex-direction: column;
+    min-height: 100vh;
+    height: 100%;
+    margin: 0;
+    padding: 0;
+}
+.container {
+    background: #222;
+    padding: 20px;
+    border-radius: 10px;
+    box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.5);
+    text-align: center;
+    width: 90%;
+    max-width: 600px;
+    margin-bottom: 20px;
+}
+.first-container {
+    margin-top: 20px;
+}
+.res-container {
+    text-align: left;
+}
+h1 {
+    color: #f0f0f0;
+}
+h3 {
+    margin-top: 40px;
+}
+p.meta-info {
+    color: #bbb;
+    font-size: 14px;
+    line-height: 0;
+}
+textarea, input {
+    width: 100%;
+    padding: 10px;
+    border: 1px solid #444;
+    border-radius: 5px;
+    background: #333;
+    color: #e0e0e0;
+    resize: none;
+    font-size: 16px;
+    outline: none;
+    box-sizing: border-box;
+    margin-bottom: 10px;
+    font-family: inherit;
+}
+button {
+    padding: 10px 15px;
+    border: none;
+    background: #007bff;
+    color: white;
+    font-size: 16px;
+    cursor: pointer;
+    border-radius: 5px;
+    transition: 0.3s;
+    width: 100%;
+}
+button:hover {
+    background: #0056b3;
+}
+input[type=number]::-webkit-inner-spin-button,
+input[type=number]::-webkit-outer-spin-button {
+  -webkit-appearance: none;
+  margin: 0;
+}
+a {
+    color: inherit;
+    text-decoration: none;
+}
+a:hover {
+    text-decoration: underline;
+}
+img {
+    display: block;
+    margin: 0 auto;
+}
+.about-container {
+    margin-top: 20px;
+}
+.about-container li {
+    margin-bottom: 15px;
+    text-align: left;
+}
+.about-container p {
+    text-align: left;
+}
+footer {
+    text-align: center;
+    margin-bottom: 20px;
+    color: #bbb;
+    font-size: 14px;
+}

templates/about.html ADDED

	@@ -0,0 +1,33 @@

+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>About - NewsAnalyzer</title>
+    <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
+    <link rel="icon" href="data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 100 100'%3E%3Ctext y='80' font-size='80' %3E📰%3C/text%3E%3C/svg%3E">
+</head>
+<body>
+    <div class="container about-container">
+        <h1>About NewsAnalyzer</h1>
+        <p>
+            NewsAnalyzer is an AI-powered tool designed to help you quickly understand the latest news trends. Simply enter your search query, and NewsAnalyzer will gather and analyze articles to deliver valuable insights.
+        </p>
+        <p>Here's how it works:</p>
+        <ul>
+            <li><b>News Collection:</b> News articles are fetched from NewsAPI based on your search query.</li>
+            <li><b>Sentiment Analysis:</b> Each article is evaluated to determine whether its tone is positive or negative.</li>
+            <li><b>Category Classification:</b> Articles are automatically categorized into topics like business, technology, politics, and more.</li>
+            <li><b>Summarization:</b> Long descriptions are summarized to give you only the most important details.</li>
+            <li><b>Trending Words:</b> The most frequently mentioned words across articles are extracted to highlight key topics.</li>
+            <li><b>Charts:</b> Sentiment distribution and trending words are visualized with clear, easy-to-understand charts.</li>
+        </ul>
+        <p>Technologies:</p>
+        <p>NewsAnalyzer combines real-time data collection with machine learning models from Hugging Face, using Flask to seamlessly connect the data processing, analysis, and visualization into a single streamlined experience.</p>
+        <a href="/" style="color: #bbb;">← Back to Home</a>
+    </div>
+<footer>
+    Created by Volodymyr Shereperov | <a href="https://github.com/ihaveaplan66/news-analyzer" target="_blank">Source Code on GitHub</a>
+</footer>
+</body>
+</html>

templates/index.html ADDED

	@@ -0,0 +1,57 @@

+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>NewsAnalyzer</title>
+    <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
+    <link rel="icon" href="data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 100 100'%3E%3Ctext y='80' font-size='80' %3E📰%3C/text%3E%3C/svg%3E">
+</head>
+<body>
+    <div class="container first-container">
+        <h1>NewsAnalyzer</h1>
+        <form method="POST">
+            <textarea name="query" placeholder="Enter search query (e.g., Tesla)" required></textarea>
+            <input type="number" name="num_articles" min="1" max="20" value="5" required placeholder="Number of articles (1-20)">
+            <button type="submit">Analyze</button>
+        </form>
+        <p style="margin-top: 10px; font-size: 14px; color: #aaa;">
+            <a href="/about" style="color: #bbb;">How does it work?</a>
+        </p>
+    </div>
+    {% if results %}
+        <div class="container res-container">
+            <h2 style="text-align: center">Analysis for '{{ query }}'</h2>
+            {% for article in results %}
+                <h3><b><a href="{{ article.url }}" target="_blank">{{ article.title }}</a></b></h3>
+                <p class="meta-info">{{ article.category }} · {{ article.sentiment }}</p>
+                <p>{{ article.summary }}</p>
+            {% endfor %}
+        </div>
+        <div class="container">
+            <h2>Sentiment Analysis</h2>
+            <img id="sentimentChart" src="" alt="Sentiment Analysis" style="display:none; max-width:100%;">
+            <p id="sentimentChartMessage" style="text-align: center; color: #aaa;">Charts are loading...</p>
+        </div>
+        <div class="container">
+            <h2>Trending Words</h2>
+            <ul style="list-style: none; padding: 0; text-align: center;">
+                {% for word, count in trending_words %}
+                    <li style="display: inline-block; margin: 5px; color: #bbb;">{{ word }} ({{ count }})</li>
+                {% endfor %}
+            </ul>
+            <img id="wordcloudChart" src="" alt="Word Cloud" style="display:none; max-width:100%;">
+            <p id="wordcloudChartMessage" style="text-align: center; color: #aaa;">Charts are loading...</p>
+        </div>
+    <script src="{{ url_for('static', filename='script.js') }}"></script>
+    {% endif %}
+<footer>
+    Created by Volodymyr Shereperov | <a href="https://github.com/ihaveaplan66/news-analyzer" target="_blank">Source Code on GitHub</a>
+</footer>
+</body>
+</html>