Spaces: handecarkci/commonlit-summary-scorer

hç committed on Jun 1

Commit 3c55398 (verified) · 1 parent: 754ae7c

Upload 6 files

Files changed (7)

.gitattributes +2 -0
README.md +22 -0
app.py +26 -0
project_description.txt +83 -0
requirements.txt +5 -0
ridge_model.pkl +3 -0
tfidf_vectorizer.pkl +3 -0

.gitattributes ADDED

	@@ -0,0 +1,2 @@


+ridge_model.pkl filter=lfs diff=lfs merge=lfs -text
+tfidf_vectorizer.pkl filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,22 @@

+---
+title: CommonLit Summary Scorer
+emoji: 📝
+colorFrom: indigo
+colorTo: green
+sdk: streamlit
+app_file: app.py
+pinned: false
+---
+# ✨ Student Summary Auto-Scorer
+This app uses a Ridge Regression model trained on the [CommonLit Evaluate Student Summaries](https://www.kaggle.com/competitions/commonlit-evaluate-student-summaries) dataset to automatically score student-written summaries.
+**Predicted Scores:**
+- `Content`
+- `Wording`
+🧠 Built with scikit-learn, TF-IDF, and Streamlit
+🚀 Deployed using Hugging Face Spaces
+🎓 Educational project

app.py ADDED

	@@ -0,0 +1,26 @@

+import streamlit as st
+import joblib
+# Title
+st.title("📝 Student Summary Scorer")
+st.markdown("Enter your summary and we will automatically score its content and wording!")
+# Get text from the user
+text_input = st.text_area("✍️ Write your summary here", height=250)
+# Load the model and the TF-IDF vectorizer
+model = joblib.load("ridge_model.pkl")
+tfidf = joblib.load("tfidf_vectorizer.pkl")
+# Prediction button
+if st.button("📊 Score"):
+    if text_input.strip() == "":
+        st.warning("Please enter a summary.")
+    else:
+        # Vectorize and predict
+        X = tfidf.transform([text_input])
+        preds = model.predict(X)[0]
+        st.success("✅ Predictions complete:")
+        st.write(f"**Content**: {round(preds[0], 2)} / 5")
+        st.write(f"**Wording**: {round(preds[1], 2)} / 5")
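
The scoring step in `app.py` could be separated from the Streamlit widgets so that it can be exercised without a browser. The following is a minimal sketch, not part of the commit: `score_summary` is a hypothetical helper name, and it only assumes that the vectorizer and model expose `transform` and `predict` the way the objects loaded in `app.py` do.

```python
from typing import Optional


def score_summary(text: str, vectorizer, model) -> Optional[dict]:
    """Hypothetical helper: return rounded scores, or None for blank input."""
    if not text or not text.strip():
        return None  # mirrors the empty-input warning branch in app.py
    features = vectorizer.transform([text])        # one-row feature matrix
    content, wording = model.predict(features)[0]  # first (only) row, two targets
    return {"content": round(float(content), 2),
            "wording": round(float(wording), 2)}
```

With such a helper, the button handler would call `score_summary(text_input, tfidf, model)` and show the warning whenever it returns `None`.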

project_description.txt ADDED

	@@ -0,0 +1,83 @@

+PROJECT TITLE: CommonLit Student Summary Scorer (Kaggle NLP Project)
+OBJECTIVE:
+This project is an NLP-based scoring system that automatically evaluates the quality of student-written summaries. It is based on the Kaggle competition "CommonLit Evaluate Student Summaries."
+Kaggle Link: https://www.kaggle.com/competitions/commonlit-evaluate-student-summaries
+---
+DATA USED:
+- summaries_train.csv → Training data (student summaries + scores)
+- summaries_test.csv → Summaries to be scored
+- sample_submission.csv → Sample submission format
+- prompts_train.csv / prompts_test.csv → Additional metadata (not used in this project)
+---
+TARGET VARIABLES:
+- content → Measures how well the summary captures the main idea (0–5 scale)
+- wording → Measures clarity and expression quality (0–5 scale)
+---
+STEPS IMPLEMENTED:
+1. DATA EXPLORATION
+   - Loaded and analyzed `summaries_train.csv`
+   - Focused on `text`, `content`, and `wording` columns
+2. TEXT PROCESSING & MODELING
+   - Used `TfidfVectorizer` to convert text into numerical features
+   - Applied Ridge Regression inside a `MultiOutputRegressor`
+   - Model trained to predict both scores simultaneously
+   - Validation RMSE: **0.6819**
+3. PREDICTION & KAGGLE SUBMISSION
+   - Generated predictions on `summaries_test.csv`
+   - Filled predictions into the `sample_submission.csv` structure
+   - Created `submission.csv` for competition upload
+4. STREAMLIT WEB APP (`app.py`)
+   - Developed a user-friendly web interface to input any summary
+   - Displays instant predictions for `content` and `wording` scores
+   - Exported model and vectorizer as `.pkl` files using `joblib`
+   - Deployed publicly using Hugging Face Spaces
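
Step 2 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's training script: the vectorizer settings, the Ridge `alpha`, the helper names `train_scorer` and `columnwise_rmse`, and the toy data are all made up for the example, and the source does not say whether the reported validation RMSE was pooled or averaged per column.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor


def train_scorer(texts, targets):
    """Fit TF-IDF features and a two-output Ridge model; return both."""
    vectorizer = TfidfVectorizer()                  # default settings (assumption)
    X = vectorizer.fit_transform(texts)
    model = MultiOutputRegressor(Ridge(alpha=1.0))  # alpha value is an assumption
    model.fit(X, np.asarray(targets, dtype=float))
    return vectorizer, model


def columnwise_rmse(y_true, y_pred):
    """RMSE computed separately for each target column."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.sqrt((diff ** 2).mean(axis=0))


# Toy data for illustration only; the real input is summaries_train.csv.
texts = ["short vague text",
         "a clear and complete summary of the article",
         "it was fine",
         "the author argues for reform and accountability"]
targets = [[1.0, 1.5], [3.0, 2.5], [1.0, 1.0], [4.0, 4.5]]  # [content, wording]

vectorizer, model = train_scorer(texts, targets)
preds = model.predict(vectorizer.transform(texts))
print(preds.shape, columnwise_rmse(targets, preds))
```

Both fitted objects would then be written with `joblib.dump`, which is how the description says `ridge_model.pkl` and `tfidf_vectorizer.pkl` were exported.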
+---
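
Step 3 of the list above (filling predictions into the sample submission structure) might look like the sketch below. The header `student_id,content,wording` is an assumption about the competition's `sample_submission.csv` layout, and `write_submission` plus the ids and values are hypothetical placeholders.

```python
import csv
import io


def write_submission(student_ids, predictions, out_file):
    """Write one row per summary: id, content score, wording score."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["student_id", "content", "wording"])  # assumed header
    for student_id, (content, wording) in zip(student_ids, predictions):
        writer.writerow([student_id, f"{content:.4f}", f"{wording:.4f}"])


# Placeholder ids and scores, written to memory instead of submission.csv.
buffer = io.StringIO()
write_submission(["id_a", "id_b"], [(1.5, 2.25), (3.0, 2.0)], buffer)
print(buffer.getvalue())
```

Passing an open file handle for `submission.csv` instead of the `StringIO` buffer would produce the upload file.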
+TEST EXAMPLES:
+[Weak Summary]
+It was about a story. It was good. The people were talking and then something happened.
+Expected Score: Content ≈ 1.0, Wording ≈ 1.0
+[Intermediate Summary]
+The article discusses the importance of environmental protection. It explains how pollution harms the earth and suggests ways to stop it.
+Expected Score: Content ≈ 3.0, Wording ≈ 3.0
+[Advanced Summary]
+The summary articulates the author’s argument that environmental degradation is a result of unchecked industrial expansion. It effectively highlights key solutions such as policy reform, corporate accountability, and individual action to mitigate ecological damage.
+Expected Score: Content ≈ 4.5, Wording ≈ 4.5
+---
+LIBRARIES USED:
+- pandas
+- numpy
+- scikit-learn
+- joblib
+- streamlit
+---
+HOW TO RUN:
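+1. Install the dependencies with `pip install -r requirements.txt`, then use `streamlit run app.py` for local testing
+2. Alternatively, access the deployed version via Hugging Face Spaces
+3. Required model files: `ridge_model.pkl`, `tfidf_vectorizer.pkl` (loaded by `app.py` from the working directory)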
+---
+CREATED BY: [Hande Çarkcı]
+DATE: [June 1, 2025]
+PROJECT #: 2 of 20

requirements.txt ADDED

	@@ -0,0 +1,5 @@

+streamlit
+scikit-learn
+pandas
+joblib
+numpy
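
The requirements above are unpinned. Pickled scikit-learn estimators are not guaranteed to load under a different scikit-learn version than the one that saved them, so recording the versions of the training environment may be worthwhile. A minimal sketch, assuming a hypothetical helper `pinned_requirements` that is run in the environment where the model was trained:

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return 'name==version' for installed packages, the bare name otherwise."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            lines.append(name)  # not installed here, so it stays unpinned
    return lines


print("\n".join(pinned_requirements(
    ["streamlit", "scikit-learn", "pandas", "joblib", "numpy"])))
```

The printed lines could replace the contents of `requirements.txt` as-is.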

ridge_model.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7847461d512d3a8c63cc88709507420f6cc6648046249b57a53fd50286d2774c
+size 160856

tfidf_vectorizer.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8cddd5f8a72d4fd3cab8a19bd7ff58a02bb1f7236d99c1417e156b9fcc9197bf
+size 373282
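
The two `.pkl` entries above are Git LFS pointer files: each records a spec version, a SHA-256 object id, and a byte size instead of the binary itself. A downloaded model file could be checked against its pointer along these lines; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, and the sketch assumes the simple three-line pointer layout shown in the diff.

```python
import hashlib


def parse_lfs_pointer(pointer_text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1)
                  for line in pointer_text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(data: bytes, pointer):
    """True if the bytes have the size and SHA-256 digest the pointer records."""
    return (pointer["algo"] == "sha256"
            and len(data) == pointer["size"]
            and hashlib.sha256(data).hexdigest() == pointer["digest"])


# Pointer contents for tfidf_vectorizer.pkl, copied from the diff above.
pointer_text = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:8cddd5f8a72d4fd3cab8a19bd7ff58a02bb1f7236d99c1417e156b9fcc9197bf\n"
    "size 373282\n"
)
pointer = parse_lfs_pointer(pointer_text)
print(pointer["algo"], pointer["size"])
```

Reading the real file with `open("tfidf_vectorizer.pkl", "rb").read()` and passing the bytes to `matches_pointer` would confirm that the LFS object was fetched intact.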