Commit 3c55398 · verified · 1 Parent(s): 754ae7c

Upload 6 files

.gitattributes ADDED
@@ -0,0 +1,2 @@
+ ridge_model.pkl filter=lfs diff=lfs merge=lfs -text
+ tfidf_vectorizer.pkl filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,22 @@
+ ---
+ title: CommonLit Summary Scorer
+ emoji: 📝
+ colorFrom: indigo
+ colorTo: green
+ sdk: streamlit
+ app_file: app.py
+ pinned: false
+ ---
+
+
+ # ✨ Student Summary Auto-Scorer
+
+ This app uses a Ridge Regression model trained on the [CommonLit Evaluate Student Summaries](https://www.kaggle.com/competitions/commonlit-evaluate-student-summaries) dataset to automatically score student-written summaries.
+
+ **Predicted Scores:**
+ - `Content`
+ - `Wording`
+
+ 🧠 Built with scikit-learn, TF-IDF, and Streamlit
+ 🚀 Deployed using Hugging Face Spaces
+ 🎓 Educational project
app.py ADDED
@@ -0,0 +1,26 @@
+ import streamlit as st
+ import joblib
+
+ # Title
+ st.title("📝 Student Summary Scorer")
+ st.markdown("Enter your summary and we'll score its content and wording automatically!")
+
+ # Get text from the user
+ text_input = st.text_area("✍️ Write your summary here", height=250)
+
+ # Load the model and the TF-IDF vectorizer
+ model = joblib.load("ridge_model.pkl")
+ tfidf = joblib.load("tfidf_vectorizer.pkl")
+
+ # Prediction button
+ if st.button("📊 Score"):
+     if text_input.strip() == "":
+         st.warning("Please enter a summary.")
+     else:
+         # Vectorize the text and predict both scores
+         X = tfidf.transform([text_input])
+         preds = model.predict(X)[0]
+
+         st.success("✅ Predictions complete:")
+         st.write(f"**Content**: {round(preds[0], 2)} / 5")
+         st.write(f"**Wording**: {round(preds[1], 2)} / 5")
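
The scoring step in `app.py` can be factored into a plain function so that it can be exercised without launching Streamlit. What follows is a minimal sketch, not part of the commit: it assumes only that the vectorizer exposes `transform` and the model exposes `predict` returning one `[content, wording]` row per input, as the code above implies. The names `score_summary`, `StubVectorizer`, and `StubModel` are hypothetical and exist only for illustration.

```python
from typing import Optional, Tuple


def score_summary(text: str, vectorizer, model) -> Optional[Tuple[float, float]]:
    """Return (content, wording) rounded to 2 decimals, or None for blank input."""
    if not text.strip():
        return None
    features = vectorizer.transform([text])
    content, wording = model.predict(features)[0]
    return round(float(content), 2), round(float(wording), 2)


# Hypothetical stand-ins for the pickled TF-IDF vectorizer and Ridge model,
# so the function can be tried without the .pkl files.
class StubVectorizer:
    def transform(self, texts):
        # One feature per text: its word count.
        return [[len(t.split())] for t in texts]


class StubModel:
    def predict(self, rows):
        # Two outputs per row, mimicking [content, wording].
        return [[0.1 * row[0], 0.05 * row[0]] for row in rows]


if __name__ == "__main__":
    summary = "The article explains how pollution harms the earth."
    print(score_summary(summary, StubVectorizer(), StubModel()))
    print(score_summary("   ", StubVectorizer(), StubModel()))
```

In the app itself, the objects returned by `joblib.load` would be passed in place of the stubs, and a `None` result would map to the existing warning branch.
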
project_description.txt ADDED
@@ -0,0 +1,83 @@
+ PROJECT TITLE: CommonLit Student Summary Scorer (Kaggle NLP Project)
+
+ OBJECTIVE:
+ This project is an NLP-based scoring system that automatically evaluates the quality of student-written summaries. It is based on the Kaggle competition "CommonLit Evaluate Student Summaries."
+
+ Kaggle Link: https://www.kaggle.com/competitions/commonlit-evaluate-student-summaries
+
+ ---
+
+ DATA USED:
+ - summaries_train.csv → Training data (student summaries + scores)
+ - summaries_test.csv → Summaries to be scored
+ - sample_submission.csv → Sample submission format
+ - prompts_train.csv / prompts_test.csv → Additional metadata (not used in this project)
+
+ ---
+
+ TARGET VARIABLES:
+ - content → Measures how well the summary captures the main idea (0–5 scale)
+ - wording → Measures clarity and expression quality (0–5 scale)
+
+ ---
+
+ STEPS IMPLEMENTED:
+
+ 1. DATA EXPLORATION
+ - Loaded and analyzed `summaries_train.csv`
+ - Focused on `text`, `content`, and `wording` columns
+
+ 2. TEXT PROCESSING & MODELING
+ - Used `TfidfVectorizer` to convert text into numerical features
+ - Applied Ridge Regression inside a `MultiOutputRegressor`
+ - Model trained to predict both scores simultaneously
+ - Validation RMSE: **0.6819**
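
The modeling step described above can be sketched as follows. This is an illustration under stated assumptions, not the author's training script: the default `TfidfVectorizer` settings, `alpha=1.0`, the 80/20 split, and the pooled RMSE over both targets are all guesses, since the commit does not state how the model was configured or how the reported RMSE was computed. `train_scorer` and the toy data are hypothetical; in the real project the texts and targets would come from the `text`, `content`, and `wording` columns of `summaries_train.csv`.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor

# Toy stand-in for summaries_train.csv: texts plus [content, wording] targets.
TOY_TEXTS = [
    "The story was good.",
    "It was about people talking.",
    "The article explains how pollution harms the earth.",
    "Pollution is bad and we should stop it.",
    "The author argues that industry damages the environment.",
    "Policy reform and individual action can reduce ecological damage.",
    "Something happened in the text.",
    "The passage describes causes and solutions for pollution.",
    "The writer highlights corporate accountability as a key solution.",
    "It talks about the earth.",
]
TOY_TARGETS = np.array([
    [1.0, 1.0], [1.0, 1.2], [3.0, 3.0], [2.5, 2.0], [4.0, 4.0],
    [4.5, 4.2], [1.0, 0.8], [3.5, 3.2], [4.2, 4.5], [1.5, 1.0],
])


def train_scorer(texts, targets, alpha=1.0, seed=42):
    """Fit TF-IDF + multi-output Ridge; return (vectorizer, model, validation RMSE)."""
    x_train, x_val, y_train, y_val = train_test_split(
        texts, targets, test_size=0.2, random_state=seed
    )
    vectorizer = TfidfVectorizer()
    model = MultiOutputRegressor(Ridge(alpha=alpha))
    # Fit the vocabulary on the training split only, then reuse it for validation.
    model.fit(vectorizer.fit_transform(x_train), y_train)
    preds = model.predict(vectorizer.transform(x_val))
    # Pooled RMSE across both target columns (an assumption about the metric).
    rmse = float(np.sqrt(np.mean((np.asarray(y_val) - preds) ** 2)))
    return vectorizer, model, rmse


if __name__ == "__main__":
    vectorizer, model, rmse = train_scorer(TOY_TEXTS, TOY_TARGETS)
    print(f"validation RMSE on toy data: {rmse:.4f}")
```

`MultiOutputRegressor` fits one independent `Ridge` per target column, so "simultaneously" here means a single `fit` call rather than a shared model. The fitted objects could then be saved with `joblib.dump(model, "ridge_model.pkl")` and `joblib.dump(vectorizer, "tfidf_vectorizer.pkl")`, matching the filenames in this commit.
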
+
+ 3. PREDICTION & KAGGLE SUBMISSION
+ - Generated predictions on `summaries_test.csv`
+ - Filled predictions into the `sample_submission.csv` structure
+ - Created `submission.csv` for competition upload
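
Writing the submission file can be sketched with the standard library alone. This assumes the sample submission has the columns `student_id`, `content`, and `wording`, which the commit does not show; check `sample_submission.csv` before relying on it. The function name `write_submission` and the example IDs are hypothetical.

```python
import csv
import io


def write_submission(student_ids, predictions, out):
    """Write one row per student: id, content score, wording score."""
    if len(student_ids) != len(predictions):
        raise ValueError("student_ids and predictions must have the same length")
    writer = csv.writer(out)
    # Assumed header; verify against sample_submission.csv.
    writer.writerow(["student_id", "content", "wording"])
    for student_id, (content, wording) in zip(student_ids, predictions):
        writer.writerow([student_id, f"{content:.4f}", f"{wording:.4f}"])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_submission(["s1", "s2"], [(1.25, 0.5), (3.0, 2.75)], buffer)
    print(buffer.getvalue())
```

To produce a file on disk, pass `open("submission.csv", "w", newline="")` as `out` instead of the in-memory buffer.
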
+
+ 4. STREAMLIT WEB APP (`app.py`)
+ - Developed a user-friendly web interface to input any summary
+ - Displays instant predictions for `content` and `wording` scores
+ - Exported model and vectorizer as `.pkl` files using `joblib`
+ - Deployed publicly using Hugging Face Spaces
+
+ ---
+
+ TEST EXAMPLES:
+
+ [Weak Summary]
+ It was about a story. It was good. The people were talking and then something happened.
+ Expected Score: Content ≈ 1.0, Wording ≈ 1.0
+
+ [Intermediate Summary]
+ The article discusses the importance of environmental protection. It explains how pollution harms the earth and suggests ways to stop it.
+ Expected Score: Content ≈ 3.0, Wording ≈ 3.0
+
+ [Advanced Summary]
+ The summary articulates the author’s argument that environmental degradation is a result of unchecked industrial expansion. It effectively highlights key solutions such as policy reform, corporate accountability, and individual action to mitigate ecological damage.
+ Expected Score: Content ≈ 4.5, Wording ≈ 4.5
+
+ ---
+
+ LIBRARIES USED:
+ - pandas
+ - numpy
+ - scikit-learn
+ - joblib
+ - streamlit
+
+ ---
+
+ HOW TO RUN:
+ 1. Use `streamlit run app.py` for local testing
+ 2. Alternatively, access the deployed version via Hugging Face Spaces
+ 3. Keep the model files `ridge_model.pkl` and `tfidf_vectorizer.pkl` in the directory the app is launched from; `app.py` loads them by relative path
+
+ ---
+
+ CREATED BY: [Hande Çarkcı]
+ DATE: [June 1, 2025]
+ PROJECT #: 2 of 20
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ streamlit
+ scikit-learn
+ pandas
+ joblib
+ numpy
ridge_model.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7847461d512d3a8c63cc88709507420f6cc6648046249b57a53fd50286d2774c
+ size 160856
tfidf_vectorizer.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8cddd5f8a72d4fd3cab8a19bd7ff58a02bb1f7236d99c1417e156b9fcc9197bf
+ size 373282