title: 🍇 Blueberry Yield Regression
emoji: 🌾
colorFrom: indigo
colorTo: green
sdk: streamlit
app_file: app.py
pinned: true
license: mit
tags:
- regression
- machine-learning
- streamlit
- kaggle
- agriculture
🍇 Blueberry Yield Prediction with Machine Learning
This project is a complete machine learning pipeline that predicts the yield of wild blueberries using various environmental and biological features such as pollinator counts, rainfall, and fruit measurements.
📌 Project Type
- Supervised Learning
- Regression Problem
🔍 Problem Description
Predicting agricultural yield is a crucial component in planning, sustainability, and food economics. The dataset used in this project comes from the Kaggle Playground Series S3E14 competition and contains information on:
- Different species of pollinators (honeybee, bumblebee, osmia...)
- Environmental conditions (rainfall days, temperature ranges...)
- Fruit attributes (fruit mass, fruit set, seed count...)
🎯 Goal: Predict the `yield` (kg/ha) of blueberries based on input features.
📊 Dataset Info
- `train.csv`: 15,289 samples with 18 features
- `test.csv`: same structure, no target
- No missing values, clean numerical data
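The properties listed above can be checked with a few lines of pandas. This is a minimal sketch: `summarize` is a hypothetical helper name, and the small in-memory frame (with made-up values) only stands in for `train.csv`.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Return basic shape, missing-value, and dtype facts about a frame."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "missing": int(df.isna().sum().sum()),
        "all_numeric": all(pd.api.types.is_numeric_dtype(t) for t in df.dtypes),
    }


# Illustrative stand-in; with the real data use: df = pd.read_csv("train.csv")
demo = pd.DataFrame(
    {
        "id": [0, 1, 2],
        "honeybee": [0.5, 0.25, 0.75],
        "yield": [4500.0, 5200.0, 6100.0],
    }
)
summary = summarize(demo)
print(summary)
```

On the real training file, `missing` should be 0 and `all_numeric` should be `True` if the description above holds.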
📈 What We Did (Pipeline Summary)
EDA (Exploratory Data Analysis)
- Checked for missing values ✅
- Analyzed feature distributions & target (`yield`)
- Built correlation heatmaps; strongest positive correlations: `fruitmass`, `fruitset`, `seeds`
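The correlation ranking behind a heatmap like this can be sketched as follows. `target_correlations` is a hypothetical helper, `rain_days` is an illustrative column name, and the demo numbers are made up to show the mechanics only.

```python
import pandas as pd


def target_correlations(df: pd.DataFrame, target: str = "yield") -> pd.Series:
    """Pearson correlation of each feature with the target, highest first."""
    corr = df.corr()[target].drop(target)
    return corr.sort_values(ascending=False)


# Made-up values for illustration; the real input would be the training frame.
demo = pd.DataFrame(
    {
        "fruitmass": [0.40, 0.45, 0.50, 0.55],
        "rain_days": [30, 20, 16, 4],
        "yield": [4000, 5000, 6000, 7000],
    }
)
ranked = target_correlations(demo)
print(ranked)
```

Passing the full correlation matrix (`df.corr()`) to a plotting library such as seaborn would give the heatmap itself.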
Data Preprocessing
- Removed `id` column
- Standard feature selection based on correlation
- No categorical encoding needed (all numerical)
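One way to express these steps is shown below. The README does not state how the correlation-based selection was done, so the absolute-correlation threshold (`min_abs_corr`) and the `preprocess` helper are assumptions for illustration, not the project's actual rule.

```python
import pandas as pd


def preprocess(df: pd.DataFrame, target: str = "yield", min_abs_corr: float = 0.1):
    """Drop the id column and keep features whose |correlation| with the target
    meets the threshold. The threshold value is an assumed example."""
    features = df.drop(columns=["id"], errors="ignore")
    corr = features.corr()[target].drop(target).abs()
    keep = corr[corr >= min_abs_corr].index.tolist()
    return features[keep], features[target]


# Made-up values: "noise" is constructed to be uncorrelated with the target.
demo = pd.DataFrame(
    {
        "id": [0, 1, 2, 3],
        "fruitset": [0.3, 0.4, 0.5, 0.6],
        "noise": [1, -1, -1, 1],
        "yield": [4000, 5000, 6000, 7000],
    }
)
X, y = preprocess(demo)
print(list(X.columns))
```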
Model Training
- Model: `RandomForestRegressor`
- Train-Test Split: 80/20
- Results:
  - RMSE ≈ 573.8
  - R² Score ≈ 0.81 ✅
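The training and evaluation step can be sketched as below. The hyperparameters (`n_estimators=100`) and the random seed are assumptions, since the README does not list them, and the synthetic data from `make_regression` only stands in for the blueberry features, so the printed scores will not match the results above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def train_and_evaluate(X, y, seed: int = 42):
    """Fit a random forest on an 80/20 split and return (model, rmse, r2)."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    preds = model.predict(X_val)
    rmse = float(np.sqrt(mean_squared_error(y_val, preds)))
    r2 = float(r2_score(y_val, preds))
    return model, rmse, r2


# Synthetic stand-in data; replace with the preprocessed training features.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
model, rmse, r2 = train_and_evaluate(X, y)
print(f"RMSE={rmse:.1f}  R2={r2:.2f}")
```

RMSE is computed as the square root of `mean_squared_error` so the sketch does not depend on version-specific scikit-learn arguments.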
Test Prediction & Submission
- Predictions made on `test.csv`
- `submission.csv` generated for Kaggle submission
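Writing the submission file might look like the sketch below. It assumes the expected columns are `id` and `yield`. `make_submission` and `ConstantModel` are hypothetical names, and the stub model and made-up rows only replace the trained regressor and `test.csv` so the example runs on its own.

```python
import os
import tempfile

import pandas as pd


def make_submission(model, test_df: pd.DataFrame, feature_cols, path: str) -> pd.DataFrame:
    """Predict on the test features and write an id/yield CSV to `path`."""
    preds = model.predict(test_df[feature_cols])
    submission = pd.DataFrame({"id": test_df["id"], "yield": preds})
    submission.to_csv(path, index=False)
    return submission


class ConstantModel:
    """Stand-in for the trained model; always predicts the same value."""

    def __init__(self, value: float):
        self.value = value

    def predict(self, X):
        return [self.value] * len(X)


# Made-up test rows; with the real data use pd.read_csv("test.csv").
test_df = pd.DataFrame({"id": [15289, 15290], "fruitset": [0.4, 0.5]})
out_path = os.path.join(tempfile.mkdtemp(), "submission.csv")
submission = make_submission(ConstantModel(6000.0), test_df, ["fruitset"], out_path)
print(submission)
```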
Streamlit App
- Users input bee counts, rain days, and fruit measurements
- Predicts blueberry yield in kg/ha
- Uses trained model (`rf_model.pkl`) behind the scenes
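A rough sketch of how such an app could be wired is shown below; it is not the project's actual `app.py`. It assumes both pickle files were saved with joblib and that `model_columns.pkl` holds the list of training feature names. `build_input_row` and `render_app` are hypothetical names.

```python
import pandas as pd


def build_input_row(user_inputs: dict, model_columns: list) -> pd.DataFrame:
    """Return a one-row frame with columns in training order.
    Features the user did not supply default to 0.0 (an assumed choice)."""
    row = pd.DataFrame([user_inputs])
    return row.reindex(columns=model_columns, fill_value=0.0)


def render_app(model_path: str = "rf_model.pkl", columns_path: str = "model_columns.pkl"):
    """Streamlit UI; call this at the bottom of app.py."""
    import joblib  # assumption: the artifacts were written with joblib.dump
    import streamlit as st

    model = joblib.load(model_path)
    model_columns = joblib.load(columns_path)

    st.title("Blueberry Yield Prediction")
    inputs = {col: st.number_input(col, value=0.0) for col in model_columns}
    if st.button("Predict"):
        row = build_input_row(inputs, model_columns)
        st.write(f"Predicted yield: {model.predict(row)[0]:.1f} kg/ha")


# Column-alignment demo with illustrative feature names.
row = build_input_row({"fruitset": 0.5}, ["honeybee", "fruitset"])
print(row)
```

Reindexing against the saved column list keeps the feature order identical to training, which matters because the model receives features by position.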
🚀 Try it Online
🌐 You can try this app live here:
Hugging Face Space Link
🔮 What Could Be Improved?
Area | Suggestion |
---|---|
Feature Engineering | Create interaction terms, try log/ratio features |
Model | Try LightGBM, XGBoost, or stacking |
Tuning | GridSearchCV or Optuna for hyperparameter optimization |
Visualization | Add interactive charts in Streamlit app |
Real-World Data | Add satellite weather data, soil types, historical trends |
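As an illustration of the tuning suggestion, a small `GridSearchCV` run could look like this. The parameter grid, the 3-fold setting, and the synthetic data are all assumptions chosen to keep the sketch fast; they are not recommendations derived from this project.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Small assumed grid; a real search would cover more values.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, so RMSE is negated
    cv=3,
)

# Synthetic stand-in data; replace with the preprocessed training features.
X, y = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=0)
search.fit(X, y)

best_rmse = -search.best_score_
print(search.best_params_, f"CV RMSE={best_rmse:.1f}")
```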
📁 Project Structure
📦 blueberry-yield-regression
├── app.py
├── rf_model.pkl
├── model_columns.pkl
├── requirements.txt
├── submission.csv
└── README.md
📜 License
MIT License – Free to use, modify and distribute.