metadata
title: 🍇 Blueberry Yield Regression
emoji: 🌾
colorFrom: indigo
colorTo: green
sdk: streamlit
app_file: app.py
pinned: true
license: mit
tags:
  - regression
  - machine-learning
  - streamlit
  - kaggle
  - agriculture

🍇 Blueberry Yield Prediction with Machine Learning

This project is an end-to-end machine learning pipeline that predicts wild blueberry yield from environmental and biological features such as pollinator counts, rainfall, and fruit measurements.

📌 Project Type

  • Supervised Learning
  • Regression Problem

🔍 Problem Description

Predicting agricultural yield matters for planning, sustainability, and food economics. The dataset used in this project comes from the Kaggle Playground Series S3E14 competition and contains information on:

  • Different species of pollinators (honeybee, bumblebee, osmia...)
  • Environmental conditions (rainfall days, temperature ranges...)
  • Fruit attributes (fruit mass, fruit set, seed count...)

🎯 Goal: Predict the yield (kg/ha) of blueberries based on input features.


📊 Dataset Info

  • train.csv: 15,289 samples with 18 columns (id, the input features, and the target yield)
  • test.csv: same structure, no target
  • No missing values, clean numerical data
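The shape and missing-value claims above can be checked with a small summary helper. This is only a sketch: `summarize` is a hypothetical helper name, and the tiny DataFrame is a made-up stand-in for `train.csv` that uses only column names mentioned in this README.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Return row/column counts, total missing cells, and any non-numeric columns."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "missing": int(df.isna().sum().sum()),
        "non_numeric": [
            c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])
        ],
    }


# Stand-in for pd.read_csv("train.csv"); the values are invented for illustration.
sample = pd.DataFrame(
    {
        "id": [0, 1, 2],
        "fruitset": [0.42, 0.51, 0.47],
        "fruitmass": [0.41, 0.45, 0.43],
        "seeds": [32.1, 36.8, 34.0],
        "yield": [4476.8, 5548.1, 6019.3],
    }
)

summary = summarize(sample)
print(summary)
```

On the real data, a `missing` count of 0 and an empty `non_numeric` list would match the "clean numerical data" description.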

📈 What We Did (Pipeline Summary)

  1. EDA (Exploratory Data Analysis)

    • Checked for missing values ✅
    • Analyzed feature distributions & target (yield)
    • Built correlation heatmaps — strongest positive correlations:
      • fruitmass, fruitset, seeds
  2. Data Preprocessing

    • Removed id column
    • Feature selection based on correlation with the target
    • No categorical encoding needed (all numerical)
  3. Model Training

    • Model: RandomForestRegressor
    • Train-Test Split: 80/20
    • Results:
      • RMSE ≈ 573.8
      • R² Score ≈ 0.81
  4. Test Prediction & Submission

    • Predictions made on test.csv
    • submission.csv generated for Kaggle submission
  5. Streamlit App

    • Users input bee counts, rain days, and fruit measurements
    • Predicts blueberry yield in kg/ha
    • Uses trained model (rf_model.pkl) behind the scenes
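Steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the project's actual training script: the synthetic data stands in for `train.csv`, and the hyperparameters (`n_estimators=100`, `random_state=42`) are assumptions, since the README does not state which ones were used.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for train.csv (invented values, a few README column names).
rng = np.random.default_rng(0)
n = 500
train = pd.DataFrame(
    {
        "id": np.arange(n),
        "fruitset": rng.uniform(0.2, 0.65, n),
        "fruitmass": rng.uniform(0.3, 0.55, n),
        "seeds": rng.uniform(22, 46, n),
    }
)
train["yield"] = (
    9000 * train["fruitset"] + 50 * train["seeds"] + rng.normal(0, 300, n)
)

# Preprocessing: drop the id column and separate the target.
X = train.drop(columns=["id", "yield"])
y = train["yield"]

# 80/20 train/validation split.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumed hyperparameters; the README does not list the real ones.
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate with RMSE and R² on the held-out 20%.
pred = model.predict(X_val)
rmse = float(np.sqrt(mean_squared_error(y_val, pred)))
r2 = r2_score(y_val, pred)
print(f"RMSE: {rmse:.1f}  R2: {r2:.2f}")
```

The numbers printed here come from synthetic data and are not comparable to the RMSE and R² reported above.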

🚀 Try it Online

🌐 You can try this app live here:
Hugging Face Space Link


🔮 What Could Be Improved?

| Area | Suggestion |
| --- | --- |
| Feature Engineering | Create interaction terms, try log/ratio features |
| Model | Try LightGBM, XGBoost, or stacking |
| Tuning | GridSearchCV or Optuna for hyperparameter optimization |
| Visualization | Add interactive charts in Streamlit app |
| Real-World Data | Add satellite weather data, soil types, historical trends |
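As one example of the tuning suggestion, a small grid search over a random forest might look like the sketch below. The grid values, the 3-fold setting, and the synthetic data are all assumptions chosen to keep the example fast; they are not settings taken from this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for the real feature matrix and target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = 5000 * X[:, 0] + 1000 * X[:, 1] * X[:, 2] + rng.normal(0, 100, 200)

# Hypothetical search space: 2 x 2 = 4 candidate settings.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, hence "neg"
    cv=3,
)
search.fit(X, y)

# Flip the sign to recover the cross-validated RMSE of the best candidate.
best_rmse = -search.best_score_
print(search.best_params_, round(best_rmse, 1))
```

A wider grid, or a sampler such as Optuna, would follow the same pattern with a larger search budget.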

📁 Project Structure

📦 blueberry-yield-regression
├── app.py
├── rf_model.pkl
├── model_columns.pkl
├── requirements.txt
├── submission.csv
└── README.md
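To illustrate how `app.py` might combine `rf_model.pkl` and `model_columns.pkl`, the sketch below arranges user inputs into the column order the model was trained on. It rests on assumptions: `build_input_row` is a hypothetical helper, `model_columns.pkl` is assumed to hold a list of feature names, and the three names used here are only examples.

```python
import pandas as pd


def build_input_row(user_values: dict, model_columns: list) -> pd.DataFrame:
    """Arrange user inputs into a one-row frame in the training column order."""
    missing = [c for c in model_columns if c not in user_values]
    if missing:
        raise ValueError(f"Missing inputs: {missing}")
    # Selecting by model_columns reorders the inputs and drops unexpected keys.
    return pd.DataFrame([user_values])[model_columns]


# Assumed contents of model_columns.pkl (illustrative subset of features).
model_columns = ["fruitset", "fruitmass", "seeds"]

row = build_input_row(
    {"seeds": 34.0, "fruitset": 0.47, "fruitmass": 0.43, "extra": 1},
    model_columns,
)
print(row)

# In the app, the row would then be passed to the loaded model, for example:
#   model = joblib.load("rf_model.pkl")  # assumes the model was saved with joblib
#   prediction = model.predict(row)[0]
```

Keeping the column order in a separate file guards against a silent mismatch between the form fields in the app and the features the model expects.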


📜 License

MIT License – Free to use, modify and distribute.