---
library_name: sklearn
tags:
- energy-consumption
- regression
- random-forest
- xgboost
- building-energy
- sustainability
- carbon-footprint
pipeline_tag: tabular-regression
---
# Ecologia Electricity Consumption Model
## Model Description
This model predicts **electricity_consumption (kWh)** for buildings using machine learning ensemble methods.
- **Model Architecture**: Random Forest Regressor (Best Model)
- **Task**: Regression (Energy Consumption Prediction)
- **Target Variable**: electricity_consumption (kWh)
- **Input Features**: 22 features
- **Training Dataset**: Building Data Genome Project 2
- **Training Samples**: ~15 million
## Model Performance
### Random Forest Model
- **RMSE**: 37.6519
- **MAE**: 17.5059
- **R² Score**: 0.9587
### XGBoost Model
- **RMSE**: 59.3440
- **MAE**: 29.7273
- **R² Score**: 0.8973
### Best Model
The best-performing model, selected by lowest validation RMSE, is saved as `electricity_model.joblib`.
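As an illustration of how the three reported metrics are defined and how a lowest-RMSE selection rule could work, here is a minimal sketch. The helper names `regression_metrics` and `select_best` are hypothetical and are not taken from the project's training code; the scores in the dictionary are the ones reported in this card.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R² for one set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "mae": float(np.mean(np.abs(residuals))),
        "r2": 1.0 - ss_res / ss_tot,
    }


def select_best(metrics_by_model):
    """Return the name of the model with the lowest RMSE (hypothetical helper)."""
    return min(metrics_by_model, key=lambda name: metrics_by_model[name]["rmse"])


# Scores as reported in this model card
reported = {
    "random_forest": {"rmse": 37.6519, "mae": 17.5059, "r2": 0.9587},
    "xgboost": {"rmse": 59.3440, "mae": 29.7273, "r2": 0.8973},
}
best_name = select_best(reported)
print(best_name)
```

Under this rule the Random Forest is chosen, which matches the card's statement that it is the best model.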
## Training Details
### Dataset
- **Source**: [Building Data Genome Project 2](https://www.kaggle.com/datasets/claytonmiller/buildingdatagenomeproject2)
- **Training Samples**: ~15 million
- **Data Preprocessing**:
  - Outlier removal (99th percentile)
  - Feature engineering (temporal, building, weather features)
  - Missing value imputation
  - Normalization
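The preprocessing steps listed above could look roughly like the following sketch. The card does not specify the imputation strategy or the normalization method, so median imputation and z-score scaling are assumptions made here for illustration, as is the `preprocess` function itself.

```python
import numpy as np
import pandas as pd


def preprocess(df, target="electricity_consumption"):
    """Outlier removal, imputation and scaling (illustrative, not the project's code)."""
    # Keep rows at or below the 99th percentile of the target.
    # Rows with a missing target are dropped by this comparison as well.
    cutoff = df[target].quantile(0.99)
    df = df[df[target] <= cutoff].copy()

    # Assumption: impute missing numeric features with the column median.
    feature_cols = [c for c in df.select_dtypes(include="number").columns if c != target]
    df[feature_cols] = df[feature_cols].fillna(df[feature_cols].median())

    # Assumption: z-score normalization; constant columns are left at 0.
    std = df[feature_cols].std(ddof=0).replace(0, 1.0)
    df[feature_cols] = (df[feature_cols] - df[feature_cols].mean()) / std
    return df


demo = pd.DataFrame({
    "electricity_consumption": [10.0, 12.0, 11.0, 13.0, 5000.0],
    "temperature": [20.0, np.nan, 22.0, 24.0, 21.0],
})
clean = preprocess(demo)
print(clean)
```

In a real pipeline the percentile cutoff, medians, means and standard deviations would normally be computed on the training split only and then reused for validation, test and inference data.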
### Training Method
- **Algorithms**: Random Forest and XGBoost (both tree-ensemble methods), trained as separate candidate models
- **Best Model Selection**: Based on validation RMSE
- **Validation Strategy**: Hold-out train/validation/test split (60/20/20)
- **Hyperparameters**: Optimized for large-scale datasets
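A 60/20/20 split can be produced with two calls to scikit-learn's `train_test_split`, as sketched below on synthetic data. The card does not say whether the project's split is random or chronological, so the random split and the `random_state` value here are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 2 features
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# Step 1: hold out 20% of all rows as the test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 2: take 25% of the remaining 80% as validation (0.25 * 0.8 = 0.2 overall).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42
)

print(len(X_train), len(X_val), len(X_test))
```

For meter readings with lag features, a chronological split may be preferable, since a random split can place neighbouring time steps of the same building in both the training and the evaluation sets.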
### Feature Engineering
The model uses 22 engineered features including:
- **Building Features**: Type, area, age, location
- **Temporal Features**: Hour, day, month, season, day of week
- **Weather Features**: Temperature, humidity, dew point
- **Interaction Features**: Building-weather interactions
- **Lag Features**: Previous consumption patterns
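The feature groups above can be sketched with pandas as follows. The exact list of 22 features is stored in `electricity_features.joblib` and is not reproduced in this card, so the `building_id` and `timestamp` columns, the derived names `season`, `area_x_temperature` and `consumption_lag_1`, and the `add_features` function are hypothetical.

```python
import pandas as pd


def add_features(df):
    """Derive temporal, interaction and lag columns (illustrative only)."""
    df = df.sort_values(["building_id", "timestamp"]).copy()
    ts = df["timestamp"]

    # Temporal features
    df["hour"] = ts.dt.hour
    df["day_of_week"] = ts.dt.dayofweek  # Monday = 0
    df["month"] = ts.dt.month
    # Meteorological seasons: 0 = Dec-Feb, 1 = Mar-May, 2 = Jun-Aug, 3 = Sep-Nov
    df["season"] = (df["month"] % 12) // 3

    # One possible building-weather interaction
    df["area_x_temperature"] = df["area_sqm"] * df["temperature"]

    # Lag feature: previous reading of the same building
    df["consumption_lag_1"] = (
        df.groupby("building_id")["electricity_consumption"].shift(1)
    )
    return df


demo = pd.DataFrame({
    "building_id": ["a", "a", "a"],
    "timestamp": pd.to_datetime(
        ["2017-06-05 13:00", "2017-06-05 14:00", "2017-06-05 15:00"]
    ),
    "area_sqm": [1000, 1000, 1000],
    "temperature": [20.0, 20.5, 21.0],
    "electricity_consumption": [50.0, 55.0, 53.0],
})
features = add_features(demo)
print(features)
```

The first reading of each building has no predecessor, so its lag value is missing and has to be imputed or dropped before training.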
## Usage
### Installation
```bash
pip install scikit-learn xgboost joblib huggingface_hub
```
### Load Model
```python
from huggingface_hub import hf_hub_download
import joblib

# Download the model and the list of feature names
model_path = hf_hub_download(
    repo_id="codealchemist01/ecologia-electricity-model",
    filename="electricity_model.joblib",
    token="YOUR_HF_TOKEN",  # Optional if public
)
features_path = hf_hub_download(
    repo_id="codealchemist01/ecologia-electricity-model",
    filename="electricity_features.joblib",
    token="YOUR_HF_TOKEN",  # Optional if public
)

# Load the model and the ordered feature list
model = joblib.load(model_path)
feature_columns = joblib.load(features_path)
```
### Prediction Example
```python
import pandas as pd

# Prepare input data (example)
input_data = pd.DataFrame({
    'building_type': ['Office'],
    'area_sqm': [1000],
    'year_built': [2020],
    'temperature': [20.5],
    'humidity': [65],
    'hour': [14],
    'day_of_week': [1],
    'month': [6],
    # ... other required features
})

# Ensure all features are present.
# Filling with 0 is only a placeholder so the example runs; supply real
# values for every feature to get meaningful predictions.
for col in feature_columns:
    if col not in input_data.columns:
        input_data[col] = 0

# Select features in the order the model was trained on
input_data = input_data[feature_columns]

# Make prediction
prediction = model.predict(input_data)
print(f"Predicted electricity_consumption (kWh): {prediction[0]:.2f}")
```
## Model Limitations
- Model performance may vary based on building characteristics and regional differences
- Training data is primarily from North American buildings
- Predictions are estimates and should be validated with actual consumption data
- Model requires all input features to be provided
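Because the model requires every input feature, it can help to fail early when a column is missing. The following is a minimal sketch; `check_features` is a hypothetical helper and not part of this repository, and the column names in the demo are taken from the prediction example.

```python
import pandas as pd


def check_features(input_data, feature_columns):
    """Raise if any required column is missing, else return columns in model order."""
    missing = [c for c in feature_columns if c not in input_data.columns]
    if missing:
        raise ValueError(f"Missing required features: {missing}")
    return input_data[list(feature_columns)]


demo = pd.DataFrame({"temperature": [20.5], "hour": [14]})
ordered = check_features(demo, ["hour", "temperature"])
print(list(ordered.columns))
```

Calling the helper with the list loaded from `electricity_features.joblib` before `model.predict` would surface missing inputs as an error instead of a misleading prediction.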
## Ethical Considerations
- Model is designed to help reduce energy consumption and carbon footprint
- No personal or sensitive data is used in training
- Model predictions should be used responsibly for sustainability purposes
## Citation
If you use this model, please cite:
```bibtex
@software{ecologia_energy_model,
title = {Ecologia Electricity Consumption Model},
author = {Ecologia Energy Team},
year = {2024},
url = {https://huggingface.co/codealchemist01/ecologia-electricity-model},
note = {Trained on Building Data Genome Project 2 dataset}
}
```
## License
This model is released under the MIT License.
## Contact
For questions or issues, please open an issue on the repository or contact the Ecologia Energy team.
## Acknowledgments
- Building Data Genome Project 2 dataset creators
- scikit-learn and XGBoost communities
- HuggingFace for model hosting
---
*This model is part of the Ecologia sustainability platform for energy consumption prediction and carbon footprint calculation.*