---

library_name: sklearn
tags:
- energy-consumption
- regression
- random-forest
- xgboost
- building-energy
- sustainability
- carbon-footprint
pipeline_tag: tabular-regression
---


# Ecologia Electricity Consumption Model

## Model Description

This model predicts **electricity_consumption (kWh)** for buildings using tree-based ensemble methods (Random Forest and XGBoost).



- **Model Architecture**: Random Forest Regressor (Best Model)

- **Task**: Regression (Energy Consumption Prediction)

- **Target Variable**: electricity_consumption (kWh)

- **Input Features**: 22 features

- **Training Dataset**: Building Data Genome Project 2

- **Training Samples**: ~15 million



## Model Performance



### Random Forest Model

- **RMSE**: 37.6519 kWh

- **MAE**: 17.5059 kWh

- **R² Score**: 0.9587



### XGBoost Model

- **RMSE**: 59.3440 kWh

- **MAE**: 29.7273 kWh

- **R² Score**: 0.8973
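The three metrics above follow their standard definitions, and RMSE and MAE are expressed in the target's unit (kWh). The minimal pure-Python sketch below shows how they are computed; the results agree with the square root of scikit-learn's `mean_squared_error`, with `mean_absolute_error`, and with `r2_score`. The sample readings are made up for illustration and are not from this model's evaluation.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, in the target's unit (kWh)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def mae(y_true, y_pred):
    """Mean absolute error, in the target's unit (kWh)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Illustrative readings (kWh), not real evaluation data
actual = [120.0, 80.0, 100.0, 140.0]
predicted = [110.0, 90.0, 100.0, 130.0]
print(f"RMSE={rmse(actual, predicted):.4f} "
      f"MAE={mae(actual, predicted):.4f} R2={r2(actual, predicted):.4f}")
```

RMSE squares each error before averaging, so it penalizes large misses more than MAE does. This is why RMSE is noticeably larger than MAE for both models above.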



### Best Model

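The model with the lowest validation RMSE (Random Forest) is saved as `electricity_model.joblib`. The ordered list of input feature names is saved alongside it as `electricity_features.joblib`.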
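Choosing the saved model comes down to keeping the candidate with the lowest validation RMSE. The minimal sketch below assumes each candidate's validation RMSE has already been computed. The helper name `select_best_model` and the scores are hypothetical.

```python
def select_best_model(val_rmse):
    """Return the name of the candidate with the lowest validation RMSE.

    `val_rmse` maps a model name to its RMSE (kWh) on the validation split.
    """
    if not val_rmse:
        raise ValueError("no candidate models to compare")
    return min(val_rmse, key=val_rmse.get)

# Hypothetical validation scores (kWh), for illustration only
scores = {"random_forest": 38.0, "xgboost": 60.0}
print(select_best_model(scores))
```

In a training script, the winning estimator would then be persisted, for example with `joblib.dump(best_estimator, "electricity_model.joblib")`, where `best_estimator` stands for the fitted model object.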



## Training Details



### Dataset

- **Source**: [Building Data Genome Project 2](https://www.kaggle.com/datasets/claytonmiller/buildingdatagenomeproject2)

- **Training Samples**: ~15 million

- **Data Preprocessing**: 

  - Outlier removal (99th percentile)

  - Feature engineering (temporal, building, weather features)

  - Missing value imputation

  - Normalization
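The card does not state exactly how the 99th-percentile rule is applied. This sketch assumes one common reading: readings above the 99th percentile of the training data are dropped. It is a minimal standard-library version built on `statistics.quantiles`, and the helper name `remove_upper_outliers` is hypothetical.

```python
import statistics

def remove_upper_outliers(values, pct=99):
    """Drop values above the pct-th percentile (assumed rule, see above).

    statistics.quantiles(values, n=100) returns the 1st..99th percentile
    cut points, so index pct - 1 holds the pct-th percentile.
    """
    if len(values) < 2:
        return list(values)
    cutoff = statistics.quantiles(values, n=100)[pct - 1]
    return [v for v in values if v <= cutoff]

readings = list(range(1, 101)) + [10_000]  # one extreme meter spike
cleaned = remove_upper_outliers(readings)
print(len(cleaned), max(cleaned))
```

On datasets the size of BDG2, the same rule would normally be applied with a vectorized library such as pandas. The plain-Python version only makes the logic explicit.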



### Training Method

- **Algorithms**: Random Forest and XGBoost, two tree-based ensemble methods trained as separate candidate models

- **Best Model Selection**: Based on validation RMSE

- **Validation Strategy**: Hold-out Train/Validation/Test split (60/20/20)

- **Hyperparameters**: Optimized for large-scale datasets
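The card does not say whether the 60/20/20 split was random or chronological. The minimal sketch below assumes a seeded random shuffle of row indices, and `split_indices` is a hypothetical helper. For hourly meter data, a chronological split is a common alternative, particularly when lag features are used, because it prevents future readings from leaking into training.

```python
import random

def split_indices(n_rows, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle row indices and cut them into train/validation/test parts.

    The test part receives whatever remains (20% with the defaults).
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = int(n_rows * train_frac)
    n_val = int(n_rows * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

train_idx, val_idx, test_idx = split_indices(1_000)
print(len(train_idx), len(val_idx), len(test_idx))
```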



### Feature Engineering

The model uses 22 engineered features including:

- **Building Features**: Type, area, age, location

- **Temporal Features**: Hour, day, month, season, day of week

- **Weather Features**: Temperature, humidity, dew point

- **Interaction Features**: Building-weather interactions

- **Lag Features**: Previous consumption patterns
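The temporal, interaction, and lag features listed above can be derived from a meter timestamp, building metadata, and preceding readings. This minimal sketch assumes hourly data, Northern Hemisphere meteorological seasons, and Python's `datetime.weekday()` convention (Monday = 0). The feature names, season mapping, and area × temperature interaction are all hypothetical, so they may not match the 22 names stored in `electricity_features.joblib`.

```python
from datetime import datetime

# Assumed meteorological seasons for the Northern Hemisphere
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}

def temporal_features(ts):
    """Hour, day, month, season and day of week for one timestamp."""
    return {
        "hour": ts.hour,
        "day": ts.day,
        "month": ts.month,
        "season": SEASONS[ts.month],
        "day_of_week": ts.weekday(),  # Monday=0 ... Sunday=6
    }

def lag_feature(series, lag=1):
    """Value from `lag` steps earlier; the first `lag` rows have no history (None)."""
    if lag <= 0:
        raise ValueError("lag must be positive")
    return [series[i - lag] if i >= lag else None for i in range(len(series))]

def interaction_feature(area_sqm, temperature):
    """Hypothetical building-weather interaction: floor area times temperature."""
    return area_sqm * temperature

print(temporal_features(datetime(2016, 6, 1, 14)))
print(lag_feature([10.0, 12.5, 11.0]))
print(interaction_feature(1000, 20.5))
```

Before a tree model can use a string value such as `season`, it has to be encoded numerically, in the same way it was encoded during training.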



## Usage



### Installation

```bash
pip install scikit-learn xgboost joblib huggingface_hub pandas
```



### Load Model

```python
from huggingface_hub import hf_hub_download
import joblib

REPO_ID = "codealchemist01/ecologia-electricity-model"

# Download the trained model and the ordered list of feature names.
# Add token="hf_..." only if the repository is private; passing a
# placeholder string such as "YOUR_HF_TOKEN" can cause an authentication error.
model_path = hf_hub_download(repo_id=REPO_ID, filename="electricity_model.joblib")
features_path = hf_hub_download(repo_id=REPO_ID, filename="electricity_features.joblib")

# Load model and features
model = joblib.load(model_path)
feature_columns = joblib.load(features_path)
```



### Prediction Example

```python
import pandas as pd

# Prepare input data (example). Column names and values are illustrative:
# the exact names and order are stored in `feature_columns`, and categorical
# features such as building type must be encoded as they were in training.
input_data = pd.DataFrame({
    'building_type': ['Office'],
    'area_sqm': [1000],
    'year_built': [2020],
    'temperature': [20.5],
    'humidity': [65],
    'hour': [14],
    'day_of_week': [1],
    'month': [6],
    # ... other required features
})

# Add missing features (filled with 0), drop unexpected columns, and put the
# rest in training order. Zero-filled features are placeholders and are
# likely to reduce prediction accuracy.
input_data = input_data.reindex(columns=feature_columns, fill_value=0)

# Make prediction
prediction = model.predict(input_data)
print(f"Predicted electricity_consumption (kWh): {prediction[0]:.2f}")
```
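Zero-filled placeholders change the input without any warning, so it can help to report which expected features were missing before predicting. The minimal pure-Python sketch below performs the same alignment step for a single record. `align_record` is a hypothetical helper and is not part of the published files, and the feature names in the usage line are illustrative.

```python
def align_record(record, feature_columns, fill_value=0):
    """Order a single input record by `feature_columns`.

    Returns the aligned values plus the names of features that were absent
    from `record` and filled with `fill_value`. Extra keys are ignored.
    """
    missing = [col for col in feature_columns if col not in record]
    values = [record.get(col, fill_value) for col in feature_columns]
    return values, missing

row, missing = align_record({"hour": 14, "month": 6}, ["month", "hour", "humidity"])
if missing:
    print(f"Warning: {len(missing)} feature(s) filled with 0: {missing}")
print(row)
```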



## Model Limitations



- Model performance may vary based on building characteristics and regional differences

- Training data is primarily from North American buildings

- Predictions are estimates and should be validated with actual consumption data

- The model expects all 22 input features; features filled with placeholder values (e.g. 0) are likely to reduce prediction accuracy
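One simple way to validate estimates against metered data is to compare predicted and actual kWh over the same hours. The minimal sketch below reports the total-energy error and the mean absolute percentage error (MAPE). The function name and sample numbers are hypothetical. MAPE is undefined when actual consumption is zero, so the sketch skips those hours.

```python
import math

def validation_report(actual_kwh, predicted_kwh):
    """Compare predictions with metered consumption over the same hours."""
    if len(actual_kwh) != len(predicted_kwh):
        raise ValueError("series must have the same length")
    total_actual = sum(actual_kwh)
    total_pred = sum(predicted_kwh)
    if total_actual:
        total_error_pct = 100 * (total_pred - total_actual) / total_actual
    else:
        total_error_pct = math.nan

    # Skip zero-consumption hours, where percentage error is undefined
    pairs = [(a, p) for a, p in zip(actual_kwh, predicted_kwh) if a != 0]
    if pairs:
        mape_pct = 100 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)
    else:
        mape_pct = math.nan
    return {"total_error_pct": total_error_pct, "mape_pct": mape_pct}

print(validation_report([100.0, 200.0, 0.0], [110.0, 180.0, 5.0]))
```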



## Ethical Considerations



- Model is designed to help reduce energy consumption and carbon footprint

- No personal or sensitive data is used in training

- Model predictions should be used responsibly for sustainability purposes



## Citation



If you use this model, please cite:



```bibtex
@software{ecologia_energy_model,
  title = {Ecologia Electricity Consumption Model},
  author = {Ecologia Energy Team},
  year = {2024},
  url = {https://huggingface.co/codealchemist01/ecologia-electricity-model},
  note = {Trained on Building Data Genome Project 2 dataset}
}
```



## License



This model is released under the MIT License.



## Contact



For questions or issues, please open an issue on the repository or contact the Ecologia Energy team.



## Acknowledgments



- Building Data Genome Project 2 dataset creators

- scikit-learn and XGBoost communities

- Hugging Face for model hosting



---

*This model is part of the Ecologia sustainability platform for energy consumption prediction and carbon footprint calculation.*