---
title: Wine Quality Prediction
emoji: 🍷
colorFrom: pink
colorTo: gray
sdk: streamlit
sdk_version: 1.42.0
app_file: app.py
pinned: false
---
<h1>🤗 Welcome to the Wine Quality Prediction App 🤗</h1><br>

This is my very first app on Hugging Face Spaces and my first deployment of a machine learning model.<br>
This app was created using a dataset generated by a deep learning model trained on the <a href = "https://www.kaggle.com/datasets/yasserh/wine-quality-dataset">Wine Quality Dataset</a>. <br>
The dataset was generated for Episode 5 of the third season of Kaggle's Playground Series. It consists of the following attributes: <br><br>
<table>
<tr>
<th>Feature</th>
<th>Description</th>
</tr>
<tr>
<td><b>Fixed Acidity</b></td>
<td>Describes the amount of fixed acids within the wine, such as tartaric and malic acid.</td>
</tr>
<tr>
<td><b>Volatile Acidity</b></td>
<td>Describes the amount of volatile acids in the wine, such as acetic acid.</td>
</tr>
<tr>
<td><b>Citric Acid</b></td>
<td>An organic acid found in citrus fruits, which can add a tangy flavor to the wine.</td>
</tr>
<tr>
<td><b>Residual Sugar</b></td>
<td>Describes the amount of unfermented sugar in the wine, which impacts the taste and sweetness of the wine.</td>
</tr>
<tr>
<td><b>Chlorides</b></td>
<td>Describes the amount of salt present in the wine.</td>
</tr>
<tr>
<td><b>Free Sulfur Dioxide</b></td>
<td>Describes the sulfur dioxide that hasn't reacted with other components in the wine.</td>
</tr>
<tr>
<td><b>Total Sulfur Dioxide</b></td>
<td>Describes the total amount of sulfur dioxide, including the free and bound forms.</td>
</tr>
<tr>
<td><b>Density</b></td>
<td>Describes the density of the wine, which is related to its alcohol content and the types of grapes used to make it.</td>
</tr>
<tr>
<td><b>pH</b></td>
<td>A measure of the acidity or basicity of the wine.</td>
</tr>
<tr>
<td><b>Sulfates</b></td>
<td>A type of salt used for preservation in wine, which can also affect its taste.</td>
</tr>
<tr>
<td><b>Alcohol</b></td>
<td>Describes the percentage of alcohol in the wine, which impacts its flavor and body.</td>
</tr>
<tr>
<td><b>Quality</b></td>
<td>Target variable.</td>
</tr>
</table>
<br><br>

The evaluation metric for this competition was the Quadratic Weighted Kappa (QWK) score, which evaluates the agreement between two raters, here the predicted and actual values. <br>
The formula for Quadratic Weighted Kappa is given as:<br><br>

$$\kappa = 1 - \frac{\sum_{i,j} w_{ij} O_{ij}}{\sum_{i,j} w_{ij} E_{ij}}$$

<br><br>
where:
- $O$ is the actual confusion matrix.<br>
- $E$ is the expected confusion matrix under randomness.<br>
- $w$ is the weight matrix, which can be calculated as $w_{ij} = (i - j)^2$, where `i` and `j` are the ratings.<br><br>

A score equal to 1 suggests perfect agreement between the raters, while a score equal to 0 indicates that the agreement is no better than what would be expected by random chance. <br>
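
To make the formula concrete, here is a minimal pure-Python sketch of the Quadratic Weighted Kappa. It is an illustration, not the code from the competition notebook: the function name `quadratic_weighted_kappa` is hypothetical, ratings are assumed to be integers, and the weights are taken as `(i - j) ** 2`. In practice, scikit-learn's `cohen_kappa_score(y_true, y_pred, weights="quadratic")` computes the same metric.

```python
def quadratic_weighted_kappa(actual, predicted):
    """Quadratic Weighted Kappa for two equal-length lists of integer ratings.

    Hypothetical helper for illustration only.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")

    low = min(min(actual), min(predicted))
    high = max(max(actual), max(predicted))
    n = high - low + 1  # number of possible ratings

    # O: observed confusion matrix (rows = actual, columns = predicted)
    observed = [[0] * n for _ in range(n)]
    for a, p in zip(actual, predicted):
        observed[a - low][p - low] += 1

    # Marginal counts per rating, used to build E as an outer product
    actual_counts = [sum(row) for row in observed]
    predicted_counts = [sum(col) for col in zip(*observed)]

    numerator = 0    # sum of w_ij * O_ij
    denominator = 0  # sum of w_ij * E_ij, multiplied by the sample count
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2
            numerator += weight * observed[i][j]
            denominator += weight * actual_counts[i] * predicted_counts[j]

    if denominator == 0:
        # Both raters gave one and the same rating every time.
        # Treating this as perfect agreement is a convention of this sketch.
        return 1.0

    # E_ij = actual_counts[i] * predicted_counts[j] / len(actual)
    return 1.0 - numerator * len(actual) / denominator


if __name__ == "__main__":
    print(quadratic_weighted_kappa([5, 6, 7], [5, 6, 7]))  # identical ratings
    print(quadratic_weighted_kappa([5, 6, 7], [7, 6, 5]))  # reversed ratings
```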
For this competition, I built a <b>Pipeline</b> in which the input data is cleaned, new features are added and selected, the categorical features are encoded with both scikit-learn's <code>OrdinalEncoder</code> and <code>OneHotEncoder</code>, the data is clustered and, finally, a classification model is trained. <br>
The final model in this pipeline is a <code>CatBoostClassifier</code>, with hyperparameters tuned using the <code>Optuna</code> library. <br>
The public score of this model for the competition was <code>QWK = 0.53011</code>. <br>
The whole process, including the training and tuning, as well as an extensive EDA with Plotly, can be seen in the following Kaggle notebook: <br>
<a href = "https://www.kaggle.com/code/lusfernandotorres/wine-quality-eda-prediction-and-deploy/notebook"><b>🍷 Wine Quality: EDA, Prediction and Deploy</b></a> <br>
I would love to hear from you! Your feedback is the key to my growth!<br>

*Thank you!*

🧑🏻‍💻 **Luis Fernando Torres** 🧑🏻‍💻

Let's connect!<br>
<a href ="https://www.linkedin.com/in/luuisotorres/">LinkedIn</a> • <a href ="https://medium.com/@luuisotorres">Medium</a> • <a href ="https://www.kaggle.com/lusfernandotorres">Kaggle</a>