---
title: Wine Quality Prediction
emoji: 🍷
colorFrom: pink
colorTo: gray
sdk: streamlit
sdk_version: 1.42.0
app_file: app.py
pinned: false
---
<h1>🤗 Welcome to the Wine Quality Prediction App 🤗</h1><br>
This is my very first app on Hugging Face Spaces, and my first deployment of a machine learning model.<br>
This app was created using a dataset generated by a deep learning model trained on the <a href = "https://www.kaggle.com/datasets/yasserh/wine-quality-dataset">Wine Quality Dataset</a>. <br>
The dataset was generated for Episode 5 of the third season of Kaggle's Playground Series. It originally consists of the following attributes: <br><br>
<table>
<tr>
<th>Feature</th>
<th>Description</th>
</tr>
<tr>
<td><b>Fixed Acidity</b></td>
<td>Describes the amount of fixed acids within the wine, such as tartaric and malic acid.</td>
</tr>
<tr>
<td><b>Volatile Acidity</b></td>
<td>Describes the amount of volatile acids in the wine, such as acetic acid.</td>
</tr>
<tr>
<td><b>Citric Acid</b></td>
<td>Describes the amount of citric acid, an organic acid found in citrus fruits, which can add a tangy flavor to the wine.</td>
</tr>
<tr>
<td><b>Residual Sugar</b></td>
<td>Describes the amount of unfermented sugar in the wine, which impacts the taste and sweetness of the wine.</td>
</tr>
<tr>
<td><b>Chlorides</b></td>
<td>Describes the amount of salt present in the wine.</td>
</tr>
<tr>
<td><b>Free Sulfur Dioxide</b></td>
<td>Describes the sulfur dioxide that hasn't reacted with other components in the wine.</td>
</tr>
<tr>
<td><b>Total Sulfur Dioxide</b></td>
<td>Describes the total amount of sulfur dioxide, including the free and bound forms.</td>
</tr>
<tr>
<td><b>Density</b></td>
<td>Describes the mass per unit volume of the wine, which depends mainly on its alcohol and sugar content.</td>
</tr>
<tr>
<td><b>pH</b></td>
<td>A measure of the acidity or basicity of the wine.</td>
</tr>
<tr>
<td><b>Sulfates</b></td>
<td>A type of salt used for preservation in wine, which can also affect its taste.</td>
</tr>
<tr>
<td><b>Alcohol</b></td>
<td>Describes the percentage of alcohol in the wine, which impacts its flavor and body.</td>
</tr>
<tr>
<td><b>Quality</b></td>
<td>Target variable.</td>
</tr>
</table>
<br><br>
The evaluation metric for this competition was the Quadratic Weighted Kappa score, which evaluates the agreement between two raters, here the predicted and actual values. <br>
The formula for Quadratic Weighted Kappa is given as:<br><br>
![](https://latex.codecogs.com/gif.latex?%5Clarge%20%5Ckappa%20%3D%201%20-%20%5Cfrac%7B%5Csum_%7Bi%2Cj%7D%20w_%7Bij%7D%20O_%7Bij%7D%7D%7B%5Csum_%7Bi%2Cj%7D%20w_%7Bij%7D%20E_%7Bij%7D%7D)
<br><br>
where:
- ![](https://latex.codecogs.com/gif.latex?%5Clarge%20O_%7Bij%7D) is the actual confusion matrix.<br>
- ![](https://latex.codecogs.com/gif.latex?%5Clarge%20E_%7Bij%7D) is the expected confusion matrix under randomness.<br>
- ![](https://latex.codecogs.com/gif.latex?%5Clarge%20w_%7Bij%7D) is the weight matrix, which can be calculated as ![](https://latex.codecogs.com/gif.latex?%5Clarge%20(i-j)%5E2), where `i` and `j` are the ratings.<br><br>
A score equal to 1 suggests a perfect agreement between the raters, while a score equal to 0 indicates that the agreement is no better than what would be expected by random chance. <br>
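
To make the formula above concrete, here is a minimal pure-Python sketch of the metric. It is not the competition's official scorer and not the code used in the notebook; the function name `quadratic_weighted_kappa` and the choice to raise an error in the degenerate case are my own assumptions for illustration.

```python
from collections import Counter


def quadratic_weighted_kappa(actual, predicted):
    """Return 1 - sum(w * O) / sum(w * E), with weights w_ij = (i - j)^2.

    Illustrative sketch only: `actual` and `predicted` are equal-length
    sequences of integer ratings (e.g. wine quality scores).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    n = len(actual)
    if n == 0:
        raise ValueError("at least one rating pair is required")

    # O_ij: how many samples with actual rating i were predicted as rating j.
    observed = Counter(zip(actual, predicted))
    # Marginal counts, used to build the expected matrix under randomness.
    actual_hist = Counter(actual)
    predicted_hist = Counter(predicted)
    ratings = sorted(set(actual) | set(predicted))

    numerator = 0.0
    denominator = 0.0
    for i in ratings:
        for j in ratings:
            weight = (i - j) ** 2
            # E_ij: outer product of the marginals, scaled to sum to n.
            expected = actual_hist[i] * predicted_hist[j] / n
            numerator += weight * observed[(i, j)]
            denominator += weight * expected

    if denominator == 0:
        # Both raters used one identical rating everywhere: kappa is undefined.
        raise ValueError("kappa is undefined when only one rating is used")
    return 1.0 - numerator / denominator


print(quadratic_weighted_kappa([3, 4, 5, 6], [3, 4, 5, 6]))  # 1.0
print(quadratic_weighted_kappa([5, 5, 6, 6], [5, 6, 6, 5]))  # 0.0
```

The first call shows perfect agreement, and the second shows predictions that match the actual values only as often as chance would. If scikit-learn is available, `cohen_kappa_score(actual, predicted, weights="quadratic")` should give the same values.
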
For this competition, I've built a <b>Pipeline</b> in which the input data gets cleaned, new features are added and selected, the categorical features get encoded with both Sklearn's <code>OrdinalEncoder</code> and <code>OneHotEncoder</code>, the data gets clustered and, finally, a classification model is trained on the result. <br>
The final model in this pipeline consists of a <code>CatBoostClassifier</code> model, fine-tuned with the <code>Optuna</code> library. <br>
The public score of this model for the competition was <code>QWK = 0.53011</code>. <br>
The whole process, including the training and fine-tuning, as well as an extensive EDA with Plotly, can be seen in the following Kaggle notebook: <br>
<a href = "https://www.kaggle.com/code/lusfernandotorres/wine-quality-eda-prediction-and-deploy/notebook"><b>🍷 Wine Quality: EDA, Prediction and Deploy</b></a> <br>
I would love to hear from you! Your feedback is the key to my growth!<br>
*Thank you!*
🧑🏻‍💻 **Luis Fernando Torres** 🧑🏻‍💻
Let's connect!🔗<br>
<a href ="https://www.linkedin.com/in/luuisotorres/">LinkedIn</a> • <a href ="https://medium.com/@luuisotorres">Medium</a> • <a href ="https://www.kaggle.com/lusfernandotorres">Kaggle</a>