---
title: Wine Quality Prediction
emoji: 🍷
colorFrom: pink
colorTo: gray
sdk: streamlit
sdk_version: 1.42.0
app_file: app.py
pinned: false
---
🤗 Welcome to the Wine Quality Prediction App 🤗
This is my very first app on Hugging Face Spaces and my first deployment of a machine learning model.
This app was created using a dataset generated by a deep learning model trained on the Wine Quality Dataset.
The dataset was generated for Episode 5 of the third season of Kaggle's Playground Series and originally consists of the following attributes:
Feature | Description |
---|---|
Fixed Acidity | Describes the amount of fixed acids in the wine, such as tartaric and malic acid. |
Volatile Acidity | Describes the amount of volatile acids in the wine, such as acetic acid. |
Citric Acid | An organic acid found in citrus fruits, which can add a tangy flavor to the wine. |
Residual Sugar | Describes the amount of unfermented sugar in the wine, which impacts its taste and sweetness. |
Chlorides | Describes the amount of salt present in the wine. |
Free Sulfur Dioxide | Describes the sulfur dioxide that hasn't reacted with other components in the wine. |
Total Sulfur Dioxide | Describes the total amount of sulfur dioxide, including the free and bound forms. |
Density | Describes a correlation between the wine's alcohol content and the types of grapes used to make it. |
pH | A measure of the acidity or basicity of the wine. |
Sulfates | A type of salt used for preservation in wine, which can also affect its taste. |
Alcohol | Describes the percentage of alcohol in the wine, which impacts its flavor and body. |
Quality | Target variable. |
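
To give a concrete idea of how these attributes reach the model, here is a minimal sketch of how a Streamlit `app.py` could collect them and assemble a single-row DataFrame for prediction. The widget labels, default values, and the `model` object are illustrative assumptions, not the actual code of this Space.

```python
import pandas as pd
import streamlit as st

# Default values below are illustrative assumptions, not values taken from the
# actual app.py of this Space.
FEATURE_DEFAULTS = {
    "fixed acidity": 7.0, "volatile acidity": 0.5, "citric acid": 0.3,
    "residual sugar": 2.0, "chlorides": 0.08, "free sulfur dioxide": 15.0,
    "total sulfur dioxide": 45.0, "density": 0.996, "pH": 3.3,
    "sulphates": 0.65, "alcohol": 10.0,
}

# One numeric input per attribute, assembled into a single-row DataFrame with
# the same column names the model would expect at prediction time.
inputs = {name: st.number_input(name, value=default)
          for name, default in FEATURE_DEFAULTS.items()}
features = pd.DataFrame([inputs])

if st.button("Predict quality"):
    # `model` stands in for the trained pipeline loaded from disk (e.g. via joblib).
    # prediction = model.predict(features)[0]
    st.write(features)
```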
The evaluation metric for this competition was the Quadratic Weighted Kappa (QWK) score, which evaluates the agreement between two raters: the predicted and the actual values.
The formula for Quadratic Weighted Kappa is given as:

$$\kappa = 1 - \frac{\sum_{i,j} w_{i,j} O_{i,j}}{\sum_{i,j} w_{i,j} E_{i,j}}$$

where:
- $O$ is the actual confusion matrix,
- $E$ is the expected confusion matrix under randomness,
- $w$ is the weight matrix, which can be calculated as $w_{i,j} = \frac{(i - j)^2}{(N - 1)^2}$, where $i$ and $j$ are the ratings and $N$ is the number of possible ratings.
A score equal to 1 suggests a perfect agreement between the raters, while a score equal to 0 indicates that the agreement is no better than what would be expected by random chance.
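
For reference, the metric can be computed directly with Scikit-learn's `cohen_kappa_score` by passing `weights="quadratic"`; the ratings below are made-up examples, not competition data.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Made-up actual vs. predicted quality scores, purely for illustration.
y_true = np.array([5, 6, 7, 5, 6, 8, 4, 6])
y_pred = np.array([5, 6, 6, 5, 7, 8, 5, 6])

# Quadratic Weighted Kappa: Cohen's kappa with weights w_ij = (i - j)^2 / (N - 1)^2.
qwk = cohen_kappa_score(y_true, y_pred, weights="quadratic")
print(f"QWK = {qwk:.4f}")
```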
For this competition, I've built a Pipeline in which the input data gets cleaned, new features are added and selected, the categorical features are encoded with both Sklearn's `OrdinalEncoder` and `OneHotEncoder`, the data is clustered and, finally, fed to a classifier.
The final model in this pipeline is a `CatBoostClassifier`, fine-tuned with the `Optuna` library.
The public score of this model in the competition was QWK = 0.53011.
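
The sketch below illustrates the general shape of such a pipeline: categorical encoding with `OrdinalEncoder` and `OneHotEncoder`, K-Means cluster distances appended as extra features, a `CatBoostClassifier` on top, and an `Optuna` study that maximizes QWK. The data, column names, and hyperparameter ranges are synthetic placeholders; the actual feature engineering and tuning live in the Kaggle notebook linked below.

```python
import numpy as np
import optuna
import pandas as pd
from catboost import CatBoostClassifier
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, OrdinalEncoder

# Synthetic stand-in data: the real cleaning and feature engineering live in the
# Kaggle notebook, so every column and value here is purely illustrative.
rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame({
    "alcohol": rng.uniform(8, 14, n),
    "pH": rng.uniform(2.8, 3.8, n),
    "acidity_level": rng.choice(["low", "medium", "high"], n),  # hypothetical ordinal feature
    "sweetness": rng.choice(["dry", "off-dry", "sweet"], n),    # hypothetical nominal feature
})
y = rng.integers(3, 9, n)  # quality scores
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=42)

# Encode the categorical features and pass the numeric ones through unchanged.
preprocess = ColumnTransformer(
    transformers=[
        ("ordinal", OrdinalEncoder(categories=[["low", "medium", "high"]]), ["acidity_level"]),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["sweetness"]),
        ("numeric", "passthrough", ["alcohol", "pH"]),
    ],
    sparse_threshold=0.0,
)

# Append K-Means cluster distances as extra features next to the encoded columns.
cluster = FeatureUnion([
    ("identity", FunctionTransformer()),                           # keep the encoded features
    ("kmeans", KMeans(n_clusters=5, n_init=10, random_state=42)),  # distances to cluster centres
])

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("cluster", cluster),
    ("classifier", CatBoostClassifier(verbose=0, random_state=42)),
])

def objective(trial):
    # Tune a few CatBoost hyperparameters directly against the QWK metric.
    pipeline.set_params(
        classifier__depth=trial.suggest_int("depth", 4, 10),
        classifier__learning_rate=trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        classifier__n_estimators=trial.suggest_int("n_estimators", 200, 600),
    )
    pipeline.fit(X_train, y_train)
    preds = np.asarray(pipeline.predict(X_valid)).ravel()
    return cohen_kappa_score(y_valid, preds, weights="quadratic")

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=10)
print("Best validation QWK:", study.best_value)
```

Maximizing QWK inside the Optuna objective keeps the hyperparameter search aligned with the competition metric rather than with plain accuracy.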
The whole process, including the training and fine-tuning, as well as an extensive EDA with Plotly, can be seen on the following Kaggle notebook:
🍷 Wine Quality: EDA, Prediction and Deploy
I would love to hear from you! Your feedback is the key to my growth!
Thank you!
🧑🏻‍💻 Luis Fernando Torres 🧑🏻‍💻