	---
	title: Wine Quality Prediction
	emoji: 🍷
	colorFrom: pink
	colorTo: gray
	sdk: streamlit
	sdk_version: 1.42.0
	app_file: app.py
	pinned: false
	---

	<h1>🤗 Welcome to the Wine Quality Prediction App 🤗</h1><br>

	This is my very first app on Hugging Face Spaces and my first deployment of a machine learning model.<br>
	This app was created using a dataset generated by a deep learning model trained on the <a href = "https://www.kaggle.com/datasets/yasserh/wine-quality-dataset">Wine Quality Dataset</a>. <br>
	The dataset was generated for Episode 5 of the third season of Kaggle's Playground Series. It originally consists of the following attributes: <br><br>
	<table>
	<tr>
	<th>Feature</th>
	<th>Description</th>
	</tr>
	<tr>
	<td><b>Fixed Acidity</b></td>
	<td>Describes the amount of fixed acids within the wine, such as tartaric and malic acid.</td>
	</tr>
	<tr>
	<td><b>Volatile Acidity</b></td>
	<td>Describes the amount of volatile acids in the wine, such as acetic acid.</td>
	</tr>
	<tr>
	<td><b>Citric Acid</b></td>
	<td>Describes the amount of citric acid, an organic acid found in citrus fruits, which can add a tangy flavor to the wine.</td>
	</tr>
	<tr>
	<td><b>Residual Sugar</b></td>
	<td>Describes the amount of unfermented sugar in the wine, which impacts the taste and sweetness of the wine.</td>
	</tr>
	<tr>
	<td><b>Chlorides</b></td>
	<td>Describes the amount of salt present in the wine.</td>
	</tr>
	<tr>
	<td><b>Free Sulfur Dioxide</b></td>
	<td>Describes the sulfur dioxide that hasn't reacted with other components in the wine.</td>
	</tr>
	<tr>
	<td><b>Total Sulfur Dioxide</b></td>
	<td>Describes the total amount of sulfur dioxide, including the free and bound forms.</td>
	</tr>
	<tr>
	<td><b>Density</b></td>
	<td>Describes the wine's mass per unit volume, which depends mainly on its alcohol and sugar content.</td>
	</tr>
	<tr>
	<td><b>pH</b></td>
	<td>A measure of the acidity or basicity of the wine.</td>
	</tr>
	<tr>
	<td><b>Sulfates</b></td>
	<td>A type of salt used for preservation in wine, which can also affect its taste.</td>
	</tr>
	<tr>
	<td><b>Alcohol</b></td>
	<td>Describes the percentage of alcohol in the wine, which impacts its flavor and body.</td>
	</tr>
	<tr>
	<td><b>Quality</b></td>
	<td>Target variable.</td>
	</tr>
	</table>
	<br><br>

	The evaluation metric for this competition was the Quadratic Weighted Kappa (QWK) score, which measures the agreement between two raters: here, the predicted and the actual values. <br>
	The formula for Quadratic Weighted Kappa is given as:<br><br>

	![](https://latex.codecogs.com/gif.latex?%5Clarge%20%5Ckappa%20%3D%201%20-%20%5Cfrac%7B%5Csum_%7Bi%2Cj%7D%20w_%7Bij%7D%20O_%7Bij%7D%7D%7B%5Csum_%7Bi%2Cj%7D%20w_%7Bij%7D%20E_%7Bij%7D%7D)
	<br><br>
	where:
	- ![](https://latex.codecogs.com/gif.latex?%5Clarge%20O_%7Bij%7D) is the actual confusion matrix.<br>
	- ![](https://latex.codecogs.com/gif.latex?%5Clarge%20E_%7Bij%7D) is the expected confusion matrix under randomness.<br>
	- ![](https://latex.codecogs.com/gif.latex?%5Clarge%20w_%7Bij%7D) is the weight matrix, with each entry calculated as ![](https://latex.codecogs.com/gif.latex?%5Clarge%20(i-j)%5E2), where `i` and `j` are the ratings.<br><br>

	A score equal to 1 indicates perfect agreement between the raters, while a score equal to 0 indicates that the agreement is no better than what would be expected by random chance. Negative scores indicate agreement worse than chance. <br>
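
As a rough illustration of the formula above, here is a minimal pure-Python sketch of the Quadratic Weighted Kappa computation. The function name `quadratic_weighted_kappa` and its arguments are assumptions made for this example; it is not the competition's official scorer nor the code used in the notebook.

```python
def quadratic_weighted_kappa(actual, predicted, min_rating, max_rating):
    """Compute kappa = 1 - sum(w * O) / sum(w * E) for integer ratings."""
    n = max_rating - min_rating + 1
    total = len(actual)

    # O: observed confusion matrix (rows = actual, columns = predicted).
    observed = [[0] * n for _ in range(n)]
    for a, p in zip(actual, predicted):
        observed[a - min_rating][p - min_rating] += 1

    # Marginal histograms, used to build E under independence.
    hist_actual = [sum(row) for row in observed]
    hist_predicted = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2  # w_ij
            expected = hist_actual[i] * hist_predicted[j] / total  # E_ij
            numerator += weight * observed[i][j]
            denominator += weight * expected

    # A zero denominator means both raters used one identical rating throughout.
    if denominator == 0:
        return 1.0
    return 1.0 - numerator / denominator


# Wine quality ratings in this sketch are assumed to range from 3 to 8.
perfect = quadratic_weighted_kappa([3, 5, 6, 8], [3, 5, 6, 8], 3, 8)
chance = quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], 0, 1)
```

In this sketch, `perfect` evaluates to 1.0 (identical ratings) and `chance` to 0.0 (agreement exactly at chance level), matching the interpretation given above.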

	For this competition, I built a <b>Pipeline</b> that cleans the input data, adds and selects features, encodes the categorical features with scikit-learn's <code>OrdinalEncoder</code> and <code>OneHotEncoder</code>, clusters the data and, finally, trains a classification model. <br>
	The final model in this pipeline consists of a <code>CatBoostClassifier</code> model, fine-tuned with the <code>Optuna</code> library. <br>
	The public score of this model for the competition was <code>QWK = 0.53011</code>. <br>
	The whole process, including the training and fine-tuning, as well as an extensive EDA with Plotly, can be found in the following Kaggle notebook: <br>
	<a href = "https://www.kaggle.com/code/lusfernandotorres/wine-quality-eda-prediction-and-deploy/notebook"><b>🍷 Wine Quality: EDA, Prediction and Deploy</b></a> <br>
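
To give a feel for how such a pipeline can be wired together, here is a small scikit-learn sketch. It is not the pipeline from the notebook: the binned features, the `ClusterLabeler` helper, the column layout, and the random synthetic data are all hypothetical, and `RandomForestClassifier` is used only as a stand-in for the tuned `CatBoostClassifier` so that the example depends on scikit-learn and NumPy alone.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, OrdinalEncoder


def add_features(X):
    """Hypothetical feature step: append binned alcohol and binned pH columns."""
    alcohol_bin = np.digitize(X[:, 0], [10.0, 12.0])  # bin ids 0, 1 or 2
    ph_bin = np.digitize(X[:, 1], [3.2, 3.4])         # bin ids 0, 1 or 2
    return np.column_stack([X, alcohol_bin, ph_bin])


class ClusterLabeler(TransformerMixin, BaseEstimator):
    """Hypothetical helper: append each row's KMeans cluster id as a column."""

    def __init__(self, n_clusters=3, random_state=0):
        self.n_clusters = n_clusters
        self.random_state = random_state

    def fit(self, X, y=None):
        self.kmeans_ = KMeans(
            n_clusters=self.n_clusters, n_init=10, random_state=self.random_state
        ).fit(X)
        return self

    def transform(self, X):
        labels = self.kmeans_.predict(X).reshape(-1, 1)
        return np.hstack([X, labels])


# Columns after add_features: 0 = alcohol, 1 = pH, 2 = alcohol_bin, 3 = ph_bin.
encode = ColumnTransformer(
    transformers=[
        ("ordinal", OrdinalEncoder(handle_unknown="use_encoded_value",
                                   unknown_value=-1), [2]),
        ("onehot", OneHotEncoder(handle_unknown="ignore"), [3]),
    ],
    remainder="passthrough",
    sparse_threshold=0,  # always return a dense array
)

model = Pipeline([
    ("features", FunctionTransformer(add_features)),
    ("encode", encode),
    ("cluster", ClusterLabeler(n_clusters=3)),
    # Stand-in classifier; the app itself uses a tuned CatBoostClassifier.
    ("classify", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# Synthetic data for demonstration only: two features and quality labels 3..8.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(8.0, 14.0, 200), rng.uniform(2.9, 3.7, 200)])
y = rng.integers(3, 9, size=200)

model.fit(X, y)
preds = model.predict(X[:5])
```

Keeping every step inside one `Pipeline` object means the same transformations are applied at training and at prediction time, which is what makes a single fitted object convenient to load in a deployed app.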

	I would love to hear from you! Your feedback is the key to my growth!<br>

	Thank you!

	🧑🏻‍💻 Luis Fernando Torres 🧑🏻‍💻

	Let's connect!🔗<br>
	<a href ="https://www.linkedin.com/in/luuisotorres/">LinkedIn</a> • <a href ="https://medium.com/@luuisotorres">Medium</a> • <a href ="https://www.kaggle.com/lusfernandotorres">Kaggle</a>