---
title: Wine Quality Prediction
emoji: 🍷
colorFrom: pink
colorTo: gray
sdk: streamlit
sdk_version: 1.42.0
app_file: app.py
pinned: false
---

<h1>🤗 Welcome to the Wine Quality Prediction App 🤗</h1><br>

This is my very first app on Hugging Face Spaces and my first deployment of a machine learning model.<br>
This app was created using a dataset generated by a deep learning model trained on the <a href = "https://www.kaggle.com/datasets/yasserh/wine-quality-dataset">Wine Quality Dataset</a>. <br>
The dataset was generated for Episode 5 of the third season of Kaggle's Playground Series. It originally consists of the following attributes: <br><br>
<table>
  <tr>
    <th>Feature</th>
    <th>Description</th>
  </tr>
  <tr>
    <td><b>Fixed Acidity</b></td>
    <td>Describes the amount of fixed acids within the wine, such as tartaric and malic acid.</td>
  </tr>
  <tr>
    <td><b>Volatile Acidity</b></td>
    <td>Describes the amount of volatile acids in the wine, such as acetic acid.</td>
  </tr>
  <tr>
    <td><b>Citric Acid</b></td>
    <td>An organic acid found in citrus fruits, which can add a tangy flavor to the wine.</td>
  </tr>
  <tr>
    <td><b>Residual Sugar</b></td>
    <td>Describes the amount of unfermented sugar in the wine, which impacts the taste and sweetness of the wine.</td>
  </tr>
  <tr>
    <td><b>Chlorides</b></td>
    <td>Describes the amount of salt present in the wine.</td>
  </tr>
  <tr>
    <td><b>Free Sulfur Dioxide</b></td>
    <td>Describes the sulfur dioxide that hasn't reacted with other components in the wine.</td>
  </tr>
  <tr>
    <td><b>Total Sulfur Dioxide</b></td>
    <td>Describes the total amount of sulfur dioxide, including the free and bound forms.</td>
  </tr>
  <tr>
    <td><b>Density</b></td>
    <td>Describes the mass per unit volume of the wine, which depends largely on its alcohol and sugar content.</td>
  </tr>
  <tr>
    <td><b>pH</b></td>
    <td>A measure of the acidity or basicity of the wine.</td>
  </tr>
  <tr>
    <td><b>Sulfates</b></td>
    <td>A type of salt used for preservation in wine, which can also affect its taste.</td>
  </tr>
  <tr>
    <td><b>Alcohol</b></td>
    <td>Describes the percentage of alcohol in the wine, which impacts its flavor and body.</td>
  </tr>
  <tr>
    <td><b>Quality</b></td>
    <td>Target variable.</td>
  </tr>
</table>
<br><br>

The evaluation metric for this competition was the Quadratic Weighted Kappa score, which evaluates the agreement between two raters: here, the predicted and the actual values. <br>
The formula for Quadratic Weighted Kappa is given as:<br><br>

![](https://latex.codecogs.com/gif.latex?%5Clarge%20%5Ckappa%20%3D%201%20-%20%5Cfrac%7B%5Csum_%7Bi%2Cj%7D%20w_%7Bij%7D%20O_%7Bij%7D%7D%7B%5Csum_%7Bi%2Cj%7D%20w_%7Bij%7D%20E_%7Bij%7D%7D)
<br><br>
where:
- ![](https://latex.codecogs.com/gif.latex?%5Clarge%20O_%7Bij%7D) is the actual confusion matrix.<br>
- ![](https://latex.codecogs.com/gif.latex?%5Clarge%20E_%7Bij%7D) is the expected confusion matrix under randomness.<br>
- ![](https://latex.codecogs.com/gif.latex?%5Clarge%20w_%7Bij%7D) is the weight matrix, which can be calculated as ![](https://latex.codecogs.com/gif.latex?%5Clarge%20(i-j)%5E2), where `i` and `j` are the ratings.<br><br>

A score equal to 1 suggests a perfect agreement between the raters, while a score equal to 0 indicates that the agreement is no better than what would be expected by random chance. <br>
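
As a rough illustration of the formula above, here is a minimal pure-Python sketch. The function name `quadratic_weighted_kappa` and its arguments are my own choices for this example, not code taken from the competition or the notebook, and it assumes the ratings are integers:

```python
def quadratic_weighted_kappa(actual, predicted):
    """Quadratic Weighted Kappa for two equal-length lists of integer ratings."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one rating is required")

    low = min(min(actual), min(predicted))
    high = max(max(actual), max(predicted))
    n = high - low + 1          # number of distinct rating levels
    total = len(actual)

    # O: observed confusion matrix (rows = actual, columns = predicted).
    observed = [[0] * n for _ in range(n)]
    for a, p in zip(actual, predicted):
        observed[a - low][p - low] += 1

    # Marginal counts, used to build E (expected matrix under randomness).
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2                          # w_ij
            expected = row_totals[i] * col_totals[j] / total  # E_ij
            numerator += weight * observed[i][j]
            denominator += weight * expected

    # Assumption of this sketch: if both raters use one single identical
    # rating, the denominator is zero and we report perfect agreement.
    if denominator == 0:
        return 1.0
    return 1.0 - numerator / denominator
```

For example, `quadratic_weighted_kappa([3, 4, 5, 6], [3, 4, 5, 6])` returns `1.0`, while fully reversed ratings such as `quadratic_weighted_kappa([1, 2, 3], [3, 2, 1])` return `-1.0`.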

For this competition, I've built a <b>Pipeline</b> in which the input data gets cleaned, new features are added and selected, the categorical features get encoded with both scikit-learn's <code>OrdinalEncoder</code> and <code>OneHotEncoder</code>, the data gets clustered and, finally, a classification model is trained on the result. <br>
The final model in this pipeline is a <code>CatBoostClassifier</code>, with hyperparameters tuned using the <code>Optuna</code> library. <br>
The public score of this model for the competition was <code>QWK = 0.53011</code>. <br>
The whole process, including training and tuning, as well as an extensive EDA with Plotly, can be seen in the following Kaggle notebook: <br>
<a href = "https://www.kaggle.com/code/lusfernandotorres/wine-quality-eda-prediction-and-deploy/notebook"><b>🍷 Wine Quality: EDA, Prediction and Deploy</b></a> <br>
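
To give a feel for the encoding-plus-classifier part of such a pipeline, here is a simplified sketch and not the actual code from the notebook. The column names, category values and toy data are hypothetical, the cleaning, feature selection and clustering steps are left out, and scikit-learn's `GradientBoostingClassifier` stands in for `CatBoostClassifier` so that the example only needs pandas and scikit-learn:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Hypothetical columns: the real engineered features live in the notebook.
NUMERIC = ["alcohol", "pH"]
ORDINAL = ["acidity_level"]    # ordered categories: low < medium < high
NOMINAL = ["sweetness_type"]   # unordered categories


def build_pipeline():
    """Encode categorical columns, then fit a stand-in classifier."""
    encode = ColumnTransformer(
        transformers=[
            ("num", "passthrough", NUMERIC),
            ("ord", OrdinalEncoder(categories=[["low", "medium", "high"]]), ORDINAL),
            ("ohe", OneHotEncoder(handle_unknown="ignore"), NOMINAL),
        ],
        sparse_threshold=0.0,  # always return a dense array
    )
    return Pipeline(
        steps=[
            ("encode", encode),
            # Stand-in for the CatBoostClassifier used in the real pipeline.
            ("model", GradientBoostingClassifier(random_state=0)),
        ]
    )


# Made-up toy rows, only here to show that the pipeline fits and predicts.
toy = pd.DataFrame(
    {
        "alcohol": [9.4, 9.8, 12.5, 13.0, 10.1, 12.8],
        "pH": [3.5, 3.2, 3.3, 3.1, 3.4, 3.2],
        "acidity_level": ["low", "medium", "high", "high", "low", "medium"],
        "sweetness_type": ["dry", "dry", "sweet", "sweet", "dry", "sweet"],
    }
)
quality = [5, 5, 7, 7, 5, 7]

pipeline = build_pipeline()
pipeline.fit(toy, quality)
predictions = pipeline.predict(toy)
```

Keeping the encoders and the model in one `Pipeline` object means the same transformations are applied at training and at prediction time, which is convenient when the fitted object is loaded inside an app.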

I would love to hear from you! Your feedback is the key to my growth!<br>

*Thank you!*

🧑🏻‍💻 **Luis Fernando Torres** 🧑🏻‍💻

Let's connect! 🔗<br>
<a href ="https://www.linkedin.com/in/luuisotorres/">LinkedIn</a> • <a href ="https://medium.com/@luuisotorres">Medium</a> • <a href ="https://www.kaggle.com/lusfernandotorres">Kaggle</a>