kevinhug committed on
Commit ae5a310 · 1 Parent(s): 3aa2412
Files changed (1)
  1. app.py +74 -81
app.py CHANGED
@@ -224,93 +224,86 @@ With no need for jargon, SSDS delivers tangible value to our fintech operations.
  df=pd.read_csv("./xgb/re.csv")
 
  gr.Markdown("""
- Explain by Dataset
- =============
- ![summary](file=./xgb/data.png)
-
- sorted feature from top(most importance)
-
- dist_subway when at low value(green) make big impact to price
-
- dist_store doesnt make much impact to price
-
- high age lower the price
-
- low age raise the price
-
-
-
- Explain by Feature
- =============
- ![partial_dependence](file=./xgb/feature.png)
 
- dist lower than 900 spike the price f(x)
-
- also highlighted the shap value for record[20] at around 6500
-
- Explain by Record
- =============
- ![force](file=./xgb/record.png)
-
- the largest contribution to positive price is dist_subway
-
- second contribution is age
 
- Explain by Instance
- =============
- ![dependence](file=./xgb/instance.png)
-
- at around 500 dist_subway, it possible for positive impact and negative impact for price
 
- over all trend is negative that mean, closer to subway is contribute to higher price
-
- there is a point at 6500 far from subway and it has negative impact on price, despite is is close to store(dist_stores)
-
- ![1st decision tree](file=./xgb/tree.svg)
- some how the word doesnt show in web...but this is the first decision tree inside xgboost
 
- Explain by Top 5 Error Example
- =============
- ![](file=./xgb/error_data.png)
- top feature for top 5 error is age
-
- young age has negative impact on price
-
- ![](file=./xgb/error_record.png)
- top 1 error, negative impact for young age in price
-
- ![](file=./xgb/error_feature.png)
- for top 5 error, it is possible that further from subway will have positive in price
-
- ![](file=./xgb/error_instance.png)
- for top 5 error, it is possible young age have negative impact and old age has positive impact in price
 
- ML Observability
- =============
- Visualization with Context
- https://public.tableau.com/app/profile/kevin1619/vizzes
 
- Data Validation
- -----------
- I led data validation for new data source for legacy model using covariate shift, recall methodology
-
- Ensure feature transformation are same in dev and prod environment
-
- Unit Testing/Acceptance Testing
- -----------
- I led unit testing for model, and discover logical error, improve lift by 50% for small business campaign
-
- A/B Testing for lift
- -----------
- A/B testing for small business model using statistical approach to ensure lift pass criteria
-
- File/Log Mining
- -----------
- I led server observability to understand why server was brought down with event journey map.
-
- Root Cause Analysis
- -----------
- With the right metric in place, I can trace back to the root cause with six sigma methodology
  """)
 
 
  df=pd.read_csv("./xgb/re.csv")
 
  gr.Markdown("""
+ Explain by Dataset
+ ===============
+ ![Summary](file=./xgb/data.png)
 
+ **Key insights:**
+ - **dist_subway** has a significant impact on pricing when at low values (green).
+ - **dist_store** demonstrates minimal impact on price.
+ - Higher age correlates with lower prices while lower age raises prices.
 
 
 
+ Explain by Feature
+ ===============
+ ![Partial Dependence](file=./xgb/feature.png)
+
+ **Observations:**
+ - Prices spike for **distances lower than 900** based on the function f(x).
+ - Noteworthy **SHAP value at record[20] around 6500**.
+
 
+ Explain by Record
+ ===============
+ ![Force](file=./xgb/record.png)
+
+ **Contribution to Price:**
+ - **dist_subway** holds the largest positive contribution to price.
+ - **Age** follows as the second significant contributor.
+
+ Explain by Instance
+ ===============
+ ![Dependence](file=./xgb/instance.png)
+
+ **Insights:**
+ - Around **500 dist_subway**, there's a potential for both positive and negative impacts on price.
+ - Overall trend: closer proximity to the subway correlates with higher prices.
+ - An outlier at **6500 distance** from subway negatively impacts price, despite proximity to stores (dist_stores).
 
+ ![1st Decision Tree](file=./xgb/tree.svg)
+ *Note: Unfortunately, the web doesn't display text, but this refers to the first decision tree within XGBoost.*
+
+ Explain by Top 5 Error Example
+ ===============
+ ![Top 5 Error Data](file=./xgb/error_data.png)
+
+ **Top Features for Errors:**
+ - **Age** stands out as the top feature impacting the top 5 errors negatively (for young ages).
+
+ ![Error Record](file=./xgb/error_record.png)
+ **Top 1 Error:**
+ - Notably, young age has a negative impact on pricing (top 1 error).
+
+ ![Error Feature](file=./xgb/error_feature.png)
+ **Insight from Errors:**
+ - Further distance from the subway might positively impact pricing for the top 5 errors.
+
+ ![Error Instance](file=./xgb/error_instance.png)
+ **Error Instances:**
+ - Younger age negatively impacts price, while older age positively impacts it for the top 5 errors.
+
+ ML Observability
+ ===============
+ **Visualization with Context:**
+ [Tableau Visualization](https://public.tableau.com/app/profile/kevin1619/vizzes)
+
+
+ **Data Validation:**
+ - Led data validation for a new data source using covariate shift and recall methodology for legacy models.
+ - Ensured consistency in feature transformation between dev and prod environments.
+
+ **Unit Testing/Acceptance Testing:**
+ - Led unit testing for models, identified logical errors, and improved campaign lift by 50% for small businesses.
+
+ **A/B Testing for Lift:**
+ - Utilized statistical approaches in A/B testing for small business models, ensuring lift met criteria.
+
+ **File/Log Mining:**
+ - Led server observability, leveraging event journey maps to understand server downtimes.
+
+ **Root Cause Analysis:**
+ - Proficient in employing Six Sigma methodology to trace root causes with established metrics.
  """)
 
 
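
Note on the "A/B Testing for Lift" item in the new text: the commit does not say which statistical approach was used, so the following is only a minimal sketch of one common option, a one-sided two-proportion z-test on conversion rates. The function name `lift_z_test` and the counts in the usage lines are hypothetical and are not taken from app.py or from the author's campaign data.

```python
from math import erf, sqrt


def lift_z_test(conv_control, n_control, conv_treat, n_treat):
    """One-sided two-proportion z-test (hypothetical helper, not from app.py).

    Returns (relative lift, z statistic, p-value) for the alternative
    hypothesis that the treatment conversion rate exceeds the control rate.
    """
    p_control = conv_control / n_control
    p_treat = conv_treat / n_treat

    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_control + conv_treat) / (n_control + n_treat)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treat))

    z = (p_treat - p_control) / std_err
    # Upper-tail probability of the standard normal: P(Z >= z).
    p_value = 0.5 * (1 - erf(z / sqrt(2)))

    lift = p_treat / p_control - 1
    return lift, z, p_value


# Illustrative counts only: 100/2000 conversions in control, 130/2000 in treatment.
lift, z, p_value = lift_z_test(100, 2000, 130, 2000)
meets_criteria = p_value < 0.05  # the 0.05 threshold is an assumed pass criterion
print(f"lift={lift:.1%}, z={z:.2f}, p={p_value:.4f}, pass={meets_criteria}")
```

A pooled z-test like this assumes large samples and independent users; for small campaigns an exact test or a bootstrap on the lift may be more appropriate.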