@@ -224,93 +224,86 @@ With no need for jargon, SSDS delivers tangible value to our fintech operations.
Features are sorted from the top by importance.

dist_subway at low values (green) has a large impact on price.

dist_store has little impact on price.

High age lowers the price.

Low age raises the price.

Explain by Feature

Explain by Record

The largest positive contribution to price is dist_subway.

The second-largest contribution is age.

Explain by Instance

At around 500 dist_subway, the impact on price can be either positive or negative.

The overall trend is negative, which means that being closer to the subway contributes to a higher price.

There is a point about 6500 from the subway that has a negative impact on price, despite being close to a store (dist_stores).

For some reason the text doesn't show on the web, but this is the first decision tree inside XGBoost.

For the top 1 error, young age has a negative impact on price.

For the top 5 errors, it is possible that being further from the subway has a positive impact on price.

For the top 5 errors, it is possible that young age has a negative impact and old age has a positive impact on price.

Explain by Dataset

**Key insights:**
- **dist_subway** has a significant impact on pricing when at low values (green).
- **dist_store** demonstrates minimal impact on price.
- Higher age correlates with lower prices, while lower age raises prices.

Explain by Feature

- Prices spike for **distances lower than 900** based on the function f(x).
- Noteworthy **SHAP value at record[20] around 6500**.

Explain by Record

**Contribution to Price:**
- **dist_subway** holds the largest positive contribution to price.
- **Age** follows as the second significant contributor.

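A per-record explanation like the one above rests on Shapley values: each feature's contribution is its average marginal effect on the prediction. The sketch below computes exact Shapley values by brute force for a toy model. It is only an illustration: `toy_price`, its coefficients, the single `baseline` record, and the example `record` are all hypothetical, not taken from the dataset or the trained XGBoost model, and the SHAP library averages over a background dataset rather than one baseline row.

```python
from itertools import permutations


# Hypothetical stand-in for the trained model: price falls with distance to
# the subway and with age, and barely moves with distance to the store.
def toy_price(dist_subway, dist_store, age):
    return 50.0 - 0.005 * dist_subway - 0.0005 * dist_store - 0.3 * age


def shapley_values(model, record, baseline):
    """Exact Shapley values of `record` relative to `baseline`.

    Averages each feature's marginal contribution over every ordering in
    which features are switched from the baseline value to the record value.
    """
    names = list(record)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(**current)
        for name in order:
            current[name] = record[name]
            value = model(**current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


# Illustrative values only; they are not rows from the dataset.
baseline = {"dist_subway": 1000.0, "dist_store": 4.0, "age": 20.0}
record = {"dist_subway": 100.0, "dist_store": 5.0, "age": 10.0}

contributions = shapley_values(toy_price, record, baseline)
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.4f}")
```

With these made-up numbers, dist_subway contributes most, age second, and dist_store almost nothing. The contributions sum to the difference between the prediction for the record and the prediction for the baseline, which is the additivity property that makes this kind of breakdown readable.
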
Explain by Instance

- At a dist_subway of around **500**, the impact on price can be either positive or negative.
- Overall trend: closer proximity to the subway correlates with higher prices.
- An outlier at a distance of **6500** from the subway negatively impacts price, despite proximity to stores (dist_stores).

*Note: The text does not render on the web, but this is the first decision tree within XGBoost.*

Explain by Top 5 Error Example

**Top Features for Errors:**
- **Age** stands out as the top feature impacting the top 5 errors negatively (for young ages).

**Top 1 Error:**
- Notably, young age has a negative impact on pricing (top 1 error).

**Insight from Errors:**
- Further distance from the subway might positively impact pricing for the top 5 errors.

**Error Instances:**
- Younger age negatively impacts price, while older age positively impacts it for the top 5 errors.

ML Observability

**Visualization with Context:**
[Tableau Visualization](https://public.tableau.com/app/profile/kevin1619/vizzes)

**Data Validation:**
- Led data validation for a new data source using covariate shift and recall methodology for legacy models.
- Ensured consistency in feature transformation between dev and prod environments.

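The source does not say which statistic was used for the covariate shift check. One common choice is the Population Stability Index (PSI), which compares how a feature is distributed in a reference sample and in a new sample. The sketch below is a minimal pure-Python version under that assumption; the function name `psi`, the ten equal-width bins, and the sample values are all hypothetical.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples.

    Bin edges are equal-width over the range of `expected`; values in
    `actual` that fall outside that range are counted in the edge bins.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each share so empty bins do not produce log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e_shares = bucket_shares(expected)
    a_shares = bucket_shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_shares, a_shares))


# Made-up feature values: a reference sample and a shifted copy of it.
reference = [i * 10.0 for i in range(100)]
shifted = [v + 300.0 for v in reference]

psi_same = psi(reference, reference)
psi_shifted = psi(reference, shifted)
print(f"PSI (same data):    {psi_same:.3f}")
print(f"PSI (shifted data): {psi_shifted:.3f}")
```

A widely quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, though those cut-offs are a convention rather than a statistical test.
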
**Unit Testing/Acceptance Testing:**
- Led unit testing for models, identified logical errors, and improved campaign lift by 50% for small businesses.

**A/B Testing for Lift:**
- Utilized statistical approaches in A/B testing for small business models, ensuring lift met criteria.

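As an illustration of what a statistical check on lift can look like, the sketch below runs a one-sided two-proportion z-test on conversion counts. The source does not name the test that was actually used, so this is an assumption, and the function name `lift_z_test` and all counts are made up.

```python
import math


def lift_z_test(control_conv, control_n, treat_conv, treat_n):
    """Relative lift and a one-sided two-proportion z-test.

    Returns (lift, z, p_value), where p_value is P(Z >= z) under the null
    hypothesis that treatment and control convert at the same rate.
    """
    p_control = control_conv / control_n
    p_treat = treat_conv / treat_n
    lift = (p_treat - p_control) / p_control
    pooled = (control_conv + treat_conv) / (control_n + treat_n)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treat_n))
    z = (p_treat - p_control) / std_err
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return lift, z, p_value


# Hypothetical campaign counts: 200 of 4000 convert in control,
# 260 of 4000 convert in treatment.
lift, z, p_value = lift_z_test(200, 4000, 260, 4000)
print(f"lift={lift:.1%}  z={z:.2f}  p={p_value:.4f}")
```

A lift criterion would then typically combine a minimum lift with a significance threshold, both of which are choices the team has to set in advance.
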
**File/Log Mining:**
- Led server observability, leveraging event journey maps to understand server downtimes.

**Root Cause Analysis:**
- Proficient in employing Six Sigma methodology to trace root causes with established metrics.