AI & ML interests

Deep Learning, Computer Vision, Machine Learning

pyimagesearch's activity

ariG23498
posted an update 19 days ago
Tried my hand at simplifying the derivations of Direct Preference Optimization.

I cover how one can reformulate RLHF into DPO. The idea of implicit reward modeling is chef's kiss.

Blog: https://huggingface.co/blog/ariG23498/rlhf-to-dpo
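
For readers who want a concrete picture of the implicit reward idea, here is a minimal sketch of the per-pair DPO loss. It assumes the standard formulation, where the implicit reward is beta times the log-ratio of the policy to a frozen reference model. The function name, argument names, and the default beta are illustrative assumptions and are not taken from the blog.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sketch of the DPO loss for one preference pair.

    Inputs are sequence log-probabilities log pi(y | x) under the
    trainable policy and the frozen reference model (hypothetical names).
    """
    # Implicit reward: beta * log(pi_theta(y | x) / pi_ref(y | x))
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward

    # -log(sigmoid(margin)), written in a numerically stable form
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# The policy favors the chosen answer more than the reference does,
# so the loss falls below log(2), its value at zero margin.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

No separate reward model appears anywhere in the function: the preference signal comes entirely from how far the policy's log-probabilities move relative to the reference.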