Okay this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! 🤯
Demo (+ source code): webml-community/DINOv3-video-tracking
This will revolutionize AI-powered video editors... which can now run 100% locally in your browser, no server inference required (costs $0)!
How does it work? 🤔
1️⃣ Generate and cache image features for each frame
2️⃣ Create a list of embeddings for selected patch(es)
3️⃣ Compute cosine similarity between each patch and the selected patch(es)
4️⃣ Highlight those whose score is above some threshold
... et voilà! 🥳
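For the curious, here is a rough TypeScript sketch of steps 2 to 4. It is not the demo's actual code: the names `Embedding`, `cosineSimilarity` and `highlightPatches`, the default threshold of 0.6, and the choice to score each patch by its best match against any selected patch are all assumptions made for illustration.

```typescript
// Illustrative sketch only; names and the 0.6 default are assumptions.
type Embedding = Float32Array;

/** Cosine similarity between two equal-length vectors (0 if either is all zeros). */
function cosineSimilarity(a: Embedding, b: Embedding): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

/**
 * Score every patch by its best similarity to any selected embedding
 * (an assumed way of combining several selections), then keep the
 * patches whose score is above the threshold.
 */
function highlightPatches(
  patches: Embedding[],
  selected: Embedding[],
  threshold = 0.6,
): boolean[] {
  return patches.map((patch) => {
    let best = -Infinity;
    for (const ref of selected) {
      best = Math.max(best, cosineSimilarity(patch, ref));
    }
    return best > threshold;
  });
}

// Toy 2-D "embeddings": the third patch is 45 degrees away from the selection.
const demoPatches = [
  new Float32Array([1, 0]),
  new Float32Array([0, 1]),
  new Float32Array([1, 1]),
];
const demoMask = highlightPatches(demoPatches, [new Float32Array([1, 0])], 0.6);
console.log(demoMask); // [ true, false, true ]
```

Real patch embeddings have hundreds of dimensions, but the loop is the same. The threshold is the knob to play with: a higher value gives a tighter mask, a lower one lets more of the scene through.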
You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.
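One way to picture the multi-frame selections, again as a hedged sketch rather than the demo's implementation: keep the cached per-frame features in a map and gather the embedding of every selected patch into a single reference list, which is then matched against each frame. `PatchSelection`, `collectReferences` and the cache layout are hypothetical names and structures chosen for this example.

```typescript
// Illustrative sketch only; the cache layout and names are assumptions.
interface PatchSelection {
  frame: number; // index of the frame the user clicked on
  patch: number; // index of the selected patch within that frame
}

/**
 * Gather the embeddings of all selected patches, across frames,
 * into one reference list. Selections that point at a frame or patch
 * missing from the cache are skipped.
 */
function collectReferences(
  cache: Map<number, Float32Array[]>,
  selections: PatchSelection[],
): Float32Array[] {
  const refs: Float32Array[] = [];
  for (const { frame, patch } of selections) {
    const features = cache.get(frame);
    if (!features || patch < 0 || patch >= features.length) continue;
    refs.push(features[patch]);
  }
  return refs;
}
```

Because the reference list holds the object's appearance at several points in time, a patch only has to resemble one of those snapshots to be highlighted, which is why extra selections help when the object changes its look.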
Excited to see what the community builds with it!