Arman Jukic

Jukka-Sun

Jukka-Sun

AI & ML interests

AI, machinelearning, deep-learning, comuter-vision, programming

Recent Activity

liked a dataset about 7 hours ago

facebook/natural_reasoning

liked a model about 7 hours ago

google/gemma-3-27b-it

liked a model 10 days ago

mlx-community/UI-TARS-7B-DPO-8bit

View all activity

Organizations

None yet

Jukka-Sun's activity

liked a dataset about 7 hours ago

facebook/natural_reasoning

Viewer • Updated 21 days ago • 1.15M • 10.7k • 398

liked a model about 7 hours ago

google/gemma-3-27b-it

Image-Text-to-Text • Updated 1 day ago • 38.5k • 461

liked 2 models 10 days ago

mlx-community/UI-TARS-7B-DPO-8bit

Image-Text-to-Text • Updated 10 days ago • 49 • 1

Wan-AI/Wan2.1-T2V-14B

Text-to-Video • Updated 1 day ago • 207k • • 1.01k

reacted to Jaward's post with ❤️ 12 days ago

Post

4935

made a few improvements on custom grpo trainer:
- added sequence similarity reward (seems to work)
- improved vllm support (5x inference speed)
- adjusted reward scores (this helped with format/accuracy)
- can now push to hf hub (already pushed mine lol: Jaward/smollm2_360m_grpo_gsm8k_reasoner)

Code: https://github.com/Jaykef/ai-algorithms/blob/main/smollm2_360M_135M_grpo_gsm8k.ipynb

liked a model 12 days ago

allenai/olmOCR-7B-0225-preview

Image-Text-to-Text • Updated 17 days ago • 193k • 541

liked a model 21 days ago

perplexity-ai/r1-1776

Text Generation • Updated 15 days ago • 55k • • 2.12k

reacted to benhaotang's post with 🔥 24 days ago

Post

2394

Try out my updated implementation of forked OpenDeepResearcher(link below) as an OpenAI compatible endpoint, but with full control, can be deployed completely free with Gemini api or completely locally with ollama, or pay-as-you-go in BYOK format, the AI agents will think dynamically based on the difficulties of given research, compatible with any OpenAI compatible configurable clients(Msty, Chatbox, even vscode AI Toolkit playground).

If you don't want to pay OpenAI $200 to use or want to take control of your deep research, check out here:
👉 https://github.com/benhaotang/OpenDeepResearcher-via-searxng

**Personal take**

Based on my testing against Perplexity's and Gemini's implementation with some Physics domain questions, mine is comparable and very competent at finding even the most rare articles or methods.

Also a funny benchmark of mine to test all these searching models, is to trouble shot a WSL2 hanging issue I experienced last year, with prompt:

> wsl2 in windows hangs in background with high vmmem cpu usage once in a while, especially after hibernation, no error logs captured in linux, also unable to shutdown in powershell, provide solutions

the final solution that took me a day last year to find is to patch the kernel with some steps documented in carlfriedrich's repo and wait Microsoft to solve it(it is buried deep in wsl issues). Out of the three, only my Deep Research agent has found this solution, Perplexity and Gemini just focus on other force restart or memory management methods. I am very impressed with how it has this kind of obscure and scarce trouble shooting ability.

**Limitations**

Some caveats to be done later:
- Multi-turn conversation is not yet supported, so no follow-up questions
- System message is only extra writing instructions, don't affect on search
- Small local model may have trouble citing source reliably, I am working on a fix to fact check all citation claims

1 reply

reacted to tianchez's post with 👍 24 days ago

Post

4114

Introducing VLM-R1!

GRPO has helped DeepSeek R1 to learn reasoning. Can it also help VLMs perform stronger for general computer vision tasks?

The answer is YES and it generalizes better than SFT. We trained Qwen 2.5 VL 3B on RefCOCO (a visual grounding task) and eval on RefCOCO Val and RefGTA (an OOD task).

https://github.com/om-ai-lab/VLM-R1

3 replies

reacted to Jaward's post with 👀 24 days ago

Post

3867

Finally here it is: a faster, custom, scalable GRPO trainer for smaller models with < 500M params, can train on 8gb ram cpu, also supports gpu for sanity sake (includes support for vllm + flash attention). Using smolLM2-135M/360M-instructs as ref & base models. Experience your own “aha” moment 🐳 on 8gb ram.
Code: https://github.com/Jaykef/ai-algorithms/blob/main/smollm2_360M_135M_grpo_gsm8k.ipynb