Jim Lai

grimjim

AI & ML interests

Experimenting primarily with 7B-12B parameter text completion models. Not all models are intended for direct use; some are meant for research and/or educational purposes.

Recent Activity

new activity about 12 hours ago

google/gemma-2-2b-it: SLERP merge example code?

published a model about 19 hours ago

grimjim/DeepSauerHuatuoSkywork-R1-o1-Llama-3.1-8B

updated a model about 20 hours ago

grimjim/DeepSauerHuatuoSkywork-R1-o1-Llama-3.1-8B

Organizations

Posts 19

Post

1689

A recent merge has provided another interesting result on the current Open LLM leaderboard.
open-llm-leaderboard/open_llm_leaderboard

Combining an o1 reasoning merge with VAGOsolutions's Llama-3.1 SauerkrautLM 8B Instruct model resulted in a lower IFEval score but a higher score on every other benchmark. This is my best Llama 3.1 8B merge result to date.
grimjim/SauerHuatuoSkywork-o1-Llama-3.1-8B
The results suggest that defects in output format and/or output parsing may be limiting benchmark performance of various o1 models.

Post

1584

I've arrived at an interesting result on the current Open LLM leaderboard.
open-llm-leaderboard/open_llm_leaderboard
After I narrowed the model filter to the 8B-9B parameter range, my recent merge of o1 reasoning models achieved the highest MATH eval result of any Llama 3.x 8B model currently on the board, hitting 33.99% and placing 973/2795.
grimjim/HuatuoSkywork-o1-Llama-3.1-8B

Unfortunately, I need more information to evaluate the parent models used in the merge.
The Skywork/Skywork-o1-Open-Llama-3.1-8B model scored 0% on the MATH eval, which I suspect was due to output formatting that was baked too hard into the model, and placed 2168/2795; the merge achieved a significant uplift in every benchmark.
FreedomIntelligence/HuatuoGPT-o1-8B had not been benchmarked as of this post, so I am unable to assess relative benchmarks. Nevertheless, it is intriguing that an ostensibly medical o1 model appears to have resulted in a sizable MATH boost.
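
As a rough illustration of how output formatting alone could produce a 0% score, here is a minimal sketch. It is not the leaderboard's actual scoring code: the `\boxed{...}` convention, the helper names, and the sample outputs are all assumptions chosen for illustration.

```python
import re

def extract_boxed(text):
    """Hypothetical strict parser: accept only an answer wrapped in \\boxed{...}."""
    match = re.search(r"\\boxed\{([^{}]*)\}", text)
    return match.group(1).strip() if match else None

def strict_accuracy(outputs, references):
    """Fraction of outputs whose extracted answer exactly matches the reference."""
    hits = sum(extract_boxed(o) == r for o, r in zip(outputs, references))
    return hits / len(references)

# Made-up reference answers and model outputs (not real benchmark data).
references = ["42", "7"]
boxed_outputs = ["The answer is \\boxed{42}.", "So we get \\boxed{7}."]
plain_outputs = ["Final answer: 42", "Final answer: 7"]

# Same answers, different formatting: only the boxed outputs are credited.
boxed_score = strict_accuracy(boxed_outputs, references)
plain_score = strict_accuracy(plain_outputs, references)
```

Under these assumptions, both sets of outputs contain the correct answers, yet the strict parser credits only the ones that follow the expected format.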

Collections 5

models 127

datasets 3

grimjim/empatheticdialogues

Updated 17 days ago • 40

grimjim/PAlign-PAPI-personality_prompt.json-cleaned

Viewer • Updated Sep 21, 2024 • 300 • 49

grimjim/adversarial-10-alpaca

Viewer • Updated Aug 16, 2024 • 10 • 37 • 1

Collections 5

grimjim/SauerHuatuoSkywork-o1-Llama-3.1-8B

grimjim/SauerHuatuoSkywork-o1-Llama-3.1-8B-GGUF

grimjim/HuatuoSkywork-o1-Llama-3.1-8B

grimjim/llama-3-Nephilim-v3-8B

grimjim/kuno-kunoichi-v1-DPO-v2-SLERP-7B

grimjim/kukulemon-7B

grimjim/kukulemon-spiked-9B

grimjim/kukulemon-32K-7B

models 127

grimjim/DeepSauerHuatuoSkywork-R1-o1-Llama-3.1-8B

grimjim/SauerHuatuoSkywork-o1-Llama-3.1-8B-GGUF

grimjim/SauerHuatuoSkywork-o1-Llama-3.1-8B

grimjim/BadApple-o1-Llama-3.1-8B

grimjim/Magnolia-v4-12B

grimjim/HuatuoSkywork-o1-Llama-3.1-8B

grimjim/Magnolia-v4-Gemma2-8k-9B

grimjim/Llama3.1-SuperNovaLite-HuatuoSkywork-o1-8B

grimjim/lemon07r_Gemma-2-Ataraxy-v4c-9B_fixed

grimjim/Magnolia-v3-Gemma2-8k-9B
