codelion
posted an update 7 days ago
I recently added a recipe in ellora to improve the reasoning capabilities of Gemma-3-1B using self-supervised learning. The model now shows step-by-step thinking in <think> tags before answering.

Logic puzzle accuracy: 61% → 84%. 3 hours of training on a single GPU. 🧠

Used GRPO, where the model generates multiple responses per prompt and learns to prefer the ones with better reasoning. Works surprisingly well for making smaller models more transparent.
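To make the idea concrete, here is a minimal sketch of the group-relative scoring at the heart of GRPO: sample a group of responses, reward each one, and normalize rewards within the group so above-average responses get positive advantages. The `format_reward` function and all names here are illustrative assumptions, not the actual ellora recipe code (the real recipe uses a richer self-reward signal).

```python
# Toy sketch of GRPO's group-relative advantages (illustrative, not the
# actual ellora recipe). One prompt, several sampled responses.
from statistics import mean, pstdev

def group_relative_advantages(rewards):
    """Score each sampled response relative to its group:
    advantage_i = (r_i - mean(r)) / std(r).
    Responses better than the group average get positive advantages
    and are reinforced; worse ones get negative advantages."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # avoid div-by-zero when all rewards tie
    return [(r - mu) / sigma for r in rewards]

def format_reward(response):
    """Hypothetical self-supervised reward: prefer responses that show
    their work inside <think>...</think> tags before answering."""
    return 1.0 if "<think>" in response and "</think>" in response else 0.0

# A sampled group of responses for one prompt.
group = [
    "<think>2 + 2 = 4.</think> The answer is 4.",
    "The answer is 4.",
    "<think>Let me check: 2 + 2 = 4.</think> 4.",
    "4",
]
rewards = [format_reward(r) for r in group]
advantages = group_relative_advantages(rewards)
print(rewards)     # [1.0, 0.0, 1.0, 0.0]
print(advantages)  # [1.0, -1.0, 1.0, -1.0]
```

In the actual training loop these advantages would weight the policy-gradient update on each response's tokens, so the LoRA adapter drifts toward the response style the reward prefers, with no human labels needed.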

🔗 Colab: https://colab.research.google.com/github/codelion/ellora/blob/main/Ellora_Recipe_2_Reasoning_LoRA_with_Self-Rewarding_GRPO.ipynb

🤗 Model: codelion/gemma-3-1b-it-reasoning-grpo-lora

💻 Code: https://github.com/codelion/ellora

Interesting and curious.

In my limited experience, asking a model to find flaws in something only gets surface-level identification of why it wouldn't work.

This could potentially improve that a lot. Yeah, wanting to build a tower out of gold is a silly example, but I see a lot of movies and books where events happen 'so the plot can move forward', and characters making idiotic choices is one thing that has turned me off from a lot of media, including books. Writers using something like this to identify and remove the braindead stupidities would certainly be a huge improvement in the long run.
