I recently worked on a LoRA that improves tool use in LLMs. Thought the approach might interest folks here.
The issue I've had when using local LLMs with coding agents is this:
Me: "Find all API endpoints with authentication in this codebase"
LLM: "You should look for @app.route decorators and check if they have auth middleware..."
But I usually want it to actually search the files and show me the results, and the LLM never triggers a tool call.
To fine-tune it for tool use I combined two data sources:
1. Magpie scenarios - 5,000+ diverse tasks (bug hunting, refactoring, security audits)
2. Real execution - ran these on actual repos (FastAPI, Django, React) to get authentic tool responses
This ensures the model learns both breadth (many scenarios) and depth (real tool behavior).
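To make the pipeline concrete, here's a minimal sketch of how a scenario and its recorded tool executions could be combined into one chat-format training sample. The function name, tool name, and field layout are my assumptions for illustration, not the post's actual code:

```python
import json

def build_sample(scenario: str, tool_calls: list) -> dict:
    """Combine a Magpie scenario with recorded tool runs into a training record.

    The assistant turn carries a tool invocation rather than prose advice,
    which is exactly the behavior the LoRA is meant to reinforce.
    """
    messages = [{"role": "user", "content": scenario}]
    for call in tool_calls:
        # The model's turn: emit a structured tool call instead of explaining.
        messages.append({
            "role": "assistant",
            "tool_calls": [{"name": call["tool"], "arguments": call["args"]}],
        })
        # The real output captured by running the tool on an actual repo.
        messages.append({"role": "tool", "content": call["output"]})
    return {"messages": messages}

# Hypothetical example built from the scenario in the post.
sample = build_sample(
    "Find all API endpoints with authentication in this codebase",
    [{"tool": "grep_search",
      "args": {"pattern": "@app.route"},
      "output": "app/api.py:12: @app.route('/login', methods=['POST'])"}],
)
print(json.dumps(sample, indent=2))
```

The key design point is that every assistant turn in the training data is a tool call grounded in a real execution trace, so the model learns to act rather than describe.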
I'm excited to announce that I've just released the newest versions of my Kuvera models and the expanded Personal Finance Reasoning dataset on Hugging Face!
What's new: I've expanded the Personal Finance Reasoning dataset, which now includes 18.9k samples of real-world financial questions paired with detailed, empathetic answers. The generation pipeline was also streamlined, with better psychological context and response validation.
I've also released new Kuvera models trained on this improved dataset:
- Kuvera-4B & 8B: my upgraded non-reasoning models, fine-tuned to provide practical financial advice. The 8B model is specifically trained to better understand the user's emotional context.
- Kuvera-12B: a first experimental reasoning model focused on query resolution.
As the sole person working on this project, I see this release as a noticeable step forward from my previous work, offering more powerful and nuanced tools for financial AI.
I am actively looking to collaborate with others who are passionate about analyzing and improving the quality of personal finance advice generated by large language models. If this sounds like you, please reach out!
P.S. The paper on the framework used to generate these models, along with a detailed evaluation of the main 8B model's responses, will be released soon!
We've brought DAG Reasoning to gpt-oss-20b and Qwen3-4B-Thinking-2507!
- DAG Reasoning is the first model in our Experimental Reasoning Modalities series: use it to create structured, analytical Directed Acyclic Graphs that provide insight into your queries and situations!
- Multi-step analysis identifies causal relationships, produces confidence measurements, and forms a single structured graph object.
- The DAG Reasoning Format provides clear, readable JSON containing structured, useful information; easy to use for creating visualizations, doing analysis, or continuing the conversation with your assistant.
- Trained on a variety of subjects for flexible analysis: programming, science, business, economics, finance, law, logistics, management, and more!
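Since the output is a single JSON graph object, downstream code can consume it directly. Here's a hedged sketch of what such output might look like and how to verify it really is acyclic; the field names (`nodes`, `edges`, `confidence`) are my assumptions, not the model's documented schema:

```python
import json

# Hypothetical example of a DAG Reasoning-style JSON graph (schema assumed).
dag_json = """
{
  "nodes": [
    {"id": "rate_hike", "label": "Central bank raises rates"},
    {"id": "borrowing_cost", "label": "Borrowing costs increase"},
    {"id": "spending", "label": "Consumer spending slows"}
  ],
  "edges": [
    {"from": "rate_hike", "to": "borrowing_cost", "confidence": 0.9},
    {"from": "borrowing_cost", "to": "spending", "confidence": 0.7}
  ]
}
"""

def is_acyclic(graph: dict) -> bool:
    """Kahn's algorithm: the graph is a DAG iff every node can be removed
    by repeatedly popping nodes with no remaining incoming edges."""
    indegree = {n["id"]: 0 for n in graph["nodes"]}
    for e in graph["edges"]:
        indegree[e["to"]] += 1
    ready = [n for n, d in indegree.items() if d == 0]
    seen = 0
    while ready:
        node = ready.pop()
        seen += 1
        for e in graph["edges"]:
            if e["from"] == node:
                indegree[e["to"]] -= 1
                if indegree[e["to"]] == 0:
                    ready.append(e["to"])
    return seen == len(indegree)

graph = json.loads(dag_json)
print(is_acyclic(graph))  # → True
```

The same parsed structure can feed straight into a graph-plotting library for visualization, which is the kind of downstream use the format is meant to enable.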
Our upcoming releases, coming soon with your support:
- bringing Shining Valiant 3 to the Qwen 3 2507 series!
- our next release in the Experimental Reasoning Modalities series - we're hard at work on this right now!
- upgrading the Esper line with Esper 3.1 - newer and better datasets asking tougher and deeper coding, DevOps, and architecture questions, plus improvements to general chat!