John6666 (John Smith)

reacted to openfree's post with 🚀🤗 2 days ago

🔄 GitHub ↔️ HuggingFace Bidirectional Repository Converter + AI Auto Interface Generation

🎯 Three Magic Features in One Click
Move GitHub repositories to HuggingFace Spaces and Spaces back to GitHub freely, while AI automatically creates web interfaces for you.

openfree/Github-Transfer

🧠 AI Perfectly Understands Your Code
Project DNA Analysis
When you upload a repository, AI scans the entire structure. It comprehensively analyzes dependencies in requirements.txt, descriptions in README, code patterns, and model files to understand the essence of your project.
Auto-Generated Custom Interface
Computer vision projects become image upload and visualization interfaces, NLP projects get text input and generation options, and audio projects get recording and waveform displays. These are not templates but custom UIs optimized for each project.
LLM Fine-tunes Every Detail
Fireworks AI's large language model carefully configures parameter slider ranges, input validation, error handling, and result formatting. The demo looks as polished as if an experienced developer created it.
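The post says an LLM (Fireworks AI) does this analysis and does not describe its internals. As a rough, rule-based stand-in for just the first step, here is a minimal Python sketch, assuming a simple keyword heuristic: it parses requirements.txt, scores a few hypothetical domain hint lists, and picks Gradio component names (`Image`, `Textbox`, `Audio`). `DOMAIN_HINTS`, `UI_FOR_DOMAIN` and `guess_domain` are illustrative assumptions, not openfree/Github-Transfer's code.

```python
import re
from collections import Counter

# Hypothetical package-to-domain hints; not the tool's real logic.
DOMAIN_HINTS = {
    "vision": {"opencv-python", "torchvision", "pillow", "timm", "ultralytics"},
    "nlp": {"transformers", "sentencepiece", "tokenizers", "nltk", "spacy"},
    "audio": {"librosa", "torchaudio", "soundfile", "pydub"},
}

# Hypothetical (input, output) Gradio component names per domain.
UI_FOR_DOMAIN = {
    "vision": ("Image", "Image"),
    "nlp": ("Textbox", "Textbox"),
    "audio": ("Audio", "Audio"),
}

def parse_requirements(text: str) -> list[str]:
    """Return normalized package names from requirements.txt content."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r / -e
        # The package name ends at the first version specifier, extra, or marker.
        name = re.split(r"[<>=!~\[;@ ]", line, maxsplit=1)[0]
        names.append(name.lower().replace("_", "-"))
    return names

def guess_domain(text: str) -> str:
    """Score each domain by how many of its hint packages appear."""
    packages = set(parse_requirements(text))
    scores = Counter({d: len(packages & hints) for d, hints in DOMAIN_HINTS.items()})
    domain, score = scores.most_common(1)[0]
    return domain if score > 0 else "generic"

reqs = """
torch>=2.0
torchvision
Pillow==10.0.0  # image IO
gradio
"""
domain = guess_domain(reqs)
print(domain, UI_FOR_DOMAIN.get(domain, ("Textbox", "Textbox")))
```

A fixed table like this is brittle; the appeal of the LLM approach described in the post is that it can also read the README and code patterns rather than only package names.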
🔄 The Power of Bidirectional Conversion
GitHub to HuggingFace
A model completed on GitHub in the morning becomes a live web demo shared at lunch. AI automatically generates the Gradio interface, eliminating the need for separate frontend development.
HuggingFace to GitHub
Code you experimented with and improved in Spaces can be exported to GitHub in the evening for version control. Collaborate with team members through PRs and connect to CI/CD pipelines.

💡 Experience the Real Conversion Magic
🚀 AI Smartly Resolves Dependency Conflicts
✨ Real Automation That Saves Developer Time

🎬 You develop, AI deploys and demos!
Move freely between GitHub and HuggingFace, showcasing your projects with perfect AI-generated interfaces 🚀

reacted to Jaward's post with 🚀 2 days ago

It’s absolutely mind-blowing, the work Dynamics Lab is doing!!
With just a single input image and in a few seconds, their new world engine model (Mirage 2) can generate a whole new interactive world that’s physics-informed and fully explorable in real time🤯
Try it yourself: https://demo.dynamicslab.ai/chaos

reacted to dhruv3006's post with 🔥 2 days ago

Human in the Loop for computer use agents (instant handoff from AI to you)

Sometimes the best “agent” is you.

We’re introducing Human in the Loop: instantly hand off from automation to human control when a task needs judgment.

Yesterday we shared our HUD evals for measuring agents at scale. Today, you can become the agent when it matters: take over the same session, see what the agent sees, and keep the workflow moving.

It lets you create clean training demos, establish ground truth for tricky cases, intervene on edge cases (CAPTCHAs, ambiguous UIs), or step through debugging without context switching.

You have full human control whenever you want. We even have a fallback mode that starts automated and escalates to a human only when needed.
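The fallback mode described above can be pictured as a simple escalation policy. Here is a minimal Python sketch, assuming the agent reports a confidence score per step and a fixed threshold decides when to hand off; `Step`, `agent_step`, `ask_human` and the 0.7 threshold are hypothetical stand-ins, not the cua API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Step:
    action: str        # what the agent proposes to do next
    confidence: float  # agent's self-reported confidence in [0, 1] (assumed)

def run_with_fallback(
    task: list[str],
    agent_step: Callable[[str], Step],
    ask_human: Callable[[str], str],
    threshold: float = 0.7,
) -> list[tuple[str, str]]:
    """Run each subtask with the agent; hand off to a human below the threshold."""
    log = []
    for subtask in task:
        step = agent_step(subtask)
        if step.confidence >= threshold:
            log.append(("agent", step.action))
        else:
            # Escalate: the human sees the same subtask and decides the action.
            log.append(("human", ask_human(subtask)))
    return log

def fake_agent(subtask: str) -> Step:
    # Toy agent that is unsure about CAPTCHAs, one of the edge cases in the post.
    return Step(f"auto:{subtask}", 0.2 if "captcha" in subtask else 0.95)

def fake_human(subtask: str) -> str:
    return f"manual:{subtask}"

print(run_with_fallback(["open page", "solve captcha", "submit form"], fake_agent, fake_human))
```

A real system would more likely escalate on signals such as repeated failures or unrecognized UI states rather than a single self-reported score; the threshold here only illustrates the "automated first, human on demand" control flow.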

It works across common stacks (OpenAI, Anthropic, Hugging Face) and with our Composite Agents. Same tools, same environment: take control when needed.

Feedback welcome! Curious how you’d use this in your workflows.

Blog: https://www.trycua.com/blog/human-in-the-loop.md

GitHub: https://github.com/trycua/cua

reacted to Reubencf's post with 👍 2 days ago

📖✨ Exciting News!

I’ve released the Konkani Bible – New Testament Translation along with the English version side by side.
You can now read, compare, and explore the scriptures in both languages.

Check it out here: 👉
Reubencf/konkani_bible

Dataset: Reubencf/Konkani_bible

reacted to kanaria007's post with 👀 2 days ago

✅ New Article: *Jurisprudence as Recursive Justice*

Title:
⚖️ Justice: Recursive Ethics under Public Parse Guard
🔗 https://huggingface.co/blog/kanaria007/structured-jurisprudence

---

Summary:
Jurisprudence is more than *law and rules* —
it is *the architecture of reasoning about justice*.

Structured Intelligence reframes it as *recursive constraint resolution*:

* Principles as *root nodes in ethical trees*
* Cases as *looped tests of structural coherence*
* Judgment as *auditable traversal of protocol space*

> Justice isn’t an opinion —
> *it’s structure proving itself across cases.*

---

Why It Matters:
• Explains *how legal principles persist and adapt over time*
• Supports *AI that can model reasoning without black‑box verdicts*
• Bridges *philosophy of law, ethics, and cognitive architecture*

---

What’s Inside:
• Jurisprudence as *recursive justification engine*
• *Precedent and principle loops* in reasoning
• *Rollback and ethical rebinding* in evolving law
• Implications for *AI legal assistance and transparent governance*

---

📖 Article 32 of the Structured Intelligence Series

Where Article 31 explored *martial arts as embodied structure*,
Article 32 frames *jurisprudence as recursive justice* —
showing how *law proves itself through structural traversal*.

---

Next: History as Structured Memory
The next article explores *history as recursive loop architecture*,
revealing *civilizations as systems that selectively remember and forget*.

> From law to legacy,
> *structure decides what persists across time.*

reacted to codelion's post with 🔥 2 days ago

I recently worked on a LoRA that improves tool use in LLMs. Thought the approach might interest folks here.

The issue I have had when trying to use some of the local LLMs with coding agents is this:

Me: "Find all API endpoints with authentication in this codebase"
LLM: "You should look for @app.route decorators and check if they have auth middleware..."

But I often want it to actually search the files and show me, yet the LLM doesn't trigger a tool call.

To fine-tune it for tool use I combined two data sources:

1. Magpie scenarios - 5000+ diverse tasks (bug hunting, refactoring, security audits)
2. Real execution - Ran these on actual repos (FastAPI, Django, React) to get authentic tool responses

This ensures the model learns both breadth (many scenarios) and depth (real tool behavior).

Tools We Taught:
- read_file - Actually read file contents
- search_files - Regex/pattern search across codebases
- find_definition - Locate classes/functions
- analyze_imports - Dependency tracking
- list_directory - Explore structure
- run_tests - Execute test suites
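The post lists only tool names. To make the "Correct parameters" metric below concrete, here is a Python sketch, assuming the common JSON-Schema function-calling style; the parameter names (`path`, `pattern`, `symbol`, `target`) and the `validate_call` checker are hypothetical, not the schemas used in ellora.

```python
import json

def tool(name, description, required, optional=()):
    """Build one function-calling schema; every parameter is typed as a string here."""
    props = {p: {"type": "string"} for p in (*required, *optional)}
    return {
        "name": name,
        "description": description,
        "parameters": {"type": "object", "properties": props, "required": list(required)},
    }

# Hypothetical parameter names; only the tool names come from the post.
TOOLS = [
    tool("read_file", "Read file contents", ["path"]),
    tool("search_files", "Regex/pattern search across the codebase", ["pattern"], ["path"]),
    tool("find_definition", "Locate a class or function definition", ["symbol"]),
    tool("analyze_imports", "List a module's imports for dependency tracking", ["path"]),
    tool("list_directory", "List entries in a directory", ["path"]),
    tool("run_tests", "Run a test suite", [], ["target"]),
]
SCHEMAS = {t["name"]: t for t in TOOLS}

def validate_call(raw):
    """Check a model-emitted call: valid JSON, known tool, all required arguments present."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(call, dict):
        return False, "call must be a JSON object"
    name = call.get("name")
    if not isinstance(name, str) or name not in SCHEMAS:
        return False, f"unknown tool {name!r}"
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return False, "arguments must be a JSON object"
    missing = [p for p in SCHEMAS[name]["parameters"]["required"] if p not in args]
    if missing:
        return False, f"missing arguments: {missing}"
    return True, "ok"

print(validate_call('{"name": "search_files", "arguments": {"pattern": "ValueError"}}'))
print(validate_call('{"name": "read_file", "arguments": {}}'))
```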

Improvements:
- Tool calling accuracy: 12% → 80%
- Correct parameters: 8% → 87%
- Multi-step tasks: 3% → 78%
- End-to-end completion: 5% → 80%
- Tools per task: 0.2 → 3.8

The LoRA really improves intentional tool calling. As an example, consider the query: "Find ValueError in payment module"

The response proceeds as follows:

1. Calls search_files with pattern "ValueError"
2. Gets 4 matches across 3 files
3. Calls read_file on each match
4. Analyzes context
5. Reports: "Found 3 ValueError instances: payment/processor.py:47 for invalid amount, payment/validator.py:23 for unsupported currency..."
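In the trace above, the model decides which tool to call at each step. As a toy, deterministic re-enactment of the same search, read, report flow, here is a Python sketch over an in-memory "repo"; the file contents, line numbers, and the `search_files`/`read_file` implementations are invented for illustration and are not the ellora tools.

```python
import re

# Invented file contents standing in for a real repository.
REPO = {
    "payment/processor.py": (
        "def charge(amount):\n"
        "    if amount <= 0:\n"
        "        raise ValueError('invalid amount')\n"
    ),
    "payment/validator.py": (
        "SUPPORTED = {'USD', 'EUR'}\n"
        "def check(currency):\n"
        "    if currency not in SUPPORTED:\n"
        "        raise ValueError('unsupported currency')\n"
    ),
    "shipping/rates.py": "raise ValueError('bad rate')\n",
}

def search_files(pattern, prefix=""):
    """Regex search: return (path, line_number, line) for every match under prefix."""
    hits = []
    for path in sorted(REPO):
        if not path.startswith(prefix):
            continue
        for number, line in enumerate(REPO[path].splitlines(), start=1):
            if re.search(pattern, line):
                hits.append((path, number, line.strip()))
    return hits

def read_file(path):
    """Return a file's full text."""
    return REPO[path]

def find_value_errors(module):
    """Search, then read each matching file, then report each hit with context."""
    report = []
    for path, number, _ in search_files(r"\bValueError\b", prefix=module + "/"):
        lines = read_file(path).splitlines()
        # Use the line above the raise (here, the guarding condition) as context.
        condition = lines[number - 2].strip() if number >= 2 else ""
        report.append(f"{path}:{number} ({condition})")
    return report

for entry in find_value_errors("payment"):
    print(entry)
```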

Resources:
- Colab notebook https://colab.research.google.com/github/codelion/ellora/blob/main/Ellora_Recipe_3_Enhanced_Tool_Calling_and_Code_Understanding.ipynb
- Model - codelion/Llama-3.2-1B-Instruct-tool-calling-lora
- GitHub - https://github.com/codelion/ellora

reacted to MonsterMMORPG's post with 👀 2 days ago

Huge updates made for SECourses Musubi Tuner - 1-Click to Install App for LoRA Training and Full Fine Tuning Qwen Image, Qwen Image Edit, Wan 2.1 and Wan 2.2 Models with Musubi Tuner with Ready Presets

1-Click to install app link : https://www.patreon.com/posts/137551634

Check all the screenshots

30 August 2025 Update V7
Dataset TOML file generate error fixed

Qwen2.5-VL image captioning turns out to work perfectly on Windows

It turns out my model file was corrupted even though it was the same size

Therefore, I have updated the model downloader: it now verifies the SHA-256 hash of every file, so a corrupted download is caught even when its size looks correct
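Checking a hash instead of a size is what catches the "same size but corrupted" case above. A minimal Python sketch of that check (not the app's actual downloader code); where the expected hashes come from, for example a model repo's published checksums, is left to the caller as an assumption.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-GB model files never load fully into RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_download(path, expected_sha256):
    """True only if the file exists and its SHA-256 matches; size alone is not enough."""
    p = Path(path)
    return p.is_file() and sha256_of(p) == expected_sha256.lower()

# Usage: two files of identical size, one byte different.
with tempfile.TemporaryDirectory() as tmp:
    good, bad = Path(tmp, "good.bin"), Path(tmp, "bad.bin")
    good.write_bytes(b"model-weights")
    bad.write_bytes(b"model-weighta")  # same size, corrupted content
    expected = hashlib.sha256(b"model-weights").hexdigest()
    print(verify_download(good, expected), verify_download(bad, expected))  # True False
```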

Prompt file selection folder icon issue fixed

The downloader now uses the venv generated during installation

Make sure to run it after the installation has completed

Fixed skip existing captions functionality in Image Captioning with Qwen2.5-VL

Previously, skipping was checked after caption generation, which defeated the skip logic

Now properly checks for existing captions before processing, significantly improving efficiency
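The fix amounts to filtering the work list before the model ever runs. A minimal Python sketch, assuming captions live next to each image as a same-named `.txt` file and treating an empty caption file as missing; `caption_fn` is a hypothetical stand-in for the Qwen2.5-VL call, and none of this is the app's actual code.

```python
import tempfile
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}
CAPTION_EXT = ".txt"  # assumed sidecar caption format

def images_needing_captions(folder):
    """List images without a non-empty caption, so skipping happens before generation."""
    todo = []
    for img in sorted(Path(folder).iterdir()):
        if img.suffix.lower() not in IMAGE_EXTS:
            continue
        caption = img.with_suffix(CAPTION_EXT)
        if caption.exists() and caption.read_text(encoding="utf-8").strip():
            continue  # caption already present: never call the model for this image
        todo.append(img)
    return todo

def caption_folder(folder, caption_fn):
    """Caption only the pending images, printing simple batch progress."""
    todo = images_needing_captions(folder)
    for i, img in enumerate(todo, start=1):
        img.with_suffix(CAPTION_EXT).write_text(caption_fn(img), encoding="utf-8")
        print(f"[{i}/{len(todo)}] captioned {img.name}")
    return len(todo)

# Usage: b.png already has a caption, notes.md is not an image.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.png", "b.png", "c.jpg", "notes.md"):
        Path(tmp, name).write_bytes(b"")
    Path(tmp, "b.txt").write_text("existing caption", encoding="utf-8")
    print(caption_folder(tmp, lambda img: f"caption for {img.stem}"))
```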

Added full batch captioning status display in command line with progress tracking and ETA

Enhanced config save/load functionality for better reliability

Improved interface of Image Captioning with Qwen2.5-VL for better user experience

Various error fixes in the Qwen2.5-VL captioning pipeline

Fixed broken config save and load functionality for Optimizer Arguments and Scheduler Arguments

Improved Stop Training button responsiveness - now appears much earlier when Text Encoder caching starts

Enhanced training control for better user experience

A new, fully dedicated section for sample generation has been implemented

It automatically formats your sample txt file with the settings you set in the GUI

So you just type your prompts into the txt file, one per line, e.g.

ohwx man wearing a very nice amazing suit

ohwx man driving a luxury car
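The post doesn't show what the formatted file looks like. As a rough sketch, assuming Musubi Tuner's sample-prompt convention of appending per-prompt flags such as `--w` (width), `--h` (height), `--s` (steps), `--d` (seed) and, for video models like Wan, `--f` (frames), the formatting step might look like this in Python. The flag names and defaults are assumptions to check against Musubi Tuner's documentation, and `format_sample_prompts` is a hypothetical helper, not the app's code.

```python
def format_sample_prompts(raw_text, width=1024, height=1024, steps=20, seed=42, frames=None):
    """Append GUI settings as flags to every non-empty prompt line (flag names assumed)."""
    lines = []
    for line in raw_text.splitlines():
        prompt = line.strip()
        if not prompt:
            continue  # blank lines between prompts are ignored
        opts = f"--w {width} --h {height} --s {steps} --d {seed}"
        if frames is not None:  # video models also need a frame count
            opts += f" --f {frames}"
        lines.append(f"{prompt} {opts}")
    return "\n".join(lines) + ("\n" if lines else "")

raw = "ohwx man wearing a very nice amazing suit\n\nohwx man driving a luxury car\n"
print(format_sample_prompts(raw, width=1328, height=1328))
```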

reacted to kanaria007's post with 👀 2 days ago

✅ New Article: *Martial Arts as Embodied Structure*

Title:
🥋 Martial Arts: Constraint Execution through Embodied Jump Control
🔗 https://huggingface.co/blog/kanaria007/structured-martial-arts

---

Summary:
Martial arts are often seen as *combat skills or tradition*.
Structured Intelligence reframes them as *embodied structural intelligence*:

* *Movement as live jump execution under constraint*
* *Stance and flow as self‑stabilizing loops*
* *Discipline as recursive rollback and ethical alignment*

> Martial practice isn’t about fighting —
> *it’s structure moving through the body.*

---

Why It Matters:
• Reveals *how physical training encodes decision and ethics*
• Connects *embodiment, cognition, and structural learning*
• Enables *AI and robotics to integrate motion as reasoning*

---

What’s Inside:
• Martial arts as *jump‑and‑stabilize architecture*
• *Timing, flow, and rollback* as structural control
• *Discipline and etiquette* as ethical scaffolds
• Implications for *embodied AI, sports, and cognitive resilience*

---

📖 Article 31 of the Structured Intelligence Series

Where Article 30 explored *nonverbal thought as parallel structure*,
Article 31 shows *martial arts as embodied structure* —
revealing *motion as live cognition under constraint*.

---

Next: Jurisprudence as Recursive Justice
The next article explores *how law and judgment operate as recursive constraint resolution*,
revealing *justice as structural coherence under shared ethics*.

> From motion to judgment,
> *structure carries both the body and the law.*

reacted to AdinaY's post with 🔥 2 days ago

USO 🎨 Unified customization model released by ByteDance Research

Demo
bytedance-research/USO
Model
bytedance-research/USO
Paper
USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning (2508.18966)

✨ Large-scale triplet dataset (content, style, stylized)
✨ Disentangled learning: style alignment + content preservation
✨ Style Reward Learning (SRL) for higher fidelity
✨ USO-Bench: 1st benchmark for style & subject jointly
✨ SOTA results on subject consistency & style similarity

reacted to AdinaY's post with 🔥 2 days ago

Step-Audio 2🔥 New end to end multimodal LLM for audio & speech, released by StepFun

stepfun-ai/step-audio-2-68b003c3a47b273fffaf67a8

✨ Direct raw audio to text & speech, no ASR+LLM+TTS pipeline
✨ High-IQ reasoning: RL + CoT for paralinguistic cues
✨ Multimodal RAG + tool calling
✨ Emotion, timbre, dialect & style control
✨ SOTA on ASR, paralinguistic, speech dialog

reacted to prithivMLmods's post with ❤️ 2 days ago

Introducing prithivMLmods/DeepCaption-VLA-7B, a multimodal VLM designed for reasoning with long-shot captions (Captioning and Vision-Language Attribution). It focuses on defining visual properties, object attributes, and scene details across a wide spectrum of images and aspect ratios, generating attribute-rich image captions. The model supports creative, artistic, and technical applications that require detailed descriptions. 🤗🔥

✦︎ Models: prithivMLmods/DeepCaption-VLA-7B, also includes prithivMLmods/DeepAttriCap-VLA-3B, an experimental model for vision-language attribution.

✦︎ Try the demo here: prithivMLmods/VisionScope-R2

✦︎ Try it now on Google Colab, with support for T4 GPUs in 4-bit quant_type: https://github.com/PRITHIVSAKTHIUR/Multimodal-Outpost-Notebooks/blob/main/DeepCaption-VLA-7B%5B4bit%20-%20notebook%20demo%5D/DeepCaption-VLA-7B.ipynb

✦︎ Collection: prithivMLmods/deepcaption-attr-68b041172ebcb867e45c556a

To learn more, visit the respective model cards!

reacted to tsungyi's post with 🔥 4 days ago

Cosmos Reason just topped the Physical Reasoning Leaderboard on Hugging Face. 👏🔥

Cosmos Reason is an open, customizable, commercial-ready 7B-parameter, reasoning vision language model (VLM) for physical AI and robotics. The VLM empowers robots and vision AI agents to reason like humans, leveraging prior knowledge, physics understanding, and common sense to understand and operate intelligently in the real world.

This model unlocks advanced capabilities for robotics, autonomous vehicles, and real-world operations—from cities to high-tech factories.

Key use cases include:
Data curation & annotation: Automate high-quality dataset curation and annotation at scale.
Robot planning & reasoning: Serve as the "brain" for deliberate, methodical decision-making with vision language action (VLA) models.
Video analytics AI agents: Extract actionable insights and perform root-cause analysis on massive video datasets.

Ready to build the next generation of physical AI? Get started 👉 nvidia/Cosmos-Reason1-7B
Try the preview here: https://build.nvidia.com/nvidia/cosmos-reason1-7b

reacted to amirgame197's post with 👍 4 days ago

I realized I have a GitHub account and wasn't really posting anything interesting there, so a few days ago I started uploading my (kinda) useful projects. Some of them are going to be AI-related too, so check out my profile if you like!
https://github.com/amirgame197

In the end, it's your smile that is appreciated.

reacted to MonsterMMORPG's post with 👀 4 days ago

Nano Banana (Gemini 2.5 Flash Image) Full Tutorial — 27 Unique Cases vs Qwen Image Edit — Free 2 Use : https://youtu.be/qPUreQxB8zQ

Tutorial link : https://youtu.be/qPUreQxB8zQ

The Nano Banana AI image editing model was published by Google today. It is officially named the Google Gemini 2.5 Flash Image model, and it is arguably the most advanced zero-shot image editing model available right now. I have conducted a thorough, in-depth review of this model with 27 unique cases. All prompts, input images, and results are demonstrated live in this tutorial. Moreover, I have compared each result with Qwen Image Edit, the state-of-the-art (SOTA) open-source, locally runnable, and free-to-use model, so we can see which model performs better at which tasks.

Video Chapters

0:00 Introduction to Google's "Nano Banana" (Gemini 2.5 Flash)
0:28 Comparing Gemini vs. Qwen Image Edit Model (27 Test Cases)
1:33 Solving Gemini's Low Resolution with SUPIR Upscaling
2:28 Teaser: Upcoming Qwen Image LoRA Training Application
2:41 How to Access Gemini 2.5 Flash in Google AI Studio
2:55 Test Case 1: Text Conversion
3:31 Test Case 2: Photorealism Test (Portrait)
4:36 Test Case 3: Adding Sunglasses
5:44 Test Case 4: Adding Iron Man to a Surfer (Gemini Wins)
6:38 Test Case 5: Adding a Cat (Qwen Wins)
7:20 Test Case 6: Clothing Extraction (Gemini Fails)
8:02 Test Case 7: Character Back View (Qwen Wins on Accuracy)
9:24 Test Case 8: Photo to Anime Style (Gemini Wins on Resemblance)
10:18 Test Case 9: Changing Background to Night
11:37 Test Case 10: Outpainting a Portrait (Qwen Wins on Proportions)
13:22 Test Case 11: Adding a Lion to a Scene (Gemini Wins)
13:59 Test Cases 12 & 13: Stylization Failures (Pixel Art & Claymation)
15:44 Test Case 14: Adding a Knight's Helmet
16:47 Test Case 15: Adding Reflections (Qwen is More Accurate)
18:00 Test Case 16: Changing Day to Night (Window View)
19:33 Test Case 17: Adding a Wooden Sign
20:22 Test Case 18: Old Photo Restoration

reacted to codelion's post with 🔥 4 days ago

I wanted to share a technique that's been working really well for recovering performance after INT4 quantization.

Typically, quantizing an LLM to INT4 (unlike, say, INT8) for inference incurs some accuracy loss. Instead of accepting the quality loss, we used the FP16 model as a teacher to train a tiny LoRA adapter (rank=16) for the quantized model. The cool part: the model generates its own training data using the Magpie technique, so no external datasets are needed. This is critical because we want to stay as close as possible to the distribution of the model's natural responses.
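The post names the teacher (FP16) and student (INT4 + LoRA) but not the loss. A common choice for this kind of self-distillation, assumed here, is the KL divergence between the teacher's and student's next-token distributions, averaged over positions. This pure-Python sketch works on toy logit vectors; a real run would compute it on framework tensors over the full vocabulary and backpropagate only into the LoRA weights.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_teacher_student(teacher_logits, student_logits):
    """KL(p_teacher || p_student) for one token position."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_seq, student_seq):
    """Mean per-token KL over a sequence of logit vectors."""
    if len(teacher_seq) != len(student_seq) or not teacher_seq:
        raise ValueError("sequences must be non-empty and the same length")
    total = sum(kl_teacher_student(t, s) for t, s in zip(teacher_seq, student_seq))
    return total / len(teacher_seq)

teacher = [[2.0, 1.0, 0.1], [0.5, 2.5, -1.0]]  # FP16 model logits (toy values)
student = [[1.8, 1.1, 0.2], [0.4, 2.2, -0.5]]  # INT4 + LoRA logits (toy values)
print(distillation_loss(teacher, student))
```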

Last year, Apple's foundation models paper (https://arxiv.org/pdf/2407.21075) proposed a similar technique and found "By using accuracy-recovery LoRA adapters with only rank 16, Alpaca win rate can be improved by 7-18%, GMS8K accuracy is boosted by 5-10%." (page 47).

We saw similar results on Qwen3-0.6B:

Perplexity: 2.40 → 2.09 (only 5.7% degradation from FP16 baseline)
Memory: Only 0.28GB vs 1.0GB for FP16 (75% reduction)
Speed: 3.0x faster inference than FP16
Quality: Generates correct, optimized code solutions

- Pre-trained adapter: codelion/Qwen3-0.6B-accuracy-recovery-lora
- GitHub repo: https://github.com/codelion/ellora

Happy to answer questions about the implementation or help anyone trying to replicate this. The key insight is that quantization errors are systematic and learnable - a small adapter can bridge the gap without negating the benefits of quantization.
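To make "quantization error" and "a small adapter bridges the gap" concrete, here is a pure-Python sketch of a simplified symmetric per-row INT4 round trip plus the standard LoRA forward pass y = W x + alpha * B(A x). The per-row scale is an assumption for brevity (real INT4 kernels usually use per-group scales), and the toy weights are invented; only A and B would be trained during recovery.

```python
def quantize_int4(row):
    """Symmetric per-row INT4: integers in [-8, 7] sharing one float scale."""
    scale = max(abs(w) for w in row) / 7 or 1.0  # fall back to 1.0 for an all-zero row
    q = [max(-8, min(7, round(w / scale))) for w in row]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W_deq, A, B, x, alpha=1.0):
    """y = W_deq x + alpha * B (A x); A is r x in, B is out x r (the trainable adapter)."""
    base = matvec(W_deq, x)
    delta = matvec(B, matvec(A, x))
    return [b + alpha * d for b, d in zip(base, delta)]

W = [[0.12, -0.5, 0.33], [0.9, 0.05, -0.27]]  # toy FP16 weights
W_deq = [dequantize(*quantize_int4(row)) for row in W]
errors = [[w - d for w, d in zip(r, rd)] for r, rd in zip(W, W_deq)]
print("per-element quantization error:", errors)

# Rank-1 adapter with the usual zero-initialized B: output is unchanged until training.
A = [[0.1, -0.2, 0.05]]
B = [[0.0], [0.0]]
print(lora_forward(W_deq, A, B, [1.0, 2.0, -1.0]))
```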

Has anyone else experimented with self-distillation for quantization recovery? Would love to hear about different approaches!

reacted to mrs83's post with 🔥 4 days ago

Hello HF community, I'm happy to share a project I've been working on that combines mlx-lm with Flower to enable federated fine-tuning of SLMs (Small Language Models) on macOS devices.

GitHub Repo: https://github.com/ethicalabs-ai/BlossomTuneLLM-MLX

By combining mlx-lm with a federated learning framework like Flower (https://flower.ai/), we can leverage the hardware people already own and reduce the reliance on expensive GPUs, enabling collaborative model training.

This project is the MLX-native evolution of an earlier codebase for FlowerTune LLM:

https://arxiv.org/abs/2506.02961
https://flower.ai/blog/2024-10-16-flowertune-llm-leaderboard
https://github.com/ethicalabs-ai/BlossomTuneLLM

How it works:

Flower handles all the federated learning logic.
A central server (superlink) coordinates the training rounds, client selection, and parameter aggregation.
Each participant in the network runs a Flower client (supernode) on their Mac. In each round, the client:
- Receives the global LoRA/DoRA adapter weights from the server.
- Loads its local data partition.
- Uses the mlx-lm programmatic API (mlx_lm.tuner.train) to perform LoRA/DoRA fine-tuning.
- Sends only the updated adapter weights back to the server.

The server only ever sees the aggregated model updates and private data never leaves the device.
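The post doesn't say which aggregation strategy BlossomTuneLLM-MLX uses. Assuming FedAvg, the classic default that Flower also ships as a built-in strategy, the server-side step is a weighted average of the clients' adapter weights, weighted by how many local examples each client trained on. This pure-Python sketch uses hypothetical parameter names and flat float lists in place of real adapter tensors.

```python
def fedavg(updates):
    """FedAvg over client adapter weights.

    updates: list of (num_examples, {param_name: list_of_floats}) from each client.
    """
    total = sum(n for n, _ in updates)
    if total == 0:
        raise ValueError("no training examples reported")
    avg = {}
    for name, first in updates[0][1].items():
        avg[name] = [
            sum(n * params[name][i] for n, params in updates) / total
            for i in range(len(first))
        ]
    return avg

# Two Macs: one trained on 100 local examples, the other on 300.
client_a = (100, {"layers.0.lora_a": [1.0, 0.0]})  # hypothetical parameter name
client_b = (300, {"layers.0.lora_a": [0.0, 1.0]})
print(fedavg([client_a, client_b]))
```

Only these averaged adapter weights are broadcast back to clients for the next round, which matches the post's point that raw training data stays on each device.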

Flower made it easy to run a full simulation (with a centralized HF dataset, partitioned using flower-datasets) on a single machine or multiple machines, to test the whole process in action and experiment further.

All you need is one or more Macs with Apple Silicon.

reacted to dylanebert's post with 🚀 4 days ago

These are the current best generative 3D models:

Render:
#1 - CSM
#2 - TRELLIS (open-source)
#3 - Zaohaowu3D

Topology:
#1 - Hunyuan3D-2
#2 - TRELLIS (open-source)
#3 - Hunyuan3D-2.1

as voted/submitted openly on dylanebert/3d-arena

reacted to salma-remyx's post with 🚀 4 days ago

Are you coming to SF this Fall?

Next week, we'll be at the AI Agent Builders Summit.
And in late October, GitHub Universe, ODSC West, and Experiment 2025.

We're sharing what we've learned while building agents that help you turn new research ideas from arXiv into PRs for your repo.

This summer, we've analyzed thousands of papers, ranking each for relevance to our work, before building hundreds of Docker images and opening hundreds of PRs for our repos.

Read more about PapersWithPRs: https://www.reddit.com/r/LocalLLaMA/comments/1mq7715/paperswithprs_dont_just_read_the_paper_replicate/

𝗔𝗜 𝗔𝗴𝗲𝗻𝘁 𝗕𝘂𝗶𝗹𝗱𝗲𝗿𝘀 𝗦𝘂𝗺𝗺𝗶𝘁: https://luma.com/agents-world-tour-sf
𝗚𝗶𝘁𝗛𝘂𝗯 𝗨𝗻𝗶𝘃𝗲𝗿𝘀𝗲: https://githubuniverse.com/
DISCOUNT CODE: TAKEMETOUNIVERSE
𝗘𝘅𝗽𝗲𝗿𝗶𝗺𝗲𝗻𝘁 𝟮𝟬𝟮𝟱: https://luma.com/145xyuyw

reacted to mitkox's post with 🚀 4 days ago

Hermes 4 70B synthetic dataset generation on my desktop Z8 GPU rig:
307 tok/sec
1.1M tok/hour
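The two throughput figures are consistent with each other:

```latex
307\ \tfrac{\text{tok}}{\text{s}} \times 3600\ \tfrac{\text{s}}{\text{h}} = 1{,}105{,}200\ \tfrac{\text{tok}}{\text{h}} \approx 1.1\text{M}\ \tfrac{\text{tok}}{\text{h}}
```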

The bottleneck for generating massive, high-quality reinforcement learning datasets is never the GPU compute; it's always the model's willingness to actually answer the darn question.

John Smith PRO

John6666's activity