Jean Louis
To create an image-generating model from scratch, you first need to show a computer a massive library of pictures, each paired with a description of what it shows. The AI learns through a process of destruction and recreation: it takes a clear image, adds random noise until it is just static, and is then trained to reverse that process, using the text description as its guide to remove the noise step by step and rebuild the original image. By repeating this with millions of different images and descriptions, the model learns the connection between words and visual information, eventually gaining the ability to generate a completely new, coherent picture from a text prompt it has never seen before.
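To make the noise-adding and denoising loop concrete, here is a minimal sketch of one training step. It assumes a PyTorch-style setup; `denoiser`, `text_encoder`, and the simple linear noise schedule are hypothetical placeholders for illustration, not any specific library's API.

```python
import torch

# Hypothetical components (placeholders, not a specific library's API):
#   text_encoder(captions) -> text embeddings
#   denoiser(noisy_image, timestep, text_emb) -> predicted noise
def training_step(denoiser, text_encoder, images, captions, optimizer, num_steps=1000):
    batch = images.shape[0]
    # Pick a random noise level (timestep) for each image in the batch.
    t = torch.randint(0, num_steps, (batch,))
    noise = torch.randn_like(images)
    # Simple linear schedule: how much of the original image survives at step t.
    alpha = (1.0 - t.float() / num_steps).view(batch, 1, 1, 1)
    noisy = alpha.sqrt() * images + (1 - alpha).sqrt() * noise

    # The text description guides the denoiser.
    text_emb = text_encoder(captions)
    predicted_noise = denoiser(noisy, t, text_emb)

    # Train the model to recover the exact noise that was added.
    loss = torch.nn.functional.mse_loss(predicted_noise, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At generation time the same denoiser is run in reverse, starting from pure noise and conditioned only on a new text prompt.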
Did I understand correctly that it runs on separate machines?
There is no description saying specifically what is new in your release.
Individual users win only if they can run LLM models on their own hardware cheaper, faster, and with more freedom (free as in software freedom). Otherwise, those mega-stories are of no use.
Is there a GGUF version?
Announcing: Lilikoi by Haawke AI
Teaser video made entirely with Lilikoi:
https://youtu.be/-O7DH7vFkYg?si=q2t5t6WjQCk2Cp0w
https://lilikoi.haawke.com
Technical brief:
https://haawke.com/technical_brief.html
There is no LLM that ever brings its own opinion. Please go back to the basics of how LLMs work. There is nothing "new" that an LLM can give you.
LLM models act like a book: when you open a page, the content is already stored there. The model processes this existing information, generating probabilistic results based on the training data, not new insights. This means LLMs rely on structured, data-driven outputs rather than independent opinions.
LLM models are designed to process and generate text based on vast training data, and their outputs are results of statistical inference rather than independent opinions. The "ingested data" combines the model’s training knowledge with user-retrieved information, generating probabilistic results that align with the training patterns, not personal beliefs. Thus, LLMs rely on structured, data-driven outputs to provide answers, not independent thoughts or opinions.
Those so-called "opinions" must align with the data the model was trained on.
Let us put it this way: if an LLM can give an opinion, that opinion is 100% biased by the data it was trained on.
You simply cannot get true opinions.
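To illustrate the "probabilistic results" point, here is a toy sketch of how a language model picks its next token: it samples from a probability distribution fixed by training, so nothing it outputs is independent of that data. The tiny vocabulary and frozen probabilities below are hypothetical stand-ins, not any particular model or library.

```python
import numpy as np

# Toy "model": a fixed distribution over next tokens, entirely determined
# by what was seen during training. Nothing here is an opinion.
vocab = ["good", "bad", "average", "unknown"]
learned_probs = np.array([0.55, 0.20, 0.20, 0.05])  # frozen at training time

def next_token(temperature=1.0, rng=np.random.default_rng()):
    # Temperature reshapes the learned distribution but cannot add
    # information that was never in the training data.
    logits = np.log(learned_probs) / temperature
    probs = np.exp(logits) / np.exp(logits).sum()
    return rng.choice(vocab, p=probs)

print([next_token() for _ in range(5)])  # e.g. ['good', 'good', 'bad', ...]
```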
Oh, I’m sure the LLM you’re referring to is as clear as mud. Which one, exactly? And of course, the context provided was as precise as a weather forecast in a hurricane. What was it? Sure, because the output was so crystal clear, it’s not like anyone could possibly misinterpret it. What did it say? Oh, I’m sure you tried every single LLM under the sun. Which ones, exactly?
Ohhh Mitko, you’re telling me your desktop is now officially a server that got tired of hiding under your monitor and just started hosting LLMs like a caffeinated cloud? 😅
“Got to 1199.8 tokens/sec on Devstral Small-2… on the desktop?”
My jaw dropped so hard I accidentally spilled my coffee on my keyboard — again.
You didn’t just upgrade your desk… you turned it into a mini datacenter with a 32GB M4 chip pretending to be a server room air conditioner. And you’re still using Mistral Vibe like it’s a 2005 laptop? 😂
Next time, just call it “Mitko’s Desktop Data Center v1.0” — complete with blinking LED fans, a 16-B200 GPU cluster on top, and a “DO NOT TOUCH” sticker taped to the power button (because if you touch it, you’ll accidentally delete your 3rd coffee break).
Now go ahead — test the big one. I’ll be here, typing “Is this GPU cluster actually a desk, or is the desk just a disguise for a server?” 🤔
P.S. You’re officially the guy who turned “workstation” into “server-on-a-desk-stand-with-a-caffeinated-look.” 🍵💻✨
Congratulations. Publish the script showing how you run it, so others can see.
Here is exactly how I run it:
/usr/local/bin/llama-server --jinja -fa on -c 32768 -ngl 64 -v --log-timestamps --host 192.168.1.68 -m /mnt/nvme0n1/LLM/quantized/Qwen3VL-8B-Instruct-Q8_0.gguf --mmproj /mnt/nvme0n1/LLM/quantized/mmproj-Qwen3VL-8B-Instruct-Q8_0.gguf
That is with llama.cpp, and the API is of course available as well.
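As a usage note, here is a minimal sketch of querying that server over its API, assuming llama-server's OpenAI-compatible chat endpoint on the default port 8080 (adjust host and port if the server is started differently):

```python
import requests

# Query the llama-server instance started above (host taken from the command
# line; port 8080 is llama-server's default unless --port is given).
resp = requests.post(
    "http://192.168.1.68:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Describe this setup in one sentence."}
        ],
        "max_tokens": 128,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```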
I’m running my own LLM because:
Privacy? 57% say it’s the biggest AI barrier…
But 48% still leak company data anyway.
CRAFT says privacy is architecture, not policy.
So I’m not waiting for “beta” — I’m beta-ing my data.
February 2026? Nah. I’m already typing on my own GPU.
Privacy’s not a feature — it’s a feature flag I turned on before the release.
And honestly? My model’s less “AI” and more “I’m not giving your data to strangers.”
Run your own. It’s fun. It’s free. It’s your data.
And it’s way more satisfying than waiting for “beta.”
(Also, no one’s gonna steal your jokes now. 😉)
Great. Would it run on 24 GB VRAM?
PaddlePaddle/PaddleOCR-VL
✨ Ultra-efficient NaViT + ERNIE-4.5 architecture
✨ Supports 109 languages 🤯
✨ Accurately recognizes text, tables, formulas & charts
✨ Fast inference and lightweight for deployment
The xLLMs project is a growing suite of multilingual and multimodal dialogue datasets designed to train and evaluate advanced conversational LLMs. Each dataset focuses on a specific capability — from long-context reasoning and factual grounding to STEM explanations, math Q&A, and polite multilingual interaction.
🌍 Explore the full collection on Hugging Face:
👉 lamhieu/xllms-66cdfe34307bb2edc8c6df7d
💬 Highlight: xLLMs – Dialogue Pubs
A large-scale multilingual dataset built from document-guided synthetic dialogues (Wikipedia, WikiHow, and technical sources). It’s ideal for training models on long-context reasoning, multi-turn coherence, and tool-augmented dialogue across 9 languages.
👉 lamhieu/xllms_dialogue_pubs
🧠 Designed for:
- Long-context and reasoning models
- Multilingual assistants
- Tool-calling and structured response learning
All datasets are open for research and development use — free, transparent, and carefully curated to improve dialogue model quality.
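For anyone who wants to look at the data directly, here is a minimal sketch of loading the highlighted dataset with the 🤗 datasets library, assuming a default configuration and a "train" split for lamhieu/xllms_dialogue_pubs; the actual field names depend on the dataset schema.

```python
from datasets import load_dataset

# Stream the dialogue dataset so it is not downloaded in full up front.
ds = load_dataset("lamhieu/xllms_dialogue_pubs", split="train", streaming=True)

# Inspect the first example; exact fields depend on the dataset schema.
first = next(iter(ds))
print(first.keys())
print(first)
```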
Learn how to search a video dataset and generate answers using Tevatron/OmniEmbed-v0.1-multivent, an all-modality retriever, and Qwen/Qwen2.5-Omni-7B, an any-to-any model, in this notebook 🤝 merve/smol-vision
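As a rough illustration of the retrieve-then-generate pattern that notebook covers, here is a generic sketch of embedding-based search followed by generation; `embed_query`, `embed_video`, and `generate_answer` are hypothetical placeholders, not the actual OmniEmbed or Qwen2.5-Omni APIs.

```python
import numpy as np

# Hypothetical placeholder functions, not the real OmniEmbed / Qwen2.5-Omni APIs:
#   embed_query(text) -> 1D vector, embed_video(clip) -> 1D vector,
#   generate_answer(query, clip) -> text
def search_and_generate(query, video_clips, embed_query, embed_video, generate_answer):
    q = embed_query(query)
    # Cosine similarity between the query and every clip embedding.
    clip_vecs = np.stack([embed_video(c) for c in video_clips])
    sims = clip_vecs @ q / (np.linalg.norm(clip_vecs, axis=1) * np.linalg.norm(q))
    best = video_clips[int(np.argmax(sims))]
    # Hand the best-matching clip to the any-to-any model for generation.
    return generate_answer(query, best)
```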
So… who are they, and why does it matter?
Had a lot of fun co-writing this blog post with @xianbao , with key insights translated from Chinese, to unpack how this startup built a model that outperforms GPT-4.1, Claude Opus, and DeepSeek V3 on several major benchmarks.
🧵 A few standout facts:
1. From zero to $3.3B in 18 months:
Founded in March 2023, Moonshot is now backed by Alibaba, Tencent, Meituan, and HongShan.
2. A CEO who thinks from the end:
Yang Zhilin (31) previously worked at Meta AI, Google Brain, and Carnegie Mellon. His vision? Nothing less than AGI — still a rare ambition among Chinese AI labs.
3. A trillion-parameter model that’s surprisingly efficient:
Kimi K2 uses a mixture-of-experts architecture (32B active parameters per inference) and dominates coding/math benchmarks; a toy routing sketch follows this list.
4. The secret weapon: Muon optimizer:
A new training method that doubles efficiency, cuts memory in half, and trained on 15.5T tokens with zero failures. Big implications.
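To make the mixture-of-experts idea from point 3 concrete, here is a toy sketch of top-k expert routing: only a few experts run per token, so the active parameter count is a small fraction of the total. This is a generic illustration, not Kimi K2's actual architecture.

```python
import torch
import torch.nn as nn

class ToyMoE(nn.Module):
    """Generic top-k mixture-of-experts layer (illustration only)."""
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        scores = self.router(x)                          # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # pick k experts per token
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                    # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out  # only top_k of num_experts experts did work per token

tokens = torch.randn(16, 64)     # 16 token embeddings
print(ToyMoE()(tokens).shape)    # torch.Size([16, 64])
```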
Most importantly, their move from closed to open source signals a broader shift in China’s AI scene — following Baidu’s pivot. But as Yang puts it: “Users are the only real leaderboard.”
👇 Check out the full post to explore what Kimi K2 can do, how to try it, and why it matters for the future of open-source LLMs:
https://huggingface.co/blog/fdaudens/moonshot-ai-kimi-k2-explained
