Big news from CES: Cosmos Reason 2 is here. It is our most advanced reasoning vision-language model for physical AI, and it now tops the Physical AI Bench leaderboard 🏆 shi-labs/physical-ai-bench-leaderboard
What’s new:
- Enhanced physical reasoning & spatio-temporal understanding
- Flexible deployment with 2B & 8B model sizes
- Long-context understanding (up to 256K tokens)
- Object detection with 2D/3D point localizations and trajectory data
- New Cosmos Cookbook Recipes for faster onboarding
On top of Cosmos Reason 2, we also rolled out other new updates, including:
- Cosmos Predict 2.5 – Unified Text2World/Image2World/Video2World model for higher-quality synthetic video worlds
- Cosmos Transfer 2.5-2B – Lightweight, high-fidelity world-to-world translation with stronger physics alignment
- NVIDIA GR00T N1.6 – Open robot foundation model for general-purpose robotic learning and control, integrated with Cosmos Reason
We can't build more private AI if we can't measure privacy intelligence.
That's why we're highlighting the Priv-IQ benchmark, a new, solution-oriented framework for evaluating LLMs on eight key privacy competencies, from visual privacy to knowledge of privacy law. The direct connection to our work is clear: the researchers relied on samples from the Ai4Privacy dataset to build out questions for Privacy Risk Assessment and Multilingual Entity Recognition.
This is the power of open-source collaboration. We provide the data building blocks, and researchers construct powerful new evaluation tools on top of them. It's a win-win for the entire ecosystem when we can all benefit from transparent, data-driven benchmarks that help push for better, safer AI.
Need a variable that's not fixed but depends on another request's response?
Runtime variables let you capture values from one API call and reuse them in subsequent requests.
What are Runtime Variables?
Runtime variables are dynamic values that get set during request execution.
They're perfect for scenarios like:
- Capturing an auth token from login and using it in authenticated requests
- Storing a user ID from a create-user response
- Saving an order ID to use in later order management calls
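To make the capture-and-reuse idea concrete, here is a minimal Python sketch. It is not the syntax of any particular API client: the `RuntimeVariables` class, the `{{name}}` placeholder style, the `fake_login` stand-in, and the `api.example.com` URL are all assumptions made up for illustration.

```python
import re


class RuntimeVariables:
    """Hypothetical in-memory store for values captured during a run."""

    def __init__(self):
        self._values = {}

    def set(self, name, value):
        # Store everything as text so it can be dropped into URLs and headers.
        self._values[name] = str(value)

    def resolve(self, template):
        # Replace each {{name}} placeholder with the captured value.
        def substitute(match):
            name = match.group(1)
            if name not in self._values:
                raise KeyError(f"runtime variable not set: {name}")
            return self._values[name]

        return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


def fake_login(username, password):
    # Stand-in for a real login call; returns a canned response body.
    return {"token": "abc123", "user_id": 42}


def build_request(variables, method, url, headers):
    # Resolve placeholders at send time, not when the request is written.
    return {
        "method": method,
        "url": variables.resolve(url),
        "headers": {key: variables.resolve(value) for key, value in headers.items()},
    }


variables = RuntimeVariables()

# Request 1: log in and capture values from the response.
login_response = fake_login("demo", "secret")
variables.set("auth_token", login_response["token"])
variables.set("user_id", login_response["user_id"])

# Request 2: reuse the captured values in a later, authenticated request.
request = build_request(
    variables,
    "GET",
    "https://api.example.com/users/{{user_id}}/orders",
    {"Authorization": "Bearer {{auth_token}}"},
)
print(request["url"])
print(request["headers"]["Authorization"])
```

Resolving placeholders only when the request is built is the point of the pattern: the second request can be written before the first one has ever run.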
Skill Reflect: A Concept for Automated AI Skill Mastery
Let’s be real for a second: most of us are using AI all wrong. We send a prompt, get a "meh" answer, and then spend twenty minutes fixing it ourselves. That’s not a workflow; that’s just a digital chore. I wanted to see if I could push Claude further—to see if I could build a system that actually learns and refines itself. That’s how the Claude-Reflect-System (Skill Reflect) was born.
But here’s the thing: this isn’t some polished, final product. It’s a concept. It’s a blueprint. I’ve built the foundation of a recursive reflection loop that forces the AI to step back, look at its work, and act as its own harshest critic. It identifies the "skill delta"—the gap between "okay" and "mastery"—and closes it. This logic isn't just for Claude; you can grab this architecture and drop it right into codex-cli, terminal agents, or whatever stack you're building.
I’m a big believer in the law of causality. Action, reaction. Cause and effect. If you control the cause—the way the AI thinks about its mistakes—you dictate the effect: a perfected skill. This is a playground for builders who are tired of stochastic guessing. I want you to take this. Fork it. Break it. Make it better. This is an open invitation to the community to take this reflection loop and see how far we can push the boundaries of agentic reasoning. Whether you're building Claude Code plugins or just want to automate your self-learning, the code is there for you to smash. Stop accepting the first draft. Let’s build something that actually thinks.
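If you want a feel for the shape of such a reflection loop before digging into the repo, here is a rough Python sketch. It is not the Claude-Reflect-System code: `reflect`, `toy_critique`, and `toy_revise` are hypothetical names, the loop is written iteratively rather than recursively, and the critic and reviser are deterministic stand-ins for what would be model calls in a real setup.

```python
def reflect(draft, critique, revise, target=0.9, max_rounds=5):
    """Critique and revise a draft until its score reaches target or rounds run out."""
    score, feedback = critique(draft)
    rounds = 0
    # target - score plays the role of the "skill delta" here.
    while score < target and rounds < max_rounds:
        draft = revise(draft, feedback)
        score, feedback = critique(draft)
        rounds += 1
    return draft, score, rounds


# Toy stand-ins: a real system would call a model for both steps.
REQUIRED = ("summary", "example", "caveat")


def toy_critique(draft):
    # Score is the fraction of required parts present; feedback lists what is missing.
    missing = [part for part in REQUIRED if part not in draft]
    score = 1 - len(missing) / len(REQUIRED)
    return score, missing


def toy_revise(draft, missing):
    # Address the first item of feedback per round.
    return draft + " " + missing[0]


final, score, rounds = reflect("summary", toy_critique, toy_revise, target=1.0)
print(final, score, rounds)
```

The `max_rounds` cap matters in practice: without it, a critic that is never satisfied keeps the loop running forever.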
Large language models and modern AI are often presented as technology that needs deep neural networks (DNNs) with billions of black-box parameters, expensive and time-consuming training, and GPU farms, yet remains prone to hallucinations. This book presents alternatives that rely on explainable AI, featuring new algorithms based on radically different technology with trustworthy, auditable, fast, accurate, secure, replicable Enterprise AI. Most of the material is proprietary and made from scratch, showcasing the culmination of decades of research away from standard models to establish a new framework in machine learning and AI technology.
I discuss an efficient DNN architecture based on a new type of universal functions in chapter 4, with DNN distillation and protection via watermarking in chapter 5. Then, in chapter 6, I discuss non-DNN alternatives that yield exact interpolation on the training set yet benefit from benign overfitting in any dimension. Accurate predictions are obtained with a simple closed-form expression, without gradient descent or other iterative optimization technique, essentially without training.
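As a generic illustration of what "exact interpolation with a closed-form expression and no training" can look like, here is a small Python sketch. It uses classical inverse-distance weighting (Shepard interpolation), swapped in purely as an example; it is not the method from the book, and `idw_predict` and the toy data are assumptions for illustration.

```python
import math


def idw_predict(train_x, train_y, query, power=2.0):
    """Predict at query as a distance-weighted average of training targets."""
    weights = []
    for x, y in zip(train_x, train_y):
        dist = math.dist(x, query)
        if dist == 0.0:
            # The query is a training point: return its target exactly.
            return y
        weights.append(dist ** -power)
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, train_y)) / total


# Toy training set in two dimensions (made-up values).
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
train_y = [1.0, 3.0, 5.0]

print(idw_predict(train_x, train_y, (1.0, 0.0)))  # a training point
print(idw_predict(train_x, train_y, (0.5, 0.5)))  # a new point
```

There is no fitting step and no gradient descent: the prediction is a single formula evaluated on the stored training set, and it reproduces every training target exactly.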
Case studies include 96% correct predictions for the next token on an Nvidia PDF repository, automated heartbeat clustering and unusually high data compression rates (big data), anomaly detection and fraud litigation linked to a large-scale cybersecurity breach (large Excel repository, automated SQL, time series and geospatial data), as well as predicting the next sequence on real-world genome data with home-made LLM technology. Some datasets with 1000 dimensions are generated with the best and fastest tabular data synthesizer on the market, described in detail in chapter 2 along with the best model evaluation metric. These cases correspond to different agents linked to the xLLM technology (extreme LLM) developed by the author.
Domain-specific reasoning is crucial when working with big-budget campaigns on Meta. That's why we've launched an experimental Chain-of-Thought (CoT) reasoning model for critical thinking, tailored to Meta's Andromeda algorithm-based campaign structuring and optimization.