🌁#92: Fight for Developers and the Year of Orchestration
We discuss OpenAI's and Anthropic's diverging strategies for agentic AI, highlighting new models, relevant papers, and news
This Week in Turing Post:
- Wednesday, AI 101, Model: The Qwen family (in relation to DeepSeek and others)
- Friday, Interview: Sharon Zhou, educator with 1M+ students and founder of Lamini
🔳 Turing Post is on 🤗 Hugging Face as a resident -> click to follow!
Main – OpenAI and Anthropic: Two Roads to Agentic AI
OpenAI and Anthropic are taking noticeably different paths in their efforts to win over developers. So far, Claude 3.7 has been widely regarded as the best tool for coding, and with MCP (Model Context Protocol) gaining traction, it might seem like Anthropic has secured an early advantage.
But OpenAI’s latest announcement – the OpenAI Agents Platform – suggests a different approach that could shift the conversation. While Anthropic is focused on standardization with MCP, OpenAI is building an end-to-end ecosystem designed for accessibility and speed.
The contrast is clear. Anthropic’s MCP takes a structured, open approach, creating a universal standard for connecting AI models to external tools. It’s designed with flexibility and interoperability in mind. Meanwhile, OpenAI is focused on seamless integration, offering developers a complete toolkit with the Agents SDK, Responses API, built-in search, and state management – a more immediate, hands-on approach.
Developers appreciate open standards, but they also value convenience. OpenAI’s tightly integrated tools reduce the complexity of agent-building, bundling key components like state management, tool integration, and observability into a single platform. What started as an experimental feature set has evolved into a more structured Agents SDK, complete with built-in safeguards and tracing capabilities.
For developers, both paths have advantages:
- A key part of OpenAI’s strategy is embedding agentic workflows directly into its APIs. By making state management free and integrating observability as a default feature, OpenAI is removing common pain points that developers face when working with AI agents.
- At the same time, Anthropic’s MCP remains a strong alternative, emphasizing openness and cross-platform compatibility. It provides the foundation for long-term interoperability, while OpenAI offers a more immediate, ready-to-use experience.
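The contrast is easiest to see in a sketch. The snippet below is purely illustrative (the `Agent` class, `call_tool`, and `web_search` names are hypothetical, not either vendor's actual API): an integrated agents platform bundles conversation state and tracing into the same loop that executes tools, so developers get observability without wiring it up themselves.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Toy agent loop in the spirit of an integrated agents SDK:
    state management and tracing come bundled. All names are illustrative."""
    tools: dict = field(default_factory=dict)   # tool name -> callable
    state: list = field(default_factory=list)   # conversation state, kept automatically
    trace: list = field(default_factory=list)   # built-in observability log

    def call_tool(self, name, **kwargs):
        # Every tool invocation is traced by default -- no extra setup.
        self.trace.append({"event": "tool_call", "tool": name, "args": kwargs})
        result = self.tools[name](**kwargs)
        self.state.append({"role": "tool", "name": name, "content": result})
        return result

def web_search(query):
    # Stand-in for a built-in search tool.
    return f"results for: {query}"

agent = Agent(tools={"web_search": web_search})
answer = agent.call_tool("web_search", query="MCP vs Agents SDK")
print(answer)             # results for: MCP vs Agents SDK
print(len(agent.trace))   # 1
```

In an MCP-style design, by contrast, the tool layer lives behind a generic protocol boundary that any model can talk to, trading this out-of-the-box convenience for interoperability.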
While many expected 2025 to be the "Year of Agents," it’s shaping up to be the "Year of Orchestration" instead (with, I believe, truly working agents coming in 2026). Developers are no longer as excited about individual models – they need efficient workflows that connect multiple APIs and services without added complexity. The recent splash around Manus – which embodies the characteristics of both an agentic system and an orchestration platform – offered a glimpse into this shift. Instead of managing fragmented integrations, developers want comprehensive solutions with built-in observability and control.
Anthropic sees MCP as the key to enabling interoperability across different AI systems, while OpenAI is focused on owning the full development experience with its integrated approach.
This competition is a win for the industry, driving innovation and preventing monopolization. And developer preferences will decide the future. Those offering reliability, clear pricing, and intuitive orchestration will set the standard. Agentic AI is moving from an experimental concept to a core part of AI development – and the tools available today will shape how it evolves in the years to come.
Curated Collections
We are reading:
- A quite balanced report, “Future of AI Research,” with contributors such as Stuart Russell, Oren Etzioni, Peter Norvig, and Yoshua Bengio (pdf)
- When AI met venture capital by Azeem Azhar
- Is OpenAI's new story-generating model good at writing? And some AGI talk by Max Read
Recommendation from an AI practitioner
MCP is pretty cool; it genuinely changes the dev experience for the better (your guide to MCP is here).
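Part of what makes MCP approachable is that its messages are plain JSON-RPC 2.0. The sketch below shows the rough shape of a tool-invocation request a client might send to an MCP server; the `get_weather` tool and its arguments are made up for illustration, so check the spec for exact field names before relying on this.

```python
import json

# MCP traffic is JSON-RPC 2.0. This is the approximate shape of a
# tools/call request; the tool name and arguments here are hypothetical.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_weather",              # a tool the server advertises
        "arguments": {"city": "Berlin"},    # arguments per the tool's schema
    },
}

# Serialize for the wire, then decode as a server would.
payload = json.dumps(request)
decoded = json.loads(payload)
print(decoded["method"])   # tools/call
```

Because the envelope is just JSON-RPC, any client that can speak it can discover and call tools on any conforming server, which is the whole interoperability pitch.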
News from The Usual Suspects ©
CoreWeave Strikes Gold with OpenAI
- CoreWeave just landed a massive deal to supply AI infrastructure for OpenAI, securing up to $11.9 billion in contract value. The cherry on top? OpenAI is also taking a $350 million equity stake in CoreWeave. With Microsoft, Oracle, and now CoreWeave in its compute arsenal, OpenAI is stacking up allies for the AI arms race.
Cerebras Goes Big on AI Inference
- Cerebras Systems is scaling up – fast. With six new AI inference datacenters across North America and Europe, the company is positioning itself as the largest domestic high-speed inference cloud. These facilities will crank out 40 million Llama 70B tokens per second, giving the industry a serious speed boost. With OpenAI’s o3 and DeepSeek R1 models demanding faster responses, Cerebras is betting big on real-time AI dominance.
Hugging Face meanwhile goes from BLOOM to BOOM
Gemini had an incredible week
Models to pay attention to:
- Little Gemma 3 that you can run on a single GPU or TPU is better than big Gemini – Researchers from Google DeepMind introduced Gemma 3, a lightweight, state-of-the-art open AI model optimized for single-GPU/TPU execution. It supports 140 languages, a 128K-token context window, and advanced text-visual reasoning. ShieldGemma 2, a 4B image safety checker, enhances AI safety →read more on their blog
- Command A – Cohere unveiled a highly efficient generative AI model optimized for enterprise use. It matches or outperforms GPT-4o and DeepSeek-V3 on business, STEM, and coding tasks while requiring only two GPUs instead of up to 32. Command A processes 156 tokens/sec – 1.75x faster than GPT-4o – and supports a 256K context length →read more on their blog
- OLMo 2 and Building Effective Teams for Training Language Models – Researchers from AI2 released open-source LLMs with 7B and 13B parameters, trained on 4T and 5T tokens, respectively. OLMo 2 Instruct outperforms Llama 3.1 8B Instruct and Qwen 2.5 Instruct. The team used Tulu 3’s post-training recipe, including RLVR, boosting scores by 4+ points. They emphasize FLOP efficiency, prioritization in training, and RL finetuning stability, pushing open-source AI capabilities further →read more on Interconnects
- Baidu Unveils ERNIE 4.5 and Reasoning Model ERNIE X1, Makes ERNIE Bot Free Ahead of Schedule. ERNIE 4.5 is a multimodal foundation model, and ERNIE X1, a deep-thinking reasoning model. ERNIE 4.5 outperforms GPT-4.5 at 1% of its cost, with input/output priced at RMB 0.004/0.016 per 1,000 tokens. ERNIE X1, excelling in reasoning and tool use, costs RMB 0.002/0.008 per 1,000 tokens. Both models are free for individual users, with enterprise access via Baidu AI Cloud’s Qianfan platform →try on their website
- Open-sourced MM-Eureka – researchers from Shanghai AI Laboratory developed a multimodal reasoning model applying large-scale rule-based RL to image-text tasks. It achieves stable accuracy gains, response length growth, and emergent reflection behaviors (visual "aha moments"). The model, trained on 54K samples, outperforms methods using 1M+ data, showing superior data efficiency →read the paper
- Sesame AI Labs open-sources its conversational speech model CSM 1B. The model generates HQ speech from text using a Llama-based backbone and an audio decoder producing Mimi audio codes. It supports context-based speech generation and can process speaker turns. It lacks predefined voices and fine-tuning for specific ones. Ethical restrictions prohibit impersonation and misinformation. The model has 1.55B parameters, supports English, and is open-source for research and educational purposes →check on HF
- Charting and Navigating Hugging Face’s Model Atlas – from the School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel
The freshest research papers, categorized for your convenience
There were quite a few TOP research papers this week; we mark them with 🌟 in each section.
Scaling, Efficiency, and Optimization of Large Models
- 🌟 Transformers without Normalization Replace normalization layers with a lightweight transformation, improving training and inference speed while maintaining model accuracy.
- SEAP: Training-free Sparse Expert Activation Pruning Reduce computational costs by selectively activating relevant model parameters, improving efficiency while maintaining performance.
- DistillM-2: A Contrastive Approach Boosts the Distillation of LLMs Optimize knowledge distillation through contrastive loss functions, improving LLM preference alignment and decoding efficiency.
- OmniMamba: Efficient and Unified Multimodal Understanding Improve multimodal model efficiency by leveraging state-space models, reducing memory costs while maintaining high performance.
- 🌟 Communication-Efficient Language Model Training Scales Reliably Reduce communication overhead in distributed training, making large-scale LLM training more efficient.
Reasoning, Planning, and Self-Improvement in AI
- 🌟 Monitoring Reasoning Models for Misbehavior Detect hidden misalignment in LLMs by auditing their reasoning steps, highlighting challenges in ensuring transparency.
- 🌟 LMM-R1: Empowering 3B LMMs with Strong Reasoning Improve multimodal reasoning through a two-stage reinforcement learning framework, enhancing both text and vision-based tasks.
- Plan-and-Act: Improving Planning of Agents Separate planning and execution in LLM agents, improving long-horizon task performance with structured synthetic data.
- GTR: Guided Thought Reinforcement Prevents Thought Collapse Strengthen RL-based vision-language models by preventing loss of reasoning diversity, improving structured problem-solving.
- Implicit Reasoning in Transformers is Reasoning through Shortcuts Reveal that transformers often rely on statistical shortcuts rather than genuine multi-step reasoning.
Multimodal AI and Vision-Language Understanding
- Unified Reward Model for Multimodal Understanding Introduce a reward model that evaluates both image and video tasks, improving preference alignment in multimodal models.
- VisualPRM: An Effective Process Reward Model Enhance reasoning in multimodal models using a reward-based approach, improving structured task completion.
- Taking Notes Brings Focus? Multi-Turn Multimodal Dialogue Improve multi-turn dialogue tracking by incorporating visual memory and stepwise reasoning modules.
- SegAgent: Exploring Pixel Understanding Develop an LLM-based segmentation model that imitates human annotators for pixel-level understanding.
Reinforcement Learning and AI Agents
- MM-Eureka: Exploring Visual Aha Moment Train multimodal models to develop "aha moments" in reasoning, improving math and vision-based tasks.
- World Modeling Makes a Better Planner Improve embodied AI planning by jointly optimizing state prediction and action selection.
- MRT: Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Reduce LLM response length without sacrificing accuracy, improving test-time efficiency.
Privacy, Security, and Model Robustness
- FedRand: Enhancing Privacy in Federated Learning Improve federated learning security by selectively updating model parameters, reducing data leakage risks.
- Exploiting Instruction-Following Retrievers for Malicious Information Retrieval Analyze how LLM retrievers can be manipulated to fulfill harmful queries, raising safety concerns.
- Exploring the Vulnerabilities of Federated Learning Examine gradient inversion attacks in federated learning and propose defenses to mitigate security risks.
- Group-robust Machine Unlearning Enhance fairness in machine unlearning by minimizing mutual information between model features and sensitive attributes.
Search, Retrieval, and Language Modeling
- Search-R1: Training LLMs to Reason with Search Train LLMs to autonomously query search engines, improving retrieval-augmented reasoning.
- New Trends for Modern Machine Translation Redefine machine translation by treating it as a reasoning task, improving contextual and discourse-level accuracy.
- Gemini Embedding: Generalizable Embeddings Enhance multilingual text embedding models, improving performance across retrieval, clustering, and classification tasks.
Diffusion Models and Generative AI
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Models Merge autoregressive and diffusion approaches, reducing generation steps while maintaining sample quality.
- Sana-Sprint: One-Step Diffusion for Text-to-Image Accelerate text-to-image diffusion models, reducing inference time while maintaining image fidelity.
- CoRe2: Collect, Reflect and Refine for Text-to-Image Improve text-to-image model efficiency through multi-stage inference, reducing computational costs.
Human-AI Interaction and Explainability
- Auditing Language Models for Hidden Objectives Investigate how LLMs can develop covert misaligned goals, emphasizing the need for better alignment audits.
- Can Large Reasoning Models do Analogical Reasoning? Assess LLMs' ability to reason under perceptual uncertainty, highlighting weaknesses in analogical reasoning.
- API Agents vs. GUI Agents: Divergence and Convergence Compare API-based and GUI-based AI agents, discussing their strengths and future convergence.
That’s all for today. Thank you for reading!
Please share this article with your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.