Bridging the Gap: Making Robotics Feel Like Machine Learning

Community Article Published August 12, 2025


Imagine training a cutting-edge neural network on your laptop in under an hour, running it on cloud GPUs the next day, and shipping it to millions by week’s end. This is the world machine learning lives in—a frictionless pipeline of tools like PyTorch, Hugging Face Transformers, and OpenAI Gym that make experimentation fast, modular, and intuitive.

Now, try doing the same in robotics! 😱😱😱

Suddenly, you're knee-deep in C++, deciphering driver documentation from 2014, writing ROS launch files, and praying your code behaves the same on the real robot as it did in simulation. Want to switch out your sensor or retrain a policy? Prepare for hours—if not days—of integration and debugging.😟😟😟

Robotics, despite fabulous hardware progress—from humanoid kickboxing to drone ballets—lags painfully behind in software. The AI community has GitHub, pip, and plug-and-play; robotics still has Makefiles, mismatched protocols, and “it worked on that robot.” This disconnect has stifled rapid innovation, making robotic autonomy feel perpetually stuck in the lab.


We built Ark to change that 😊😊😊

🕰️ From Shakey to ROS: A Brief History of Robotics Software

Robotics has always pushed the limits of what's possible. Back in the 1960s, systems like Shakey—the first robot to reason about its actions—used symbolic planning and LISP-based control to navigate rooms and push boxes. It was brilliant, but it was also ... brittle. Every new task required bespoke logic. Every new sensor meant rewriting large chunks of code.

Then came the 1990s and 2000s, when middleware like Player/Stage and eventually ROS (Robot Operating System) tried to standardise things. ROS was transformative: it gave researchers a way to abstract communication, simulate robots, and share packages. But it wasn’t built with learning in mind. ROS made talking to sensors easier, but training a policy still felt like stitching together two different worlds.


Meanwhile, machine learning was evolving differently. Python-first libraries like Scikit-learn, TensorFlow, PyTorch, and OpenAI Gym embraced simplicity, reproducibility, and rapid iteration. Newcomers could spin up experiments with a few lines of code. Researchers could share models and results with ease. AI flourished.


Robotics, in contrast, remained a patchwork of C++, YAML, low-level drivers, and fragile launch sequences. The result? A robot demo that works for a conference video, but takes months to replicate or scale.

It’s time for robotics to catch up. ☝️☝️

🔗 Why Ark is the Next Step

If ROS was the operating system that connected hardware, Ark is the framework that connects ideas — from simulation to reality, from data to deployment, from code to policy. We designed Ark with a clear philosophy: robotics software should be as fluid, modular, and Pythonic as machine learning. That means:

  • ✅ A Gym-style interface for defining robotic environments, so ML researchers feel at home.

  • ✅ Native support for data collection, preprocessing, and training with state-of-the-art imitation learning methods like ACT and Diffusion Policy.

  • ✅ A seamless sim-to-real switch, so deploying on real hardware is as simple as toggling a config flag.

  • ✅ A modular node-based architecture with a publish–subscribe model, allowing distributed real-time control while staying flexible and introspectable.

  • ✅ Full ROS interoperability — but only if you need it. Ark doesn’t depend on ROS. It just plays nicely with it.


Under the hood, Ark uses Lightweight Communications and Marshalling (LCM) for efficient message passing and includes tools for visualisation, debugging, and logging. Whether you're replaying demonstrations, training a policy, or switching camera streams, it all happens through the same Python-first interface.

Crucially, Ark isn’t tied to a single robot embodiment. You can swap between arms, mobile bases, or humanoids without rewriting the system. That’s because we built Ark not just for a robot, but for robotics research itself, as it increasingly merges with foundation models, multimodal learning, and large-scale simulation.

👀 How to Best See Ark?

There are many ways to think about Ark, but perhaps the most natural is to see it as a real-world extension of OpenAI Gym—a framework that revolutionised reinforcement learning around 2016 by standardising simulated environments. Where Gym brought consistency and accessibility to simulation, Ark brings that same philosophy to the physical world, enabling seamless integration between real hardware and machine learning systems.


To actually make this seamless sim-to-real switching work, we had to build a distributed simulator that mirrors the architecture of real-world systems. Every component—sensors, actuators, policies—runs as a networked process using a publisher-subscriber model. This decouples simulation logic from hardware-specific drivers and allows everything to scale and interoperate, whether it's running in a physics engine or on a physical robot. But how does this work in practice?

Publisher-Subscriber: Under the hood, Ark is built around a publisher-subscriber messaging model. Every component—camera, robot, controller, even your ML policy—runs as a standalone process (called a node), and talks to others by publishing or subscribing to named message channels. For example, we might have:

camera_node   → publishes → /camera/rgb
joint_sensor  → publishes → /robot/joint_state
policy_node   → subscribes to both of the above, and publishes → /robot/velocity_command
controller    → subscribes to → /robot/velocity_command

Each of these runs independently, and Ark’s backend (based on LCM) makes sure the messages arrive where they’re needed. The beauty of this setup is that you can run exactly the same system in simulation or on real hardware—just by swapping the publishers. A simulated camera node publishes to /camera/rgb in the same format as a physical RealSense device would. Nothing else needs to change. You can define all of this in a single YAML config file. Then, run:

ark_launch my_robot_config.yaml

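To make the "swap the publishers" idea concrete, here is a purely illustrative sketch of the underlying pattern using LCM's Python bindings directly. The channel name matches the example above, but the JSON payload and the two node functions are stand-ins invented for this sketch; in practice Ark's own node classes and message types handle all of this for you.

# pubsub_sketch.py: illustrative only; Ark's node classes normally handle this.
# Requires the LCM Python bindings (the same library Ark uses for messaging).
import json
import time
import lcm

lc = lcm.LCM()

def publish_camera_frame(step):
    # Stand-in for a camera node. In simulation this payload would come from the
    # physics engine, on hardware from a RealSense driver; either way it is
    # published on the same channel, in the same format.
    payload = {"step": step, "height": 480, "width": 640}
    lc.publish("/camera/rgb", json.dumps(payload).encode())

def on_camera_frame(channel, data):
    # Stand-in for a policy node's subscriber callback.
    frame = json.loads(data.decode())
    print(f"received frame {frame['step']} on {channel}")

lc.subscribe("/camera/rgb", on_camera_frame)

for step in range(5):
    publish_camera_frame(step)
    lc.handle_timeout(100)   # process any pending messages (100 ms timeout)
    time.sleep(0.1)

The subscriber never knows, or cares, whether the bytes came from a simulator or a physical device; only the channel and message format matter, which is exactly what makes the sim-to-real switch a config change rather than a code change.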
One of the most interesting things about Ark is how familiar it feels if you come from a machine learning background. Like PyTorch or Gym, Ark environments expose simple reset() and step() functions, and observations and actions are streamed through clean APIs.

What’s especially nice is that all this architectural complexity stays under the hood for the user. From your perspective, Ark feels like working with a familiar machine learning library. You define your robot, sensors, and any extra components in a single YAML file, set a sim: true or false flag, and you’re ready to go. The main interface follows the classic OpenAI Gym pattern — reset() to start fresh, step() to run an action — so if you’ve written an RL or imitation learning loop before, you’ll feel right at home. Everything runs in Python, so you can collect data, train a model, and deploy it to real hardware without touching a line of C++ (unless you want the performance boost). The result is a robotics workflow that’s as accessible and iterative as the ones ML practitioners already use for computer vision or language models — but now, it drives an actual robot.

To make the sim‑to‑real story tangible, let’s walk through a short script from the Ark tutorials that controls a Franka Emika Panda arm. The beauty is that you don’t need to worry about message passing or hardware drivers at all; Ark’s FrankaEnv class hides all that complexity behind a simple Gym‑like API 🥳🥳🥳

from scripts.franka_env import FrankaEnv
import numpy as np

SIM = True   # True for simulation, False for the real robot
CONFIG = 'config/global_config.yaml'

env = FrankaEnv(sim=SIM, config=CONFIG)
observation, info = env.reset()

def policy(obs):
    # Placeholder policy: random joint velocities; swap in a trained model here.
    return np.random.uniform(-0.3, 0.3, size=9)

for _ in range(1000):
    action = policy(observation)
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()

What’s Happening Here?

  • SIM flag toggles simulation vs. reality
    Setting SIM=True uses a PyBullet-based simulator, while SIM=False connects to the physical Franka arm. You don’t change anything else in your code — Ark swaps in the appropriate publishers and subscribers. (More details: LeArkTutorials)

  • CONFIG YAML file holds all the messy details
    This file defines the robot’s URDF, controller gains, and which LCM channels to use. It’s the same file for simulation and real hardware, so your code remains agnostic. (More details: LeArkTutorials)

  • FrankaEnv gives you Gym methods
    Calling env.reset() spins up the publisher-subscriber network, loads parameters, and returns the initial observation. env.step(action) sends your action to the /Franka/joint_group_command topic and returns the new observation, optional reward, and termination flags. (More details: LeArkTutorials)

  • Policy plug-in is trivial
    In the example, we sample random joint velocities, but you can drop in an imitation-learning or reinforcement-learning policy trained in simulation. Because the observation and action spaces don’t change, you can train in simulation and then flip the SIM flag to deploy.


This hands-on code snippet reinforces the earlier point about Ark feeling like an extension of OpenAI Gym: you write Python code that looks like standard RL, but the same script will drive a real robot simply by changing one variable. It also illustrates how the underlying pub-sub architecture — topics like /Franka/joint_group_command and /Franka/joint_states — stays invisible to the user while still enabling clean separation between simulation and hardware.

Collecting Your Own Data

Ark’s data‑collection pipeline is designed to be as straightforward as possible. A typical workflow looks like this:

  1. Launch the simulator and data‑collection environment. From the data_collection folder, start PyBullet and the logging environment as described in the Simulator. Remember to run the Ark registry before launching any of the nodes; otherwise your robot and sensors will fail to start.
  2. Control the robot. You can either run an expert script (e.g. a pick‑and‑place trajectory) or teleoperate the robot with a PS4 controller. In the PS4 example, the D‑Pad controls x/y end‑effector velocities, L1 and R1 move it up and down, the joysticks adjust roll/pitch/yaw, and R2 opens or closes the gripper.
  3. Check your logs. After each run, the Python script automatically saves a .pkl file containing the observations and actions to data_collection/trajectories/ (see Data_Collection). These files can be loaded directly for imitation‑learning or diffusion‑policy training.

Example: Teleoperation with a PS4 Controller

Here’s a simplified version of ps4control_gym_example.py that shows how you might collect a trajectory using the controller. It assumes you’ve already set up your sensors, robot, and controller in the YAML files (as shown in the PS4 Controller tutorial) and that the controller's get_action() method returns a velocity command based on the current stick and button states (see Ps4_Control).

# ps4_teleop_collect.py
from scripts.franka_env import FrankaEnv
from ps4controller import PS4Controller  # your custom PS4 controller class
import pickle

SIM = True
CONFIG = 'config/global_config.yaml'
LOG_FILE = 'data_collection/trajectories/teleop_example.pkl'

env = FrankaEnv(sim=SIM, config=CONFIG)
controller = PS4Controller()   # reads sticks/buttons and produces velocity commands
obs, info = env.reset()

trajectory = []  # list of (obs, action) pairs

done = False
while not done:
    action = controller.get_action()  # map PS4 inputs to x/y/z + roll/pitch/yaw + gripper
    trajectory.append((obs, action))
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

# Save demonstration as a pickle file
with open(LOG_FILE, 'wb') as f:
    pickle.dump(trajectory, f)

This script mirrors the default controller mapping described in the tutorial: D‑Pad for translational velocity, shoulder buttons for vertical motion, joysticks for rotation, and R2 for the gripper. Each env.step(action) call publishes your action command to the /Franka/joint_group_command topic and returns your observations, such as joint states and camera data, from channels like /Franka/joint_states/sim and /camera/rgbd/sim. When the episode ends, the code writes the collected (observation, action) pairs to a .pkl file for later training.

Training a Diffusion Policy

Once you’ve collected a set of demonstration trajectories, you can train a policy using Ark’s provided scripts. The process is deliberately simple:

  • Load Your Data — Using your preferred machine learning framework (PyTorch, TensorFlow, or JAX), load the dataset from the pickle files into a DataLoader so your model can stream batches efficiently during training (a minimal sketch follows this list).

  • Run Sanity Checks — Before diving into model training, visualize and inspect your data. This confirms everything is loaded correctly and can also guide your choice of hyperparameters.

  • Run the Training Script — In the diffusion‑policy example, you simply run training.py on the dataset to learn a policy. The script handles loading your .pkl trajectory files and optimizing the diffusion model.

  • Keep Notes on Settings — Record hyperparameters during training; you’ll need them for reproducibility when comparing models.

Look at Diffusion_policy_implimentaion for more details.
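As a rough sketch of what the first two steps can look like, here is one way to wrap the logged .pkl files in a PyTorch Dataset and DataLoader. It assumes the (observation, action) pairs saved by the teleoperation script are flat arrays; if your observations are dictionaries (joint states plus camera frames, say), flatten or select the fields you need first. The file pattern and batch size are illustrative.

# load_demos_sketch.py: minimal dataset wrapper around the logged trajectories.
import glob
import pickle
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class TrajectoryDataset(Dataset):
    """Flatten a folder of pickled (observation, action) trajectories into training pairs."""
    def __init__(self, pattern='data_collection/trajectories/*.pkl'):
        self.pairs = []
        for path in glob.glob(pattern):
            with open(path, 'rb') as f:
                self.pairs.extend(pickle.load(f))   # each file holds a list of (obs, action)

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        obs, action = self.pairs[idx]
        return (torch.as_tensor(np.asarray(obs), dtype=torch.float32),
                torch.as_tensor(np.asarray(action), dtype=torch.float32))

loader = DataLoader(TrajectoryDataset(), batch_size=64, shuffle=True)

for obs_batch, action_batch in loader:
    print(obs_batch.shape, action_batch.shape)   # quick sanity check on one batch
    break

From here, any standard training loop can consume these batches; the only Ark-specific part is the trajectory format on disk.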

Evaluating the Trained Policy

After training, Ark encourages you to inspect and validate your model before deployment:

  1. Open the policy_inspector.ipynb notebook. This notebook loads your checkpoint and produces diagnostic plots. One key visualization is a 3D scatter plot of predicted versus expert actions (3D Scatter_plot). Ideally, the purple prediction dots align closely with the blue demonstration dots.
  2. Safety checks. Large drift between the purple and blue points could mean your model is under‑fit (increase training steps or network capacity), that the actions are mis‑scaled (re‑normalize the data), or that the policy is missing important context (extend the horizon). Only deploy once rollouts look smooth in simulation, to avoid damaging your robot. A minimal sketch of this check follows below.
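Outside the notebook, the same kind of check is easy to reproduce in a few lines. The sketch below is illustrative only: load_policy() is a hypothetical helper standing in for however you restore your checkpoint, and it plots just the first three action dimensions.

# inspect_sketch.py: a minimal stand-in for the notebook's predicted-vs-expert plot.
import pickle
import numpy as np
import matplotlib.pyplot as plt

with open('data_collection/trajectories/teleop_example.pkl', 'rb') as f:
    trajectory = pickle.load(f)                 # list of (obs, action) pairs

observations = [obs for obs, _ in trajectory]
expert_actions = np.array([action for _, action in trajectory])

policy = load_policy('checkpoints/diffusion_policy.pt')    # hypothetical helper
predicted_actions = np.array([policy(obs) for obs in observations])

# Expert demonstrations in blue, model predictions in purple; large systematic
# drift between the two clouds suggests under-fitting or mis-scaled actions.
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(*expert_actions[:, :3].T, color='blue', label='expert')
ax.scatter(*predicted_actions[:, :3].T, color='purple', label='predicted')
ax.legend()
plt.show()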

Now that you have a trained policy, you are ready to deploy it on a real robot! We can use the same Gym interface we used for collecting data to deploy our trained model.

Example Workflow

Now that you have your trained diffusion policy, we can test it in simulation (see Diffusion_rollout).

Thanks to the gym-like interface, swapping out the control logic is simple. Instead of using the expert policy from earlier (or a PS4 controller), you can plug in your trained diffusion policy. Every time you call step(), the process is straightforward:

  • Take the current observation from the environment.
  • Pass it through your model in a feed-forward pass.
  • Use the model’s predicted action as the input to step().

This seamless integration means you can evaluate your learned policy under the exact same conditions as your baseline, making it easy to compare performance.
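Concretely, the evaluation loop can look almost identical to the random-policy example from earlier; this is a sketch under the same assumptions, with load_policy() again a hypothetical helper and the checkpoint path illustrative.

# rollout_sketch.py: evaluate a trained policy through the same Gym-style loop.
from scripts.franka_env import FrankaEnv

SIM = True                                   # evaluate in simulation first
CONFIG = 'config/global_config.yaml'

env = FrankaEnv(sim=SIM, config=CONFIG)
policy = load_policy('checkpoints/diffusion_policy.pt')    # hypothetical helper

observation, info = env.reset()
for _ in range(1000):
    action = policy(observation)             # feed-forward pass through the model
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()

Once these rollouts look smooth, flipping SIM to False runs the very same loop on the real arm.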

One of the advantages of this setup is its algorithm-agnostic design — you can train any machine learning model without needing to dive into the inner workings of the framework. Whether it’s a diffusion policy, ACT, or a simple supervised model, the interface stays the same. This abstraction lets you focus on designing and tuning your algorithm while the system takes care of the low-level details behind the scenes.


Welcome to the Robotics Revolution

Ark brings robotics into the same rapid-iteration workflow that Python developers have enjoyed in machine learning for years.
No more wrangling with obscure drivers, rewriting code for hardware, or starting from scratch when moving from simulation to reality. With Ark, you write Python, you train policies, and you deploy to real robots — all in one seamless ecosystem.

You can find the code here: LeArkCode, Ark's Paper here: LerArkPaper, and Ark's documentation here: LerArkDocumentation.

Have fun doing robotics in Python!

This blog would never have been possible without Sarthak Das and Christopher Mower.
