Talk to Your Robot Arm in Minutes - Part 1
Overview
Most robotics demos require days of dependency hell and setup to replicate, but this speech-controlled robot arm system deploys in minutes through make87 templates. Using Model Context Protocol (MCP) to connect local AI models, the system responds to voice commands for movement and scene understanding without requiring internet connectivity or complex configuration. This demonstrates how physical AI development becomes accessible when deployment, networking, and integration complexities are abstracted away by proper infrastructure.
Introduction
I tell my robot arm “move forward” — it moves.
I ask “what’s on the table?” — it replies: “a red sphere on a dark surface, electronics, and a wooden floor.”
No machine-specific hacks, no CUDA version roulette, no 7-day install grind. Just a working system — and a glimpse of how physical AI development becomes easier to build, share, and collaborate on.
As a developer, that means saving days of wasted setup time by simply picking a template that runs right away.
Most robotics demos look amazing on video but collapse into a week of setup hell if you try to replicate them. We wanted something different: a speech-controlled MCP demo that shows what’s possible when you build on make87 — something you can actually run yourself, on your own hardware, in minutes.
Why Robotics Development Is So Slow
The Hidden Cost of Robotics Setup
Anyone who’s tried to stitch together a robotics system knows the routine:
- Dependency hell: Version conflicts, breaking changes, and libraries that don’t follow semantic versioning. Getting multiple components to run in the same environment is a constant fight.
- Hardware headaches: CUDA and GPU mismatches that work on one machine and fail on another.
- Networking pain: Setting up IP addresses, managing firewalls, opening ports, and keeping configs in sync. It might work on your dev machine, but the moment you distribute across multiple nodes it breaks — and you burn hours just getting components to see each other.
- Integration complexity: Once the machines can talk, you’re still wrangling massive config files — topic names, parameters, remappings — and a single typo can take hours to track down. Adding new components means more config to manage, more chances to miss something, and more frustrating debug cycles.
- Slow release & iteration: Every cycle of testing, deploying, and debugging eats time. Without proper CI/CD, monitoring, and health checks, you’re forced to cobble these together yourself just to keep systems alive — work that adds no new features but is essential just to move forward. You test on your dev PC or in sim, deploy to the real robot, it fails in new ways, you collect logs, patch, and redeploy. Each loop burns hours or days.
Info: In our next post, we’ll show how make87 enables live development directly on the robot, cutting this loop dramatically.
Sharing, Scaling, and Collaboration: Three Problems, One Cause
In robotics, sharing, scaling, and collaboration may sound like different challenges, but they all hinge on the same underlying issues: deployment, networking, and reproducibility.
- Sharing is the lifeblood of open source and research. Robotics has thrived on GitHub projects and academic releases, but most of them are impossible to run without days of setup.
- Scaling is the challenge companies face. Running the same system reliably across dozens of machines or different hardware platforms is a huge operational burden.
- Collaboration cuts across both worlds. Companies need their teams to work on the same design or prototype without stepping on each other’s toes, while open source communities want to build on top of each other’s work instead of starting from scratch.
The truth is: you can’t solve one without the others. If your system isn’t reproducible, you can’t share it. If it isn’t portable, you can’t scale it. If it isn’t modular, you can’t collaborate on it.
make87 solves this at the source. Templates let you package up full systems — not just code, but deployment configs, networking, and component relationships — so they “just run.” That means open source projects become runnable for others, companies can scale their systems faster, and teams can collaborate on shared designs without blocking each other. By solving the root problem once, we aim to accelerate progress for both the open source community and the companies bringing physical AI into production.
A Different Approach
This isn’t a toy or a staged prototype. It’s a complete physical AI system that runs out of the box. In this system, all AI models run locally on your machine — no internet connection required.
And replicating it is simple:
- Install the make87 client on your node (docs)
- Select the SO-ARM100 MCP Voice Control template in the web UI
- Assign the apps to your node
- Click Deploy — after a short download, you’re up and running
That’s all it takes to run this demo. But the process isn’t limited to robot arms — the same flow applies when you build and share your own systems, whether with teammates, fellow researchers, or the wider community.
What’s Happening in the System
When you say “move forward”, here’s the full loop behind the scenes:
- Speech-to-text (using Whisper) converts your voice command into text
- Controller agent passes the request to an LLM (Qwen3-Instruct running via Ollama) that decides which MCP tools to call
- Scene understanding (using Gemma3) analyzes the live camera feed and provides a queryable description of the environment
- Robot control (via LeRobot end-effector interface) moves the arm according to the command
- Text-to-speech (using pyttsx3) speaks the response back to you
- Visualization (with Rerun) shows the decision-making and robot motion in real time
All connected with MCP (Model Context Protocol) — so the same pattern works not just here, but with any MCP-enabled robot.
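To make the loop concrete, here’s a minimal Python sketch of the speech → LLM → tools → speech chain. It assumes the openai-whisper, ollama, and pyttsx3 packages and a local Ollama server; the tool schema, model tag, and function names are illustrative placeholders, not the template’s actual interface.

```python
# Illustrative sketch of the voice-command loop, not the template's code.
# Assumes: `pip install openai-whisper ollama pyttsx3` and a local Ollama
# server with a Qwen model pulled. Tool names here are hypothetical.
import whisper   # speech-to-text
import ollama    # local LLM runtime
import pyttsx3   # text-to-speech

stt = whisper.load_model("base")  # small Whisper model for transcription

# Hypothetical tool schema; the real template exposes MCP tools instead.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "move_end_effector",
        "description": "Move the arm's end effector by a relative offset in meters.",
        "parameters": {
            "type": "object",
            "properties": {
                "dx": {"type": "number"},
                "dy": {"type": "number"},
                "dz": {"type": "number"},
            },
            "required": ["dx", "dy", "dz"],
        },
    },
}]

def handle_command(wav_path: str) -> None:
    # 1. Transcribe the spoken command with Whisper
    text = stt.transcribe(wav_path)["text"]

    # 2. Let a local Qwen model decide which tool(s) to call
    response = ollama.chat(
        model="qwen3",  # placeholder model tag
        messages=[{"role": "user", "content": text}],
        tools=TOOLS,
    )

    # 3. Dispatch the requested tool calls to the robot (stubbed here)
    for call in (response.message.tool_calls or []):
        print(f"would call {call.function.name} with {call.function.arguments}")

    # 4. Speak the model's textual reply back to the user
    engine = pyttsx3.init()
    engine.say(response.message.content or "Done.")
    engine.runAndWait()
```

In the real system, those tool calls go out over MCP to the arm’s control server rather than the stub above, which is exactly what makes the pattern portable across robots.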
Benefits for Developers
Built on Git, Ready to Deploy
On make87, you’re not forced into a closed environment. Everything builds on top of your Git repositories — you manage your code however you like, and we take care of:
- Building and versioning your applications automatically
- Creating deployable images across hardware targets
- Connecting your code into larger systems without rewrites
In a follow-up post, we’ll show how this ties into remote on-robot development: live coding on the robot itself from your browser, with faster dev cycles and no broken toolchains.
The bigger picture is accelerating robotics development: code that’s not just written, but runnable, reusable, and connectable to other applications.
Real-Time Visualization
Debugging robotics normally means chasing logs across terminals and files. With make87, you get unified access to all application logs — and can even deploy advanced visualizers like Rerun to see exactly what’s happening:
- Robot model with live joint states
- Camera views with pinhole projections
- LLM reasoning and MCP calls as they happen
If something breaks, you can replay the sequence later and see exactly what the robot “thought” at that moment. And since it’s modular, you can swap in Foxglove or another visualizer if you prefer.
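As a flavor of how little code such logging takes, here is a minimal Rerun sketch in Python. The entity paths and values are made up for illustration and are not the template’s actual stream names; it assumes the rerun-sdk and numpy packages.

```python
# Minimal Rerun logging sketch (entity paths are illustrative, not the
# template's actual streams). Assumes `pip install rerun-sdk numpy`.
import numpy as np
import rerun as rr

rr.init("so_arm_demo", spawn=True)  # start/attach a Rerun viewer

for t in range(3):
    rr.set_time_sequence("frame", t)

    # Live joint states as time series (placeholder values)
    for i, angle in enumerate([0.1 * t, 0.2 * t, -0.05 * t]):
        rr.log(f"arm/joint_{i}/angle", rr.Scalar(angle))

    # Camera frame (a black placeholder image here)
    rr.log("camera/rgb", rr.Image(np.zeros((480, 640, 3), dtype=np.uint8)))

    # The LLM's decision and MCP call, logged as text
    rr.log("llm/decision", rr.TextLog("calling move_end_effector(dx=0.05)"))
```

Because everything is logged against a shared timeline, scrubbing back to a given frame lines up the camera view, joint states, and LLM decision for that moment.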
Where You Can Go From Here
Right now, this template exposes basic end-effector commands like:
- “Move forward”
- “Turn right”
- “What can you see?”
- “Pitch down”
That proves the full loop — speech in, action out. But it’s just the base layer. You can extend the same system with:
- Policy models such as ACT or π0 for high-level skills (“pick up the apple”)
- Navigation stacks like Nav2 or TinyNav for autonomous movement
- Composite behaviors chaining multiple capabilities (“go to the kitchen and bring back the red cup”)
With make87’s modular, middleware-agnostic setup, you can start simple, then layer in advanced policies, navigation stacks, and multi-skill behaviors without rewriting the whole system. The platform brokers the configuration — addresses, ports, and other values — so your components connect cleanly regardless of whether they speak ROS, Zenoh, MCP, gRPC, or something else.
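For instance, wrapping a learned policy as a new MCP tool could look roughly like the sketch below, using the official MCP Python SDK. The skill name and the run_policy() helper are hypothetical placeholders, not part of the template.

```python
# Hypothetical sketch: exposing a higher-level skill as an MCP tool with
# the official MCP Python SDK (`pip install "mcp[cli]"`). The skill and
# run_policy() are placeholders, not part of the template.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("arm-skills")

@mcp.tool()
def pick_up(object_name: str) -> str:
    """Pick up a named object using a learned policy such as ACT."""
    # run_policy() stands in for your actual policy rollout
    # success = run_policy("pick", target=object_name)
    return f"attempting to pick up {object_name}"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```

Once a tool like this is registered, the controller agent can discover and call it the same way it calls the basic movement tools, so new skills slot in without touching the rest of the pipeline.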
Try It Yourself
The Voice-Controlled Robot Arm template is a starting point — but it’s only one of many. Check out our template library or build your own from scratch. Every system you create can be versioned, shared, reused, and connected to others.
When you click Deploy, you’ll get step-by-step guidance in the web UI. No docs to dig through, no support tickets. Just a system that comes online in minutes. And if you want to go deeper, you can reach out — we love hearing from people building new physical AI systems.
Requirements for this demo:
- SO-ARM100 or -101 robot arm
- Any webcam and mic (input/output handled via web interface)
- Any modern x86_64 or aarch64 computer
We'd love to get your feedback and invite you to join our make87 Discord server for any questions or discussions!