I spent a year building a system where four robots autonomously arrange themselves into geometric shapes. The hardware was cheap: four ESP32 microcontrollers, some motors, a webcam mounted on a tripod. The hard part had nothing to do with hardware.
It was getting four agents to agree on what reality looks like.
The Setup
Each robot is an ESP32 driving two DC motors. An overhead camera connected to a laptop runs OpenCV to track the robots’ positions using colored markers. A central broker (Mosquitto, running on the laptop) handles all MQTT communication. The goal: you define a shape (say, a square) and the four robots drive to their assigned corners autonomously.
Simple on paper. Brutal in practice.
Why MQTT
I picked MQTT because it was designed for constrained devices on unreliable networks, which is exactly the situation. The publish/subscribe model meant robots didn’t need to know about each other directly. Each robot subscribes to its own command topic and publishes its status. The broker handles routing.
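To make that concrete, here is a minimal sketch of the laptop-side controller using paho-mqtt. The topic names (`robots/<id>/cmd`, `robots/<id>/status`) and JSON payloads are illustrative, not the exact ones from my project:

```python
import json
import paho.mqtt.client as mqtt

# paho-mqtt 1.x constructor; 2.x also needs a CallbackAPIVersion argument
client = mqtt.Client()
client.connect("localhost", 1883)  # Mosquitto runs on the laptop

# Hypothetical topic layout:
#   robots/<id>/cmd     laptop -> robot target positions
#   robots/<id>/status  robot -> laptop pose, battery, etc.

def on_status(client, userdata, msg):
    robot_id = msg.topic.split("/")[1]
    print(robot_id, json.loads(msg.payload))

client.on_message = on_status
client.subscribe("robots/+/status")  # '+' wildcard matches every robot id

client.loop_start()
# Send robot 2 to a corner; the broker routes it, no robot-to-robot links
client.publish("robots/2/cmd", json.dumps({"x": 120, "y": 340}), qos=1)
```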
What I didn’t expect: the broker becoming a single point of failure. Early on, when the laptop went to sleep mid-run, all four robots stopped receiving commands and kept driving until they hit walls. Lesson one: you need a heartbeat. Robots now stop if they haven’t received a message in 500ms.
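The watchdog itself is simple. It actually lives in the ESP32 firmware in C++, but the logic is easier to read sketched in Python; `stop_motors` and `drive_toward_target` are placeholders for the real motor code:

```python
import time

HEARTBEAT_TIMEOUT = 0.5   # seconds; stop if nothing heard for 500ms

# --- Laptop side: publish a heartbeat so idle robots don't time out ---
def heartbeat_loop(client):
    while True:
        client.publish("robots/all/heartbeat", b"", qos=0)
        time.sleep(0.1)   # 100ms period, well under the 500ms timeout

# --- Robot side (C++ in the real firmware; Python here for clarity) ---
last_msg_time = time.monotonic()

def on_any_message(client, userdata, msg):
    global last_msg_time
    last_msg_time = time.monotonic()   # any traffic resets the watchdog

def control_step():
    # Called on every iteration of the motor control loop
    if time.monotonic() - last_msg_time > HEARTBEAT_TIMEOUT:
        stop_motors()            # fail safe: laptop or broker is gone
    else:
        drive_toward_target()

def stop_motors():               # placeholder for the real motor code
    pass

def drive_toward_target():       # placeholder
    pass
```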
The Latency Problem
The first version worked, but robots would occasionally collide. The root cause: by the time OpenCV had processed a new position frame and a corrected command was published, the robot had already moved 3-4cm further than expected. At the roughly 15-20 cm/s speeds I was running, 200ms of total loop latency translates directly into that much blind travel, which is enough to cause a collision.
Getting this under 50ms required cutting the pipeline at every stage:
- Camera capture: went from 30fps to 60fps and used `cv2.CAP_DSHOW` on Windows to bypass the buffering lag of the default Media Foundation backend
- Position detection: replaced contour-based detection with ArUco markers. 5ms per frame instead of 25ms.
- MQTT publish rate: switched from a polling loop to event-driven publishing, only sending when a position changed by more than 5mm (the capture-detect-publish loop is sketched after this list)
- ESP32 side: moved from Arduino `delay()` loops to FreeRTOS tasks with proper priority scheduling
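Putting the laptop-side pieces together, the capture-detect-publish loop looks roughly like this. The `MM_PER_PX` calibration constant and topic names are placeholders, and the ArUco calls use the OpenCV 4.7+ API:

```python
import json
import cv2
import numpy as np
import paho.mqtt.client as mqtt

MM_PER_PX = 1.4        # hypothetical camera calibration constant
MIN_MOVE_MM = 5.0      # only publish if a robot moved more than 5mm

client = mqtt.Client()  # paho-mqtt 1.x constructor
client.connect("localhost", 1883)
client.loop_start()

# CAP_DSHOW selects the DirectShow backend, which buffers less than
# the Media Foundation default on Windows
cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)
cap.set(cv2.CAP_PROP_FPS, 60)

# ArUco detection (OpenCV >= 4.7 API); each robot carries one marker
aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(aruco_dict, cv2.aruco.DetectorParameters())

last_published = {}    # marker id -> last (x, y) sent, in mm

while True:
    ok, frame = cap.read()
    if not ok:
        continue
    corners, ids, _ = detector.detectMarkers(frame)
    if ids is None:
        continue
    for marker_corners, marker_id in zip(corners, ids.flatten()):
        center_px = marker_corners[0].mean(axis=0)   # 4 corners -> centroid
        pos = center_px * MM_PER_PX                  # pixels -> mm
        prev = last_published.get(int(marker_id))
        # Event-driven publishing: skip if the robot barely moved
        if prev is not None and np.linalg.norm(pos - prev) < MIN_MOVE_MM:
            continue
        last_published[int(marker_id)] = pos
        client.publish(f"robots/{int(marker_id)}/pos",
                       json.dumps({"x": float(pos[0]), "y": float(pos[1])}))
```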
End result: 43ms average loop latency, 50ms worst case. Robots stopped colliding.
The Real Distributed Systems Problem
Here’s what textbooks don’t tell you about distributed systems: the hard part isn’t speed, it’s consensus.
When four robots are all moving simultaneously toward target positions, and each one is making decisions based on a position snapshot that’s 43ms old, you get drift. Robot A thinks Robot B is at position (120, 340). Robot B is actually at (128, 351). Both robots plan paths that don’t account for where the other actually is.
My solution was centralised path planning. The laptop calculates all four paths before any robot moves, checks for predicted intersections, and staggers departure times by 200ms per robot. It’s not elegant. A proper solution would involve distributed consensus and real-time replanning, but it worked reliably for the demos I needed.
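A minimal sketch of that planner, assuming straight-line paths at constant speed (the speed, robot radius, and coordinates below are illustrative): sample every trajectory over time, reject any plan where two robots come within two robot-radii of each other, and offset departures by 200ms.

```python
import numpy as np

ROBOT_RADIUS_MM = 60   # hypothetical robot footprint
SPEED_MM_S = 175       # ~17.5 cm/s cruise speed (assumed)
STAGGER_S = 0.2        # 200ms departure offset per robot

def sample_path(start, goal, depart_t, t):
    """Position at time t for a robot that leaves `start` at `depart_t`
    and drives straight toward `goal` at constant speed."""
    if t < depart_t:
        return start
    dist = np.linalg.norm(goal - start)
    if dist == 0:
        return goal
    frac = min(1.0, (t - depart_t) * SPEED_MM_S / dist)
    return start + frac * (goal - start)

def paths_collide(starts, goals, departs, dt=0.05, horizon=20.0):
    """Check every pair of sampled trajectories for predicted near misses."""
    for t in np.arange(0, horizon, dt):
        pts = [sample_path(s, g, d, t)
               for s, g, d in zip(starts, goals, departs)]
        for i in range(len(pts)):
            for j in range(i + 1, len(pts)):
                if np.linalg.norm(pts[i] - pts[j]) < 2 * ROBOT_RADIUS_MM:
                    return True
    return False

# All four paths are validated before the first "go" command is published.
starts = [np.array(p, float) for p in [(0, 0), (500, 0), (500, 500), (0, 500)]]
goals  = [np.array(p, float) for p in [(250, 0), (500, 250), (250, 500), (0, 250)]]
departs = [i * STAGGER_S for i in range(4)]
assert not paths_collide(starts, goals, departs)
```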
The staggered departure approach also had an unexpected benefit: it made the shape formation look intentional and choreographed rather than chaotic. Accidentally good UX.
What Actually Broke
The most painful bug: MQTT message ordering. MQTT guarantees delivery at QoS 1 and 2, but its ordering guarantees are weaker than people assume; retransmissions after a dropped connection, or messages spread across topics, can arrive out of order. Early on, a “go to position X” message would occasionally arrive after a “stop” message, causing a robot to start moving again after it had correctly stopped.
Fix: sequence numbers on every command. If a robot receives command #47 after already executing command #51, it ignores it. Ten lines of code, two weeks of confusion before I found the root cause.
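The fix really is about ten lines. A sketch, with the robot side written in Python for clarity even though the real check runs in the ESP32 firmware (`execute` is a placeholder for the motor logic):

```python
import itertools
import json

# Controller side: tag every outgoing command with a monotonic counter
seq = itertools.count()

def send_command(client, robot_id, cmd):
    cmd["seq"] = next(seq)
    client.publish(f"robots/{robot_id}/cmd", json.dumps(cmd), qos=1)

# Robot side: drop anything older than what we've already executed
last_seq = -1

def on_command(payload):
    global last_seq
    cmd = json.loads(payload)
    if cmd["seq"] <= last_seq:
        return             # stale command that arrived late: ignore it
    last_seq = cmd["seq"]
    execute(cmd)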
What I’d Do Differently
If I rebuilt this now, I’d run ROS 2 instead of raw MQTT. The message ordering, timing, and multi-agent coordination primitives are already solved there. MQTT made sense for learning, but for anything production-grade with more than 2-3 agents, the overhead of building these guarantees yourself adds up fast.
I’d also instrument everything from day one. The latency breakdown I described above took three days to measure properly. Profiling should have been the first thing I built, not the last.
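If I were starting over, the first thing I'd write is something like this: a tiny per-stage timer that accumulates samples, so the latency breakdown above falls out of a print statement instead of three days of ad-hoc measurement. A sketch:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

stage_times = defaultdict(list)

@contextmanager
def timed(stage):
    """Record wall-clock time for one pipeline stage."""
    t0 = time.perf_counter()
    try:
        yield
    finally:
        stage_times[stage].append(time.perf_counter() - t0)

# In the main loop, wrap each stage:
#   with timed("capture"):  ok, frame = cap.read()
#   with timed("detect"):   corners, ids, _ = detector.detectMarkers(frame)
#   with timed("publish"):  client.publish(...)

def report():
    for stage, samples in stage_times.items():
        ms = [s * 1000 for s in samples]
        print(f"{stage}: avg {sum(ms)/len(ms):.1f}ms, max {max(ms):.1f}ms")
```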
The system eventually ran 47 consecutive formation changes without a collision or communication failure. Getting there took about 400 hours across a year. Worth it.