Cotraining with demonstration data generated both in simulation and on real
hardware has emerged as a promising recipe for scaling imitation learning in
robotics. This work seeks to elucidate basic principles of sim-and-real
cotraining to inform simulation design, sim-and-real dataset creation, and
policy training.
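To make the setup concrete, here is a minimal sketch of what sim-and-real cotraining can look like in practice: each batch is drawn from a weighted mixture of a simulated and a real demonstration dataset. The helper name, the PyTorch datasets, and the `alpha` mixing weight are illustrative assumptions, not details from this paper.

```python
# A minimal sketch of sim-and-real cotraining: batches mix simulated and
# real demonstrations according to a fixed sampling ratio. All names and
# the choice of alpha are illustrative, not taken from this paper.
import torch
from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler

def make_cotraining_loader(sim_dataset, real_dataset, alpha=0.5, batch_size=64):
    """Return a DataLoader whose batches are ~alpha real and ~(1 - alpha) sim.

    `alpha` controls how heavily the (typically limited) real data is
    upweighted relative to the (typically plentiful) simulated data.
    """
    combined = ConcatDataset([sim_dataset, real_dataset])
    # Per-example weights: each domain contributes its share of the sampling
    # mass regardless of how many examples it contains.
    weights = torch.cat([
        torch.full((len(sim_dataset),), (1 - alpha) / len(sim_dataset)),
        torch.full((len(real_dataset),), alpha / len(real_dataset)),
    ])
    sampler = WeightedRandomSampler(weights, num_samples=len(combined),
                                    replacement=True)
    return DataLoader(combined, batch_size=batch_size, sampler=sampler)
```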
Our experiments confirm that cotraining with simulated data can dramatically improve
performance, especially when real data is limited. We show that these performance
gains scale with additional simulated data up to a plateau; adding more real-world
data raises this performance ceiling. The results also suggest that reducing
the physical domain gap may be more important than improving visual fidelity
for non-prehensile manipulation or other contact-rich tasks. Perhaps
surprisingly, we find that some visual gap can help cotraining: binary probes
reveal that high-performing policies must learn to distinguish simulated
domains from real ones. We conclude by investigating this nuance and the
mechanisms that facilitate positive transfer between simulation and reality.
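As an illustration of the kind of binary probe referred to above, one common recipe is to fit a linear classifier on a policy's frozen intermediate features to predict whether an observation came from simulation or the real world; high held-out accuracy indicates that the representation separates the two domains. The `policy_features` function and all other names below are hypothetical, not the paper's actual protocol.

```python
# A sketch of a binary sim-vs-real probe on frozen policy embeddings,
# assuming `policy_features` maps one observation to a fixed-size feature
# vector. Names are illustrative, not from the paper's codebase.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_domain_separability(sim_obs, real_obs, policy_features):
    """Fit a linear probe to predict sim (0) vs. real (1) from embeddings.

    Returns held-out accuracy; chance level is 0.5, so accuracy near 1.0
    means the policy's representation distinguishes the domains.
    """
    X = np.stack([policy_features(o) for o in sim_obs] +
                 [policy_features(o) for o in real_obs])
    y = np.concatenate([np.zeros(len(sim_obs)), np.ones(len(real_obs))])

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0)

    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return probe.score(X_test, y_test)
```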
Focusing narrowly on the canonical task of planar pushing from pixels allows
our study to be thorough. In total, our experiments span over 50 real-world
policies (evaluated in 1,000+ trials) and over 250 simulated policies
(evaluated in 50,000+ trials).