Cotraining with demonstration data generated both in simulation and on real
hardware has emerged as a promising recipe for scaling imitation learning in robotics.
This work seeks to elucidate the basic principles of sim-and-real cotraining
to inform simulation design, sim-and-real dataset creation, and policy training.
Our experiments confirm that cotraining with simulated data can dramatically improve
performance, especially when real data is limited. We show that these performance gains
scale with additional simulated data up to a plateau; adding more real-world data
increases this performance ceiling. The results also suggest that, for non-prehensile
or contact-rich tasks, closing the physical domain gap may matter more than improving
visual fidelity. Perhaps surprisingly, we find that some visual domain gap can
actually help cotraining: binary probes reveal that high-performing policies must
learn to distinguish simulated domains from real ones. We conclude by investigating
this nuance and the mechanisms that facilitate positive transfer between sim and real.
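A binary probe of the kind mentioned above can be as simple as a linear classifier
trained on frozen policy embeddings to separate sim from real observations. The
sketch below is a minimal, hypothetical illustration using scikit-learn; the
embedding arrays, their dimensions, and the synthetic offset between domains are
placeholders, not the features or probe architecture used in this work.

```python
# Hypothetical sketch: a linear probe that classifies sim vs. real from
# policy embeddings. All data here is synthetic stand-in material.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins for embeddings extracted from an intermediate layer of a
# trained policy on simulated vs. real observations (names hypothetical).
embed_sim = rng.normal(loc=0.0, scale=1.0, size=(500, 128))
embed_real = rng.normal(loc=0.5, scale=1.0, size=(500, 128))

X = np.concatenate([embed_sim, embed_real])
y = np.concatenate([np.zeros(500), np.ones(500)])  # 0 = sim, 1 = real

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# High held-out accuracy indicates the embeddings linearly separate the
# two domains, i.e. the policy distinguishes sim from real internally.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"probe accuracy: {probe.score(X_test, y_test):.3f}")
```

In this framing, probe accuracy on held-out embeddings serves as a proxy for how
separable the two domains are inside the policy's representation.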
Focusing narrowly on the canonical task of planar pushing from pixels allows us to be
thorough in our study. In total, our experiments span 50+ real-world policies
(evaluated across 1000+ trials) and 250 simulated policies (evaluated across
50,000+ trials).