Kuang-Huei Lee*, Ofir Nachum*, Mengjiao Yang, Lisa Lee, Winnie Xu, Daniel Freeman, Sergio Guadarrama, Eric Jang, Henryk Michalewski, Ian Fischer, Igor Mordatch
A longstanding goal in the field of AI is to find a strategy for compiling diverse experience into a single, highly capable, generalist agent. In the subfields of vision and language, this was largely achieved by scaling up transformer-based models and training them on large, diverse datasets. Motivated by this progress, we train one generalist control agent, the Multi-Game Decision Transformer (MGDT), on a diverse set of offline data to play 46 Atari games. We demonstrate that performance scales with model size and that the agent rapidly adapts to novel games with fine-tuning. Compared to existing behavioral cloning, online RL, and offline RL methods, MGDT offers the best scalability and performance.
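As background (a minimal sketch, not the authors' implementation), a decision-transformer-style agent casts control as sequence modeling: each trajectory is flattened into interleaved (return-to-go, state, action) tokens, and the model is trained to predict actions given the preceding tokens. The helper names below are illustrative assumptions:

```python
import numpy as np

def returns_to_go(rewards):
    """Suffix sums of rewards: R_t = sum over t' >= t of r_t'."""
    return np.cumsum(rewards[::-1])[::-1]

def interleave_tokens(rtg, states, actions):
    """Flatten a trajectory into the (R_1, s_1, a_1, R_2, s_2, a_2, ...)
    token sequence a decision-transformer-style model consumes."""
    seq = []
    for r, s, a in zip(rtg, states, actions):
        seq.extend([("return", r), ("state", s), ("action", a)])
    return seq

# Example: a 3-step trajectory with rewards 1, 0, 2
rtg = returns_to_go(np.array([1, 0, 2]))  # -> [3, 2, 2]
tokens = interleave_tokens(rtg, states=[0, 1, 2], actions=["L", "R", "L"])
```

Conditioning on a high return-to-go at inference time is what steers such a model toward expert-level behavior rather than merely imitating the average of the offline data.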
Neural Information Processing Systems (NeurIPS), 2022 [Oral].