OMNI: Open-endedness via Models of human Notions of Interestingness

Authors: Jenny Zhang, Joel Lehman, Kenneth Stanley, Jeff Clune

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We evaluate OMNI on three challenging domains, Crafter (Hafner, 2021) (a 2D version of Minecraft), BabyAI (Chevalier-Boisvert et al., 2018) (a 2D grid world for grounded language learning), and AI2-THOR (Kolve et al., 2017) (a 3D photo-realistic embodied robotics environment). OMNI outperforms baselines based on uniform task sampling or learning progress alone." |
| Researcher Affiliation | Collaboration | Jenny Zhang (1, 2), Joel Lehman (3), Kenneth Stanley (4), Jeff Clune (1, 2, 5). Affiliations: 1 Department of Computer Science, University of British Columbia; 2 Vector Institute; 3 Stochastic Labs; 4 Maven; 5 Canada CIFAR AI Chair |
| Pseudocode | Yes | Algorithm 1: OMNI Algorithm |
| Open Source Code | No | The paper lists a project website (https://www.jennyzhangzt.com/omni/), but it does not state that source code for the methodology is available there, and the link is not to a code repository. |
| Open Datasets | Yes | "We evaluate OMNI on three challenging domains, Crafter (Hafner, 2021) (a 2D version of Minecraft), BabyAI (Chevalier-Boisvert et al., 2018) (a 2D grid world for grounded language learning), and AI2-THOR (Kolve et al., 2017) (a 3D photo-realistic embodied robotics environment)." |
| Dataset Splits | No | The paper provides no percentages or counts for training, validation, or test splits. It uses "validation" to describe the agent's learning process, not a static dataset split. |
| Hardware Specification | Yes | "Each experiment takes about 33 hrs for Crafter and 60 hrs for BabyAI on a 24GB NVIDIA A10 GPU with 30 virtual CPUs." |
| Software Dependencies | No | The paper mentions PPO, GRU, and LSTM components and the GPT-3 and GPT-4 APIs, but gives no version numbers for these or for any other libraries used. |
| Experiment Setup | Yes | Appendices L, M, and N tabulate the training hyperparameters, including discount factor, learning rate, PPO clip threshold, GAE lambda, entropy coefficient, batch size, epochs, and max episode length, for the Crafter, BabyAI, and AI2-THOR environments. |
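The Experiment Setup entry names the standard PPO hyperparameter fields reported in the paper's appendices. A minimal sketch of such a config is below; the field names mirror the appendix tables, but the values shown are generic PPO defaults, not the paper's actual settings.

```python
# Illustrative PPO hyperparameter config with the fields listed in
# Appendices L-N. Values are common PPO defaults, NOT the paper's settings.
ppo_config = {
    "discount_factor": 0.99,       # gamma
    "learning_rate": 3e-4,
    "ppo_clip_threshold": 0.2,
    "gae_lambda": 0.95,
    "entropy_coefficient": 0.01,
    "batch_size": 64,
    "epochs": 4,
    "max_episode_length": 1000,
}

def validate(cfg):
    """Check that every expected hyperparameter is present and numeric."""
    required = {
        "discount_factor", "learning_rate", "ppo_clip_threshold",
        "gae_lambda", "entropy_coefficient", "batch_size",
        "epochs", "max_episode_length",
    }
    missing = required - cfg.keys()
    if missing:
        raise ValueError(f"missing hyperparameters: {sorted(missing)}")
    return all(isinstance(v, (int, float)) for v in cfg.values())

assert validate(ppo_config)
```

Recording the full set of such fields per environment, as the appendices do, is what makes the training setup reproducible even without released code.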