reproducibilityindex.ai

Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality

Authors: Tom Zahavy, Yannick Schroecker, Feryal Behbahani, Kate Baumli, Sebastian Flennerhag, Shaobo Hou, Satinder Singh

ICLR 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments are designed to validate and get confidence in the DOMiNO agent. We emphasize that we do not explicitly compare DOMiNO with previous work nor argue that one works better than the other. Instead, we address the following questions: (a) Can DOMiNO discover diverse policies that are near optimal? see Fig. 2, Appendix C.1, Fig. 1b and the videos in the supplementary. (b) Can DOMiNO balance the QD trade-off? see Fig. 2, Fig. 2 & 3. (c) Do the discovered policies enable robustness and fast adaptation to perturbations of the environment? (see Fig. 4).
Researcher Affiliation	Industry	Tom Zahavy, Yannick Schroecker, Feryal Behbahani, Kate Baumli, Sebastian Flennerhag, Shaobo Hou and Satinder Singh DeepMind, London
Pseudocode	Yes	Pseudo code and further implementation details, as well as treatment of the discounted state occupancy, can be found in Appendix B.
Open Source Code	No	The paper references existing open-source libraries used (e.g., rlax), but does not state that the authors' own implementation of DOMiNO is open-source or provide a link to its source code.
Open Datasets	Yes	We conducted most of our experiments on domains from the DM Control Suite (Tassa et al., 2018), standard continuous control locomotion tasks where diverse near-optimal policies should naturally correspond to different gaits.
Dataset Splits	No	The paper reports 95% confidence intervals and uses multiple seeds for experiments, but it does not specify train/validation/test dataset splits (e.g., percentages or counts) or a cross-validation setup for reproducibility.
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies	No	The paper mentions using 'RLAX' and optimizers like 'RMSprop' and 'Adam', but it does not specify version numbers for these software components.
Experiment Setup	Yes	The hyperparameters in Table 2 are shared across all environments except in the BiPedal domain the learning rate is set to 10⁻⁵ and the learner frames are 5 × 10⁷. We report the DOMiNO specific hyperparameters in Table 3.
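
The Experiment Setup row describes one shared hyperparameter set with a per-environment exception. A minimal sketch of that pattern follows, assuming the BiPedal values read as a learning rate of 1e-5 and 5e7 learner frames. The key names, the environment labels, and the function `resolve_hyperparameters` are hypothetical and are not taken from the paper or its code. The shared values from Table 2 are not reproduced here, so the example uses empty placeholders.

```python
from typing import Any, Dict, Mapping

# Per-environment overrides, as read from the Experiment Setup row.
# The key names and the "bipedal" label are hypothetical.
ENV_OVERRIDES: Dict[str, Dict[str, Any]] = {
    "bipedal": {"learning_rate": 1e-5, "learner_frames": 50_000_000},
}


def resolve_hyperparameters(shared: Mapping[str, Any], env_name: str) -> Dict[str, Any]:
    """Return a copy of the shared settings with any overrides for env_name applied."""
    resolved = dict(shared)  # copy so the shared settings are never mutated
    resolved.update(ENV_OVERRIDES.get(env_name, {}))
    return resolved


# Placeholders only: the actual shared values are listed in Table 2 of the paper.
shared_placeholder = {"learning_rate": None, "learner_frames": None}

bipedal = resolve_hyperparameters(shared_placeholder, "bipedal")
walker = resolve_hyperparameters(shared_placeholder, "walker")
```

Keeping the exception in a separate override table, rather than editing the shared settings in place, means every other environment resolves to the Table 2 values unchanged.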