Automatic Curriculum Learning through Value Disagreement

Authors: Yunzhi Zhang, Pieter Abbeel, Lerrel Pinto

NeurIPS 2020

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods." |
| Researcher Affiliation | Academia | Yunzhi Zhang (UC Berkeley), Pieter Abbeel (UC Berkeley), Lerrel Pinto (UC Berkeley, NYU) |
| Pseudocode | Yes | Algorithm 1: Curriculum Learning with Value Disagreement Sampling |
| Open Source Code | Yes | Code is publicly available at https://github.com/zzyunzhi/vds |
| Open Datasets | Yes | "We test our methods on 13 manipulation goal-conditioned tasks, 3 maze navigation tasks and 2 Ant-embodied navigation tasks, all with sparse reward, as shown in Figure 2." Detailed setup of the environments is presented in Appendix C. |
| Dataset Splits | No | The paper distinguishes "training goals" from "evaluation goals" but does not specify a separate validation split for hyperparameter tuning or early stopping. |
| Hardware Specification | Yes | "In fact, all of our experiments using VDS are run on a single CPU." |
| Software Dependencies | No | The paper mentions algorithms such as DDPG and HER, but does not provide version numbers for software libraries, programming languages, or other dependencies. |
| Experiment Setup | Yes | Detailed hyper-parameter settings are specified in Appendix D. |