Composing Entropic Policies using Divergence Correction

Authors: Jonathan Hunt, Andre Barreto, Timothy Lillicrap, Nicolas Heess

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We study this approach in the tabular case and on non-trivial continuous control problems with compositional structure and show that it outperforms or matches existing methods across all tasks considered."
Researcher Affiliation | Industry | Jonathan J. Hunt, Andre Barreto, Timothy P. Lillicrap, Nicolas Heess (DeepMind). Correspondence to: Jonathan J. Hunt <jjhunt@google.com>.
Pseudocode | Yes | Algorithm 1: AISBP training algorithm.
Open Source Code | No | The paper links only videos of the tasks and supplementary information (https://tinyurl.com/yaplfwaq); no source code repository is provided.
Open Datasets | No | The paper uses simulated environments (an 8x8 tabular world, point mass, planar manipulator, jumping ball, and ant) and generates experience within them; it does not reference a specific public dataset with access information (link, DOI, or citation).
Dataset Splits | No | The paper describes training from a replay buffer and evaluating performance, but it does not specify explicit training, validation, or test splits with percentages or sample counts.
Hardware Specification | No | The paper does not report the hardware used for its experiments (e.g., GPU/CPU models or memory amounts).
Software Dependencies | No | The paper cites tools such as the DeepMind Control Suite and MuJoCo in its references, but it does not list software dependencies with version numbers (e.g., "Python 3.8, PyTorch 1.9, CUDA 11.1").
Experiment Setup | No | The paper describes its algorithms and theoretical components and notes that full details are in Appendix C, but the main text does not give specific hyperparameter values or concrete training configurations.
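
For context on what a reproduction would start from: in maximum-entropy RL the optimal policy is Boltzmann in the soft Q-function, pi(a|s) proportional to exp(Q(s,a)/alpha), and the naive way to compose two tasks is to act according to the sum of their Q-functions, which the paper's divergence correction then refines. The sketch below shows only this naive tabular composition, not the AISBP algorithm itself; the toy Q-tables, the temperature alpha = 1.0, and the helper name boltzmann_policy are illustrative assumptions, not artifacts of the paper.

```python
import numpy as np

def boltzmann_policy(q, alpha=1.0):
    """Entropic (max-ent) policy: pi(a|s) proportional to exp(Q(s,a)/alpha)."""
    logits = q / alpha
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)

# Toy tabular setting: 4 states, 3 actions, two component tasks.
rng = np.random.default_rng(0)
q1 = rng.normal(size=(4, 3))  # Q-table for task 1 (illustrative values)
q2 = rng.normal(size=(4, 3))  # Q-table for task 2

alpha = 1.0
pi1 = boltzmann_policy(q1, alpha)
pi2 = boltzmann_policy(q2, alpha)

# Naive additive composition: act on Q1 + Q2. This is only sound when the
# tasks agree; when they conflict, a correction term is needed (the
# divergence correction studied in the paper).
pi_naive = boltzmann_policy(q1 + q2, alpha)

# Equivalently, naive composition is the renormalised product of the
# component Boltzmann policies at a shared temperature.
pi_product = pi1 * pi2
pi_product = pi_product / pi_product.sum(axis=-1, keepdims=True)

assert np.allclose(pi_naive, pi_product)
print(pi_naive.round(3))
```

The assertion makes the starting point of the paper explicit: summing Q-functions at a shared temperature is identical to renormalising the product of the component Boltzmann policies, and that composed policy is reliable only where the component tasks do not conflict, which is the discrepancy the paper's divergence correction accounts for.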