Composing Entropic Policies using Divergence Correction
Authors: Jonathan Hunt, Andre Barreto, Timothy Lillicrap, Nicolas Heess
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study this approach in the tabular case and on non-trivial continuous control problems with compositional structure and show that it outperforms or matches existing methods across all tasks considered. |
| Researcher Affiliation | Industry | Jonathan J Hunt¹, Andre Barreto¹, Timothy P Lillicrap¹, Nicolas Heess¹. ¹DeepMind. Correspondence to: Jonathan J Hunt <jjhunt@google.com>. |
| Pseudocode | Yes | Algorithm 1 AISBP training algorithm |
| Open Source Code | No | Videos of the tasks and supplementary information at https://tinyurl.com/yaplfwaq. |
| Open Datasets | No | The paper describes using simulated environments (e.g., 8x8 tabular world, point mass, planar manipulator, jumping ball, ant) and generating experience within them. It does not refer to a specific public dataset with access information (link, DOI, citation). |
| Dataset Splits | No | The paper describes training using a replay buffer and evaluating performance, but it does not specify explicit training, validation, or test dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper refers to tools like DeepMind Control Suite and MuJoCo in the references, but it does not provide specific software dependencies with version numbers for replication (e.g., 'Python 3.8, PyTorch 1.9, and CUDA 11.1' or 'TensorFlow 2.x'). |
| Experiment Setup | No | The paper describes its algorithms and theoretical components and states that full details are in Appendix C, but the main text does not give specific hyperparameter values or concrete training configurations. |