Piecewise Linear Parametrization of Policies: Towards Interpretable Deep Reinforcement Learning
Authors: Maxime Wabartha, Joelle Pineau
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate HC policies in control and navigation experiments, visualize the improved interpretability of the agent and highlight its trade-off with performance. Moreover, we validate that the restricted model class that the HyperCombinator belongs to is compatible with the algorithmic constraints of various reinforcement learning algorithms. |
| Researcher Affiliation | Collaboration | Maxime Wabartha McGill University, Mila Joelle Pineau McGill University, Mila, FAIR at Meta |
| Pseudocode | Yes | Algorithm 1 SAC (with HyperCombinator actor) ... Algorithm 2 Update Actor And Alpha (a hedged sketch of such an actor and its update appears after the table) |
| Open Source Code | No | The paper states 'We base ourselves on an open-source PyTorch implementation of SAC (Yarats & Kostrikov, 2020)' and 'We base our experiments on the open-source code provided by RIS (Chane-Sane et al., 2021)'. These refer to third-party baseline implementations, not an explicit release of the HyperCombinator code developed in this paper. |
| Open Datasets | Yes | We evaluate how well HC policies can control proprioceptive variables such as the joints of a robot through the DeepMind Control Suite benchmark (Tassa et al., 2018). |
| Dataset Splits | No | The paper describes evaluation procedures like 'We evaluate the agent every 10000 timesteps by rolling it out for 10 episodes and taking the average return' and 'Every 10000 steps, we roll out 5 evaluation episodes' (a sketch of this rollout-averaging loop appears after the table). However, it does not explicitly provide traditional training/test/validation dataset splits; this is typical of reinforcement learning research, which relies on environment interaction rather than static datasets. |
| Hardware Specification | Yes | All the GPUs were NVIDIA Tesla V100, with 16GB memory available. The CPUs were Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz. Each seed was allocated 1 GPU, 10 CPUs, and 64GB of RAM. |
| Software Dependencies | No | The paper mentions software such as 'Python (Van Rossum & Drake Jr, 1995)', 'numpy (Van Der Walt et al., 2011)', 'matplotlib (Hunter, 2007)', and 'PyTorch (Paszke et al., 2017)' in the acknowledgements, but only with citation years. It does not provide specific version numbers for these software dependencies (e.g., 'PyTorch 1.9' or 'Python 3.8'). |
| Experiment Setup | Yes | Table 3: Full list of hyperparameters in the control experiments. Includes: Action repeat 1, Discount factor 0.99, Learnable α True, Initial α 0.1, α learning rate λα 1e-4, Actor learning rate λπ 1e-4, Actor update frequency 1, Critic architecture [1024, 1024], Critic learning rate λQ 1e-4, Batch size 1024, log σmin -5, log σmax 2, Gumbel net architecture [1024, 1024, 1024], Sub-policy assignation entropy coefficient λassig 0.001, Gumbel temperature 1 (collected into a config dict after the table). |
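
The pseudocode row references Algorithm 1 (SAC with a HyperCombinator actor) and Algorithm 2 (Update Actor And Alpha). As a reading aid, here is a minimal PyTorch sketch of a piecewise-linear actor of that flavour and a SAC-style actor/temperature update it could plug into. All names (`HyperCombinatorActorSketch`, `num_subpolicies`, `critic`, etc.), the way the Gumbel-softmax gate combines the linear sub-policies, and the sign and placement of the assignation-entropy term are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HyperCombinatorActorSketch(nn.Module):
    """K linear sub-policies combined by a Gumbel-softmax gating network
    (hypothetical reconstruction; hyperparameter names follow Table 3)."""

    def __init__(self, obs_dim, act_dim, num_subpolicies=8,
                 gumbel_hidden=(1024, 1024, 1024), gumbel_tau=1.0,
                 log_std_min=-5.0, log_std_max=2.0):
        super().__init__()
        gate_layers, in_dim = [], obs_dim
        for h in gumbel_hidden:                          # "Gumbel net architecture"
            gate_layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        gate_layers.append(nn.Linear(in_dim, num_subpolicies))
        self.gate = nn.Sequential(*gate_layers)
        # Each sub-policy is linear in the observation: mean_k = W_k s + b_k.
        self.sub_means = nn.Linear(obs_dim, act_dim * num_subpolicies)
        self.sub_log_std = nn.Parameter(torch.zeros(num_subpolicies, act_dim))
        self.K, self.act_dim, self.tau = num_subpolicies, act_dim, gumbel_tau
        self.log_std_min, self.log_std_max = log_std_min, log_std_max

    def forward(self, obs):
        # Differentiable (soft) sub-policy assignation via Gumbel-softmax.
        assign = F.gumbel_softmax(self.gate(obs), tau=self.tau, hard=False)
        means = self.sub_means(obs).view(-1, self.K, self.act_dim)
        mean = (assign.unsqueeze(-1) * means).sum(dim=1)
        log_std = (assign @ self.sub_log_std).clamp(self.log_std_min, self.log_std_max)
        dist = torch.distributions.Normal(mean, log_std.exp())
        u = dist.rsample()
        action = torch.tanh(u)                           # squashed Gaussian, SAC-style
        log_prob = dist.log_prob(u).sum(-1)
        log_prob -= torch.log(1.0 - action.pow(2) + 1e-6).sum(-1)
        assign_entropy = -(assign * (assign + 1e-8).log()).sum(-1)
        return action, log_prob, assign_entropy


def update_actor_and_alpha_sketch(actor, critic, log_alpha, obs, actor_opt,
                                  alpha_opt, target_entropy, assig_coef=1e-3):
    """SAC-style actor and temperature update; how lambda_assig enters the
    actor loss (and with which sign) is a guess, not the paper's Algorithm 2."""
    action, log_prob, assign_entropy = actor(obs)
    alpha = log_alpha.exp().detach()
    actor_loss = (alpha * log_prob - critic(obs, action)).mean()
    actor_loss = actor_loss - assig_coef * assign_entropy.mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Temperature (alpha) update toward the target policy entropy.
    alpha_loss = (log_alpha.exp() * (-log_prob - target_entropy).detach()).mean()
    alpha_opt.zero_grad(); alpha_loss.backward(); alpha_opt.step()
```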
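The evaluation protocol quoted in the Dataset Splits row (roll the agent out for a fixed number of episodes at a fixed interval and average the return) can be restated as the short sketch below. It assumes a classic Gym-style environment with a 4-tuple `step` API; `policy` is any callable mapping an observation to an action.

```python
def evaluate_sketch(env, policy, n_episodes=10):
    """Average undiscounted return over n_episodes rollouts
    (e.g. called every 10000 training timesteps)."""
    returns = []
    for _ in range(n_episodes):
        obs, done, episode_return = env.reset(), False, 0.0
        while not done:
            action = policy(obs)                     # e.g. the actor's mean action
            obs, reward, done, _ = env.step(action)  # classic 4-tuple Gym API
            episode_return += reward
        returns.append(episode_return)
    return sum(returns) / n_episodes
```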
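Finally, the Table 3 values listed in the Experiment Setup row, gathered into a single Python dict for convenience; the key names are ours, the values are quoted from the row above.

```python
# Hyperparameters for the control experiments, restated from Table 3.
CONTROL_HPARAMS = {
    "action_repeat": 1,
    "discount": 0.99,
    "learnable_alpha": True,
    "init_alpha": 0.1,
    "alpha_lr": 1e-4,                    # λα
    "actor_lr": 1e-4,                    # λπ
    "actor_update_freq": 1,
    "critic_arch": [1024, 1024],
    "critic_lr": 1e-4,                   # λQ
    "batch_size": 1024,
    "log_std_min": -5,                   # log σmin
    "log_std_max": 2,                    # log σmax
    "gumbel_net_arch": [1024, 1024, 1024],
    "assignation_entropy_coef": 0.001,   # λassig
    "gumbel_temperature": 1,
}
```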