Online Nonstochastic Model-Free Reinforcement Learning

Authors: Udaya Ghai, Arushi Gupta, Wenhan Xia, Karan Singh, Elad Hazan

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method over various standard RL benchmarks and demonstrate improved robustness. ... We empirically evaluate our method on Open AI Gym environments in Section 5.
Researcher Affiliation | Collaboration | Udaya Ghai (Amazon, ughai@amazon.com); Arushi Gupta (Princeton University & Google DeepMind, arushig@princeton.edu); Wenhan Xia (Princeton University & Google DeepMind, wxia@princeton.edu); Karan Singh (Carnegie Mellon University, karansingh@cmu.edu); Elad Hazan (Princeton University & Google DeepMind, ehazan@princeton.edu)
Pseudocode | Yes | Algorithm 1 MF-GPC (Model-Free Gradient Perturbation Controller) ... Algorithm 2 DMF-GPC (Discrete Model-Free Gradient Perturbation Controller) (see the illustrative sketch after this table)
Open Source Code | No | The paper bases its implementation on existing frameworks such as Acme and D4PG, but it does not state that its own source code is released, nor does it link to a repository for the described method.
Open Datasets | Yes | We apply the MF-GPC Algorithm 1 to various Open AI Gym [Brockman et al., 2016] environments.
Dataset Splits | No | The paper reports training durations ('1e7 steps', '1.5e7 steps') and averages results over 25 seeds, as is typical for RL, but it does not specify explicit dataset splits (e.g., percentages or counts for training, validation, and test sets) because it operates in dynamic reinforcement learning environments rather than on static, pre-partitioned datasets.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models, or cloud computing specifications.
Software Dependencies | No | The paper mentions software components such as 'Acme', 'DDPG', and 'D4PG', but it does not specify version numbers for these or for other relevant software dependencies such as programming languages or libraries.
Experiment Setup | Yes | We pick h = 5 and use the DDPG algorithm [Lillicrap et al., 2016] as our underlying baseline. We update the M matrices every 3 episodes instead of continuously to reduce runtime. We also apply weight decay to line 6 of Algorithm 1. Our implementation is based on the Acme implementation of D4PG. The policy and critic networks both have the default sizes of 256 × 256 × 256. We use the Acme default number of atoms as 51 for the network. We run in the distributed setting with 4 agents. The underlying learning rate of the D4PG implementation is left at 3e-04. The exploration parameter σ is tuned. (A hedged configuration sketch follows the table.)
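
For readers skimming the Pseudocode row, the following is a minimal, hypothetical sketch of how a GPC-style perturbation controller can wrap a fixed base policy: the action is the base policy's output plus a learned linear function of recent disturbance signals, and the M matrices are updated by online gradient descent with weight decay. The class, method names, disturbance estimate, and update rule here are assumptions for illustration only; the paper's Algorithm 1 (MF-GPC) specifies the actual model-free disturbance estimate and loss.

```python
import numpy as np


class GPCWrapper:
    """Illustrative GPC-style wrapper around a fixed base policy.

    All names and the update rule are assumptions sketched from the table
    above, not the paper's exact Algorithm 1 (MF-GPC).
    """

    def __init__(self, base_policy, state_dim, action_dim,
                 h=5, lr=1e-3, weight_decay=1e-4):
        self.base_policy = base_policy                 # e.g., a trained DDPG actor
        self.h = h                                     # history length (paper reports h = 5)
        self.M = np.zeros((h, action_dim, state_dim))  # learned perturbation matrices
        self.lr = lr
        self.weight_decay = weight_decay
        self.w_history = [np.zeros(state_dim) for _ in range(h)]

    def act(self, state):
        # Base action plus a learned linear function of recent disturbance signals.
        u = self.base_policy(state)
        for i in range(self.h):
            u = u + self.M[i] @ self.w_history[i]
        return u

    def observe_disturbance(self, w):
        # w is a disturbance-like residual; how it is estimated without a model
        # is specified by the paper, not by this sketch.
        self.w_history = [w] + self.w_history[:-1]

    def update(self, grad_M):
        # Online gradient step on M with weight decay, in the spirit of the
        # "weight decay applied to line 6 of Algorithm 1" noted above.
        self.M = (1.0 - self.lr * self.weight_decay) * self.M - self.lr * grad_M
```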
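
The Experiment Setup row lists concrete hyperparameters; the snippet below simply collects them in one place as a hedged configuration sketch. The key names and dictionary layout are ours, not taken from the authors' code.

```python
# Hyperparameters as reported in the Experiment Setup row above; key names
# and structure are illustrative, not from the authors' implementation.
experiment_config = {
    "history_length_h": 5,                     # h = 5
    "base_algorithm": "DDPG (Acme D4PG implementation)",
    "M_update_interval_episodes": 3,           # M matrices updated every 3 episodes
    "policy_network_sizes": (256, 256, 256),
    "critic_network_sizes": (256, 256, 256),
    "num_atoms": 51,                           # Acme D4PG default
    "num_distributed_agents": 4,
    "base_learning_rate": 3e-4,                # D4PG default, left unchanged
    "exploration_sigma": "tuned",              # σ tuned per environment, per the paper
    "weight_decay": "applied to line 6 of Algorithm 1",
}
```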