Fractal Landscapes in Policy Optimization

Authors: Tao Wang, Sylvia Herbert, Sicun Gao

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we will validate the theory presented in this paper through common RL tasks.
Researcher Affiliation	Academia	Tao Wang, UC San Diego, taw003@ucsd.edu; Sylvia Herbert, UC San Diego, sherbert@ucsd.edu; Sicun Gao, UC San Diego, sicung@ucsd.edu
Pseudocode	No	No pseudocode or algorithm blocks were found in the paper.
Open Source Code	No	The paper does not provide any explicit statements or links indicating the availability of open-source code for the described methodology.
Open Datasets	Yes	All environments are adopted from the OpenAI Gym documentation [5] with continuous control input. [5] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
Dataset Splits	No	The paper uses environments from the OpenAI Gym documentation [5] but does not provide specific details on train/validation/test splits (e.g., percentages, sample counts, or explicit standard split references).
Hardware Specification	No	The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory, or cloud instance types).
Software Dependencies	No	The paper mentions using neural networks and various RL algorithms, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions).
Experiment Setup	Yes	The initial parameter is given by $\theta_0 \sim \mathcal{N}(0, 0.05^2 I)$ (for Inverted Pendulum); the initial parameter is again sampled from $\theta_0 \sim \mathcal{N}(0, 0.05^2 I)$ (for Acrobot); and the initial parameter is instead sampled from $\theta_0 \sim \mathcal{N}(0, 10^2 I)$ (for Hopper). Also, the stochastic policy is given by $\pi_\theta(\cdot \mid s) \sim \mathcal{N}(u(s), \sigma^2 I_p)$, where the mean $u(s)$ is represented by the 2-layer neural network $u(s) = W_2 \tanh(W_1 s)$, and $W_1 \in \mathbb{R}^{r \times n}$ and $W_2 \in \mathbb{R}^{m \times r}$ are weight matrices. For the width of the hidden layer, we use $r = 8$ for the inverted pendulum and acrobot, and $r = 64$ for the hopper.
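
The setup quoted in the Experiment Setup row can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code (the report notes that none was released). The function names, the state and action dimensions n = 4 and m = 1, the value of sigma, and the order in which the flat parameter vector is split into W1 and W2 are all hypothetical. Only the Gaussian initialization, the two-layer tanh mean, and the hidden width r = 8 come from the quoted text.

```python
import numpy as np


def init_params(n, m, r, std, rng):
    """Sample theta_0 ~ N(0, std^2 I) and split it into W1 (r x n) and W2 (m x r).

    The flattening order (W1 first, then W2) is an assumption for illustration.
    """
    theta0 = rng.normal(loc=0.0, scale=std, size=r * n + m * r)
    W1 = theta0[: r * n].reshape(r, n)
    W2 = theta0[r * n:].reshape(m, r)
    return W1, W2


def policy_mean(W1, W2, s):
    """Mean of the policy: u(s) = W2 tanh(W1 s)."""
    return W2 @ np.tanh(W1 @ s)


def sample_action(W1, W2, s, sigma, rng):
    """Draw an action from N(u(s), sigma^2 I)."""
    mean = policy_mean(W1, W2, s)
    return mean + sigma * rng.standard_normal(mean.shape)


rng = np.random.default_rng(0)

# r = 8 and std = 0.05 follow the quoted inverted-pendulum setup;
# n, m, and sigma are placeholder values chosen for this sketch.
n, m, r = 4, 1, 8
sigma = 0.1

W1, W2 = init_params(n, m, r, std=0.05, rng=rng)
state = np.zeros(n)
action = sample_action(W1, W2, state, sigma, rng)
```

For the Hopper configuration described in the same row, the sketch would be called with r = 64 and std = 10 instead, with n and m set to that environment's observation and action dimensions.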