Fractal Landscapes in Policy Optimization

Authors: Tao Wang, Sylvia Herbert, Sicun Gao

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we will validate the theory presented in this paper through common RL tasks.
Researcher Affiliation | Academia | Tao Wang, UC San Diego (taw003@ucsd.edu); Sylvia Herbert, UC San Diego (sherbert@ucsd.edu); Sicun Gao, UC San Diego (sicung@ucsd.edu)
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide any explicit statement or link indicating that open-source code for the described methodology is available.
Open Datasets | Yes | All environments are adopted from the OpenAI Gym documentation [5] with continuous control input. [5] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
Dataset Splits | No | The paper uses environments from the OpenAI Gym documentation [5] but does not give specific train/validation/test splits (e.g., percentages, sample counts, or references to standard splits).
Hardware Specification | No | The paper does not give any specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory, or cloud instance types).
Software Dependencies | No | The paper mentions neural networks and various RL algorithms, but it does not list software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions).
Experiment Setup | Yes | Initialization: the initial parameter is sampled from $\theta_0 \sim \mathcal{N}(0, 0.05^2 I)$ for Inverted Pendulum, again from $\theta_0 \sim \mathcal{N}(0, 0.05^2 I)$ for Acrobot, and instead from $\theta_0 \sim \mathcal{N}(0, 10^2 I)$ for Hopper. Policy: the stochastic policy is $\pi_\theta(\cdot \mid s) \sim \mathcal{N}(u(s), \sigma^2 I_p)$, where the mean $u(s)$ is represented by the 2-layer neural network $u(s) = W_2 \tanh(W_1 s)$ with weight matrices $W_1 \in \mathbb{R}^{r \times n}$ and $W_2 \in \mathbb{R}^{m \times r}$. Hidden-layer width: $r = 8$ for Inverted Pendulum and Acrobot, and $r = 64$ for Hopper.
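The Experiment Setup row packs the whole policy parameterization into one cell. The following is a minimal sketch of that setup, assuming $\theta_0 = (W_1, W_2)$ with every weight drawn i.i.d. from the stated normal distribution, and assuming the covariance $\sigma^2 I_p$ acts as independent noise on each of the $m$ action dimensions. The function names, the observation/action sizes `n` and `m`, and the value of `sigma` are hypothetical illustration choices (the row gives no value for $\sigma$). Only the widths $r$ and the initial standard deviations come from the source. This is not the authors' code.

```python
import math
import random

# Hidden width r and initial std of theta_0, as stated in the Experiment Setup row.
SETUPS = {
    "inverted_pendulum": {"r": 8, "init_std": 0.05},
    "acrobot": {"r": 8, "init_std": 0.05},
    "hopper": {"r": 64, "init_std": 10.0},
}


def init_params(n, m, r, init_std, rng):
    """Sample theta_0 = (W1, W2) with every entry i.i.d. N(0, init_std**2)."""
    W1 = [[rng.gauss(0.0, init_std) for _ in range(n)] for _ in range(r)]  # r x n
    W2 = [[rng.gauss(0.0, init_std) for _ in range(r)] for _ in range(m)]  # m x r
    return W1, W2


def policy_mean(W1, W2, s):
    """Mean action u(s) = W2 tanh(W1 s), computed with plain lists."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, s))) for row in W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]


def sample_action(W1, W2, s, sigma, rng):
    """Draw a ~ N(u(s), sigma**2 I): independent Gaussian noise per action dimension."""
    return [mu + rng.gauss(0.0, sigma) for mu in policy_mean(W1, W2, s)]


if __name__ == "__main__":
    rng = random.Random(0)
    cfg = SETUPS["inverted_pendulum"]
    # Hypothetical sizes and noise level: n=4, m=1, sigma=0.1 are NOT from the paper.
    W1, W2 = init_params(n=4, m=1, r=cfg["r"], init_std=cfg["init_std"], rng=rng)
    print(sample_action(W1, W2, [0.0, 0.1, 0.0, -0.1], sigma=0.1, rng=rng))
```

Because $\tanh$ is bounded, each component of $u(s)$ is bounded by the corresponding row's absolute weight sum in $W_2$. So the much larger Hopper initialization ($\sigma_0 = 10$ versus $0.05$) mainly scales up the output weights and saturates the hidden units.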