Fractal Landscapes in Policy Optimization
Authors: Tao Wang, Sylvia Herbert, Sicun Gao
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we will validate the theory presented in this paper through common RL tasks. |
| Researcher Affiliation | Academia | Tao Wang UC San Diego taw003@ucsd.edu Sylvia Herbert UC San Diego sherbert@ucsd.edu Sicun Gao UC San Diego sicung@ucsd.edu |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | All environments are adopted from the OpenAI Gym documentation [5] with continuous control input. [5] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016. |
| Dataset Splits | No | The paper uses environments from the OpenAI Gym documentation [5] but does not provide specific details on train/validation/test splits (e.g., percentages, sample counts, or explicit standard split references). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper mentions using neural networks and various RL algorithms, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions). |
| Experiment Setup | Yes | The initial parameter is given by $\theta_0 \sim \mathcal{N}(0, 0.05^2 I)$ (Inverted Pendulum); the initial parameter is again sampled from $\theta_0 \sim \mathcal{N}(0, 0.05^2 I)$ (Acrobot); the initial parameter is instead sampled from $\theta_0 \sim \mathcal{N}(0, 10^2 I)$ (Hopper). The stochastic policy is given by $\pi_\theta(\cdot \mid s) \sim \mathcal{N}(u(s), \sigma^2 I_p)$, where the mean $u(s)$ is represented by the 2-layer neural network $u(s) = W_2 \tanh(W_1 s)$ with weight matrices $W_1 \in \mathbb{R}^{r \times n}$ and $W_2 \in \mathbb{R}^{m \times r}$. For the width of the hidden layer, we use $r = 8$ for the inverted pendulum and acrobot, and $r = 64$ for the hopper. |
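The policy in the Experiment Setup row can be sketched in pure Python. The paper releases no code, so this is not the authors' implementation. It rests on these assumptions:

- The parameters are $\theta = (W_1, W_2)$ with no bias terms, matching the quoted formula.
- Every entry is drawn i.i.d. from $\mathcal{N}(0, \text{std}^2)$.
- The identity in the covariance ($I_p$ in the row) has the action dimension $m$.
- The value of $\sigma$ is not given in the excerpt, so `sigma` below is a hypothetical choice.

All function names are hypothetical. The dimensions in the usage lines are assumed from the standard Gym environments, not taken from the excerpt: $n=4, m=1$ for Inverted Pendulum and $n=11, m=3$ for Hopper.

```python
import math
import random


def init_params(n, m, r, std, seed=0):
    """Sample W1 (r x n) and W2 (m x r); every entry i.i.d. N(0, std^2)."""
    rng = random.Random(seed)
    W1 = [[rng.gauss(0.0, std) for _ in range(n)] for _ in range(r)]
    W2 = [[rng.gauss(0.0, std) for _ in range(r)] for _ in range(m)]
    return W1, W2


def policy_mean(W1, W2, s):
    """Mean action u(s) = W2 tanh(W1 s) (no biases, as in the quoted formula)."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, s))) for row in W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]


def sample_action(W1, W2, s, sigma, rng):
    """Draw a ~ N(u(s), sigma^2 I) with independent noise per action dimension."""
    return [rng.gauss(mu, sigma) for mu in policy_mean(W1, W2, s)]


def num_params(n, m, r):
    """Size of theta = (W1, W2): r*n entries in W1 plus m*r entries in W2."""
    return r * n + m * r


# Usage (dimensions and sigma are assumptions, see lead-in):
W1, W2 = init_params(n=4, m=1, r=8, std=0.05)      # Inverted Pendulum init
a = sample_action(W1, W2, [0.1, 0.0, -0.2, 0.05], sigma=0.1, rng=random.Random(1))
print(num_params(4, 1, 8))     # 40
print(num_params(11, 3, 64))   # 896 (Hopper, r = 64)
```

The tanh hidden layer bounds each mean action by the absolute row sum of $W_2$. Under the small $0.05$ initialization, the initial policy is therefore close to zero-mean Gaussian noise. The Hopper setting with standard deviation $10$ starts from a very different region of parameter space.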