Distributional Policy Optimization: An Alternative Approach for Continuous Control
Authors: Chen Tessler, Guy Tennenholtz, Shie Mannor
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation shows that our approach is comparable to and often surpasses current state-of-the-art baselines in continuous domains. |
| Researcher Affiliation | Academia | Technion - Israel Institute of Technology, Haifa, Israel |
| Pseudocode | Yes | Algorithm 1 Distributional Policy Optimization (DPO) |
| Open Source Code | Yes | Code provided in the following anonymous repository: github.com/tesslerc/GAC |
| Open Datasets | Yes | In order to evaluate our approach, we test GAC on a variety of continuous control tasks in the MuJoCo control suite [Todorov et al., 2012]. |
| Dataset Splits | No | The paper describes training and evaluation steps ('We run each task for 1 million steps... evaluate the policy every 5000 steps and report the average over 10 evaluations') but does not specify explicit training/validation/test dataset splits with percentages or sample counts, which is typical for fixed datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments. It mentions running on 'complex robotic machines' but not the specifications of the machines used for training or simulation. |
| Software Dependencies | No | The paper mentions using implementations of DDPG and PPO from the Open AI baselines repo and TD3 from the authors' GitHub repository, but it does not specify version numbers for any software dependencies like programming languages, libraries, or frameworks (e.g., Python, TensorFlow, PyTorch versions). |
| Experiment Setup | Yes | We run each task for 1 million steps and, as GAC is an off-policy approach, evaluate the policy every 5000 steps and report the average over 10 evaluations. We train GAC using a batch size of 128 and uncorrelated Gaussian noise for exploration. The results presented in Figure 4 are obtained using 32 (256 for HalfCheetah and Walker) samples at each step. |
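The evaluation protocol in the Experiment Setup row (1 million training steps, an evaluation every 5000 steps, each reported point the average over 10 evaluation episodes) can be sketched as follows. This is a minimal illustration of the schedule only: the agent and its episode returns are stand-in stubs, not the authors' GAC implementation, and `NOISE_STD` is an assumed scale for the exploration noise.

```python
# Sketch of the training/evaluation schedule described above:
# 1M steps, evaluate every 5000 steps, average over 10 episodes.
# The policy below is a random stub, NOT the paper's GAC agent.
import random

TOTAL_STEPS = 1_000_000
EVAL_EVERY = 5_000
EVAL_EPISODES = 10
BATCH_SIZE = 128   # batch size reported in the paper
NOISE_STD = 0.1    # assumed std for the uncorrelated Gaussian exploration noise

def evaluate(policy, episodes=EVAL_EPISODES):
    """Average return over a fixed number of evaluation episodes."""
    returns = [policy() for _ in range(episodes)]
    return sum(returns) / len(returns)

def train(total_steps=TOTAL_STEPS):
    random.seed(0)
    # Stub policy: yields a noisy episode return; a real run would
    # roll out the trained actor in a MuJoCo environment.
    policy = lambda: random.gauss(100.0, 10.0)
    eval_curve = []
    for step in range(1, total_steps + 1):
        # ... one environment step with Gaussian exploration noise,
        # then an off-policy update on a batch of BATCH_SIZE transitions ...
        if step % EVAL_EVERY == 0:
            eval_curve.append(evaluate(policy))
    return eval_curve

curve = train(total_steps=20_000)  # shortened run for illustration
print(len(curve))  # 4 evaluation points (20k / 5k)
```

A real reproduction would replace the stub policy with rollouts of the trained actor and plot `eval_curve` against environment steps, as in the paper's learning curves.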