Policy Transfer with Strategy Optimization
Authors: Wenhao Yu, C. Karen Liu, Greg Turk
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on five simulated robotic control problems with different discrepancies in the training and testing environment and demonstrate that our method can overcome larger modeling errors compared to training a robust policy or an adaptive policy. |
| Researcher Affiliation | Academia | Wenhao Yu, C. Karen Liu, Greg Turk School of Interactive Computing Georgia Institute of Technology Atlanta, GA wenhaoyu@gatech.edu, {karenliu,turk}@cc.gatech.edu |
| Pseudocode | No | The paper describes its algorithm in prose, but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper references several third-party open-source projects used in the work (e.g., PyBullet, OpenAI Baselines, PyDART2, DartEnv, Pycma), but it provides neither a link to nor an explicit statement about the availability of source code for its own proposed method (SO-CMA). |
| Open Datasets | Yes | We build a single-legged robot in DART similar to the Hopper environment simulated by MuJoCo in OpenAI Gym (Brockman et al., 2016). |
| Dataset Splits | No | The paper describes training (e.g., "For training policies in the source environment, we run PPO for 500 iterations. In each iteration, we sample 40,000 steps from the source environment to update the policy.") and testing procedures, but it does not explicitly specify a validation dataset split for model evaluation or hyperparameter tuning. |
| Hardware Specification | No | The paper states that it uses PPO implemented in OpenAI Baselines for training policies but does not provide any specific hardware details such as CPU/GPU models, memory, or cloud computing resources used for the experiments. |
| Software Dependencies | No | The paper mentions using several software components like "OpenAI Baselines", "DartEnv", "PyDART2", "OpenAI Gym", and "Pycma", but it does not provide specific version numbers for these dependencies, making full replication challenging. |
| Experiment Setup | Yes | For all of our examples, we represent the policy as a feed-forward neural network with three hidden layers, each consisting of 64 hidden nodes (a hedged sketch of this setup appears below the table). |
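
To make the reported setup concrete, below is a minimal sketch, not the authors' code, of the policy architecture quoted in the Experiment Setup row (a feed-forward network with three hidden layers of 64 units) together with the PPO budget quoted in the Dataset Splits row (500 iterations, 40,000 sampled steps per iteration). The framework (PyTorch), the tanh activation, and the observation/action dimensions are assumptions for illustration; the paper itself trains with PPO from OpenAI Baselines.

```python
# Minimal sketch (not the authors' code) of the policy network described in the paper:
# a feed-forward network with three hidden layers of 64 units each.
# Activation choice (tanh) and observation/action sizes are assumptions.
import torch
import torch.nn as nn


class PolicyNetwork(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


# Training budget quoted from the paper (PPO in the source environment):
PPO_ITERATIONS = 500          # number of PPO iterations
STEPS_PER_ITERATION = 40_000  # environment steps sampled per iteration

if __name__ == "__main__":
    # Hypothetical dimensions for a Hopper-like task; not taken from the paper.
    policy = PolicyNetwork(obs_dim=11, act_dim=3)
    dummy_obs = torch.randn(1, 11)
    print(policy(dummy_obs).shape)  # -> torch.Size([1, 3])
```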