Policy Transfer with Strategy Optimization

Authors: Wenhao Yu, C. Karen Liu, Greg Turk

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on five simulated robotic control problems with different discrepancies in the training and testing environment and demonstrate that our method can overcome larger modeling errors compared to training a robust policy or an adaptive policy.
Researcher Affiliation | Academia | Wenhao Yu, C. Karen Liu, Greg Turk; School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA; wenhaoyu@gatech.edu, {karenliu,turk}@cc.gatech.edu
Pseudocode | No | The paper describes its algorithm in prose but does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper references several third-party open-source projects used in the work (e.g., PyBullet, OpenAI Baselines, PyDART2, DartEnv, pycma), but it does not provide access to, or an explicit availability statement for, the source code of its own proposed method (SO-CMA). (A hedged sketch of the strategy-optimization step appears after this table.)
Open Datasets | Yes | We build a single-legged robot in DART similar to the Hopper environment simulated by MuJoCo in OpenAI Gym (Brockman et al., 2016).
Dataset Splits | No | The paper describes training (e.g., "For training policies in the source environment, we run PPO for 500 iterations. In each iteration, we sample 40,000 steps from the source environment to update the policy.") and testing procedures, but it does not specify a validation split for model evaluation or hyperparameter tuning.
Hardware Specification | No | The paper states that it uses the PPO implementation from OpenAI Baselines for training policies but does not report hardware details such as CPU/GPU models, memory, or cloud computing resources used for the experiments.
Software Dependencies | No | The paper mentions several software components (OpenAI Baselines, DartEnv, PyDART2, OpenAI Gym, pycma) but does not give version numbers for these dependencies, making full replication challenging.
Experiment Setup | Yes | For all of our examples, we represent the policy as a feed-forward neural network with three hidden layers, each consisting of 64 hidden nodes. (See the network sketch after this table.)
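To make the reported experiment setup concrete, the sketch below builds a feed-forward policy with three hidden layers of 64 nodes each, matching the Experiment Setup row. The choice of PyTorch, the tanh activations, and the example observation/action dimensions are assumptions for illustration only; the paper itself trains its policies with the PPO implementation in OpenAI Baselines rather than with the code shown here.

import torch
import torch.nn as nn

# Minimal sketch of the reported policy architecture: three hidden layers,
# 64 nodes each. Framework (PyTorch), activations (tanh), and the example
# observation/action dimensions are assumptions, not taken from the paper.
class PolicyNetwork(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)

# Example usage with hypothetical dimensions (e.g., a Hopper-like task).
policy = PolicyNetwork(obs_dim=11, act_dim=3)
action = policy(torch.zeros(11))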
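The Open Source Code row notes that no reference implementation of SO-CMA is released. As a rough, non-authoritative illustration of the strategy-optimization step suggested by the paper's title and its use of pycma, the sketch below uses pycma's ask/tell interface to search for the strategy input that maximizes return in the target environment. The evaluate_return helper and its interface, the initial guess, and all hyperparameters are assumptions, not the authors' code.

import numpy as np
import cma

def strategy_search(evaluate_return, mu_dim, iterations=20, popsize=8):
    # Hedged sketch: CMA-ES over a strategy vector mu. evaluate_return(mu)
    # is a hypothetical helper that rolls out the fixed policy in the target
    # environment with strategy input mu and returns the episode return.
    x0 = np.zeros(mu_dim)  # initial guess (assumption)
    es = cma.CMAEvolutionStrategy(x0, 0.5, {"popsize": popsize})
    for _ in range(iterations):
        candidates = es.ask()
        # pycma minimizes, so pass negative returns as costs.
        costs = [-evaluate_return(np.asarray(mu)) for mu in candidates]
        es.tell(candidates, costs)
    return es.result.xbest  # best strategy vector found

The population size, iteration count, and initial step size above are placeholders rather than values taken from the paper.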