Policy Transfer with Strategy Optimization

Authors: Wenhao Yu, C. Karen Liu, Greg Turk

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on five simulated robotic control problems with different discrepancies in the training and testing environment and demonstrate that our method can overcome larger modeling errors compared to training a robust policy or an adaptive policy.
Researcher Affiliation | Academia | Wenhao Yu, C. Karen Liu, Greg Turk; School of Interactive Computing, Georgia Institute of Technology, Atlanta, GA; wenhaoyu@gatech.edu, {karenliu,turk}@cc.gatech.edu
Pseudocode | No | The paper describes its algorithm in prose but does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper references several third-party open-source projects used in the work (e.g., PyBullet, OpenAI Baselines, PyDART2, DartEnv, pycma), but it does not provide access to, or an explicit availability statement for, the source code of its own proposed method (SO-CMA). (A hedged sketch of the strategy-optimization step appears after this table.)
Open Datasets | Yes | We build a single-legged robot in DART similar to the Hopper environment simulated by MuJoCo in OpenAI Gym (Brockman et al., 2016).
Dataset Splits | No | The paper describes training (e.g., "For training policies in the source environment, we run PPO for 500 iterations. In each iteration, we sample 40,000 steps from the source environment to update the policy.") and testing procedures, but it does not specify a validation split for model evaluation or hyperparameter tuning.
Hardware Specification | No | The paper states that it uses the PPO implementation from OpenAI Baselines for training policies but does not report hardware details such as CPU/GPU models, memory, or cloud computing resources used for the experiments.
Software Dependencies | No | The paper mentions several software components (OpenAI Baselines, DartEnv, PyDART2, OpenAI Gym, pycma) but does not give version numbers for these dependencies, making full replication challenging.
Experiment Setup | Yes | For all of our examples, we represent the policy as a feed-forward neural network with three hidden layers, each consisting of 64 hidden nodes. (See the network sketch after this table.)
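To make the reported experiment setup concrete, the sketch below builds a feed-forward policy with three hidden layers of 64 nodes each, matching the Experiment Setup row. The choice of PyTorch, the tanh activations, and the example observation/action dimensions are assumptions for illustration only; the paper itself trains its policies with the PPO implementation in OpenAI Baselines rather than with the code shown here.

import torch
import torch.nn as nn

# Minimal sketch of the reported policy architecture: three hidden layers,
# 64 nodes each. Framework (PyTorch), activations (tanh), and the example
# observation/action dimensions are assumptions, not taken from the paper.
class PolicyNetwork(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)

# Example usage with hypothetical dimensions (e.g., a Hopper-like task).
policy = PolicyNetwork(obs_dim=11, act_dim=3)
action = policy(torch.zeros(11))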
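The Open Source Code row notes that no reference implementation of SO-CMA is released. As a rough, non-authoritative illustration of the strategy-optimization step suggested by the paper's title and its use of pycma, the sketch below uses pycma's ask/tell interface to search for the strategy input that maximizes return in the target environment. The evaluate_return helper and its interface, the initial guess, and all hyperparameters are assumptions, not the authors' code.

import numpy as np
import cma

def strategy_search(evaluate_return, mu_dim, iterations=20, popsize=8):
    # Hedged sketch: CMA-ES over a strategy vector mu. evaluate_return(mu)
    # is a hypothetical helper that rolls out the fixed policy in the target
    # environment with strategy input mu and returns the episode return.
    x0 = np.zeros(mu_dim)  # initial guess (assumption)
    es = cma.CMAEvolutionStrategy(x0, 0.5, {"popsize": popsize})
    for _ in range(iterations):
        candidates = es.ask()
        # pycma minimizes, so pass negative returns as costs.
        costs = [-evaluate_return(np.asarray(mu)) for mu in candidates]
        es.tell(candidates, costs)
    return es.result.xbest  # best strategy vector found

The population size, iteration count, and initial step size above are placeholders rather than values taken from the paper.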