Environment Probing Interaction Policies

Authors: Wenxuan Zhou, Lerrel Pinto, Abhinav Gupta

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally show that EPI-conditioned task-specific policies significantly outperform commonly used policy generalization methods on novel testing environments.
Researcher Affiliation | Collaboration | Wenxuan Zhou (1), Lerrel Pinto (1), Abhinav Gupta (1,2); (1) The Robotics Institute, Carnegie Mellon University; (2) Facebook AI Research
Pseudocode | No | No pseudocode or clearly labeled algorithm block was found.
Open Source Code | Yes | Code is available at https://github.com/Wenxuan-Zhou/EPI.
Open Datasets | Yes | For this, we use the Striker and the Hopper MuJoCo (Todorov et al., 2012) environments from OpenAI Gym (Brockman et al., 2016).
Dataset Splits | Yes | To train our prediction models, a dataset of transition data (s_t, a_t, s_{t+1}) is collected in the training environments using a pre-trained task policy (Sec. 4.1.3). This data is split into a training set and a validation set. (See the data-collection and split sketch after the table.)
Hardware Specification | No | No specific hardware details (like GPU/CPU models or cloud instance types) used for experiments were mentioned.
Software Dependencies | No | The paper mentions optimization with Adam and TRPO via the rllab implementation, but does not provide specific version numbers for these software components.
Experiment Setup | Yes | An EPI-trajectory contains 10 steps of observations and actions for both Hopper and Striker. The embedding network ψ ... has two fully connected layers with 32 neurons each... The prediction models ... have four fully connected layers with 128 neurons each... The EPI-policy is trained for 200-400 iterations in total with a batch size of 10,000 timesteps. The task policy then uses the trained EPI-policy and the embedding network and is updated for 1,000 iterations with a batch size of 100,000 timesteps. (The reported network sizes and training schedule are sketched below.)
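
To make the Dataset Splits row concrete, here is a minimal sketch of the collection-and-split step, assuming the classic Gym API (pre-0.26 reset/step signatures) and illustrative environment IDs such as "Hopper-v2"; the rollout length, the 90/10 split ratio, and all function names are assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch of the step described in the Dataset Splits row: roll out a
# pre-trained task policy in a training environment, record (s_t, a_t, s_{t+1})
# transitions, and hold out a validation portion. Environment IDs, rollout
# length, and the split ratio are assumptions for illustration.
import gym
import numpy as np

def collect_transitions(env_name, policy, n_steps=10_000):
    """Collect (obs, action, next_obs) tuples with a given policy."""
    env = gym.make(env_name)          # e.g. "Hopper-v2" or "Striker-v2" (assumed IDs)
    obs = env.reset()
    transitions = []
    for _ in range(n_steps):
        act = policy(obs)
        next_obs, _, done, _ = env.step(act)
        transitions.append((obs, act, next_obs))
        obs = env.reset() if done else next_obs
    return transitions

def train_val_split(transitions, val_fraction=0.1, seed=0):
    """Randomly split transitions into a training set and a validation set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(transitions))
    n_val = int(len(transitions) * val_fraction)
    val = [transitions[i] for i in idx[:n_val]]
    train = [transitions[i] for i in idx[n_val:]]
    return train, val
```

In the paper the data comes from multiple training environments; the sketch shows a single environment for brevity.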
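
The network sizes quoted in the Experiment Setup row can be read as the following minimal PyTorch sketch. Only the layer counts and widths (two 32-unit layers for the embedding network ψ, four 128-unit layers for the prediction models) come from the quoted text; the input and output dimensions, ReLU activations, output heads, and the embedding size are assumptions.

```python
# Minimal PyTorch sketch of the layer sizes quoted in the Experiment Setup row.
# Activations, output heads, and all dimensions other than the quoted layer
# widths are assumptions for illustration.
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    """Embedding network psi: two fully connected layers with 32 neurons each."""
    def __init__(self, epi_traj_dim, embedding_dim=8):    # embedding_dim is assumed
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(epi_traj_dim, 32), nn.ReLU(),
            nn.Linear(32, 32), nn.ReLU(),
            nn.Linear(32, embedding_dim),                  # assumed output head
        )

    def forward(self, epi_trajectory):
        return self.net(epi_trajectory)

class PredictionModel(nn.Module):
    """Prediction model: four fully connected layers with 128 neurons each."""
    def __init__(self, obs_dim, act_dim, embedding_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + embedding_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, obs_dim),                       # assumed next-state head
        )

    def forward(self, obs, act, embedding):
        return self.net(torch.cat([obs, act, embedding], dim=-1))
```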
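
Finally, the training schedule reported in the same row, gathered into one illustrative dictionary; the key names are assumptions, and only the values repeat the quoted numbers.

```python
# Training-schedule numbers quoted in the Experiment Setup row, collected into
# an illustrative config dict. Key names are assumptions; the values repeat
# the quoted iteration counts and batch sizes (in timesteps).
EPI_TRAINING_CONFIG = {
    "epi_trajectory_length": 10,            # observation/action steps per EPI-trajectory
    "epi_policy_iterations": (200, 400),    # quoted range of training iterations
    "epi_policy_batch_timesteps": 10_000,
    "task_policy_iterations": 1_000,
    "task_policy_batch_timesteps": 100_000,
}
```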