Learning Calibratable Policies using Programmatic Style-Consistency

Authors: Eric Zhan, Albert Tseng, Yisong Yue, Adith Swaminathan, Matthew Hausknecht

ICML 2020

Reproducibility variables, each listed with its result and the supporting quote from the paper:
Research Type: Experimental. "We evaluate our framework using demonstrations from professional basketball players and agents in the MuJoCo physics environment, and show that existing approaches that do not explicitly enforce style-consistency fail to generate diverse behaviors, whereas our learned policies can be calibrated for up to 4^5 (1024) distinct style combinations."
Researcher Affiliation: Collaboration. "California Institute of Technology, Pasadena, CA and Microsoft Research, Redmond, WA. Correspondence to: Eric Zhan <ezhan@caltech.edu>."
Pseudocode: Yes. "Algorithm 1: Generic recipe for optimizing (5)" and "Algorithm 2: Model-based approach for Algorithm 1".
Open Source Code: Yes. "Code is available at: https://github.com/ezhan94/calibratable-style-consistency."
Open Datasets: Yes. "Data. We validate our framework on two datasets: 1) a collection of professional basketball player trajectories... and 2) a Cheetah agent running horizontally in MuJoCo (Todorov et al., 2012) with the goal of learning a policy with calibrated gaits. ... We obtain Cheetah demonstrations from a collection of policies trained using pytorch-a2c-ppo-acktr (Kostrikov, 2018) to interface with the DeepMind Control Suite's Cheetah domain (Tassa et al., 2018); see Appendix C for details."
Dataset Splits: Yes. "Hyperparameters are set using a random search (Bergstra & Bengio, 2012) over 20 runs, and the best ones were chosen based on the validation reconstruction loss. We also specify a training/validation split for the expert demonstrations to prevent overfitting."
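The selection protocol quoted above (random search over 20 runs, keeping the configuration with the lowest validation reconstruction loss) can be sketched as below. This is an illustrative sketch, not the authors' code: `train_and_validate` is a hypothetical stand-in for the paper's training loop, and the sampled ranges are assumptions.

```python
import random

def random_search(train_and_validate, n_runs=20, seed=0):
    """Sample hyperparameters uniformly at random and keep the best.

    `train_and_validate(config)` is assumed to train a model with the
    given configuration and return its validation reconstruction loss.
    """
    rng = random.Random(seed)
    best_loss, best_config = float("inf"), None
    for _ in range(n_runs):
        config = {
            "lr": 10 ** rng.uniform(-5, -3),        # assumed search range
            "hidden_dim": rng.choice([64, 128, 256]),
            "batch_size": rng.choice([32, 64, 128]),
        }
        val_loss = train_and_validate(config)
        if val_loss < best_loss:
            best_loss, best_config = val_loss, config
    return best_config, best_loss
```

Following Bergstra & Bengio (2012), independent random sampling is used rather than a grid, which covers each hyperparameter dimension more densely for the same budget of 20 runs.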
Hardware Specification: Yes. "All models were trained on a single NVIDIA GeForce GTX 1080 Ti GPU."
Software Dependencies: Yes. "Our codebase is written in PyTorch (Paszke et al., 2019) and Python (Oliphant, 2007) and built on top of the pytorch-a2c-ppo-acktr codebase by Kostrikov (Kostrikov, 2018)."
Experiment Setup: Yes. "We first briefly describe our experimental setup and baseline choices, and then discuss our main experimental results. A full description of experiments is available in Appendix C. ... We threshold the aforementioned labeling functions into categorical labels (leaving real-valued labels for future work) and use (4) for style-consistency with L_style as the 0/1 loss. We use cross-entropy for L_label and list all other hyperparameters in Appendix C."
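The two losses named in the quoted setup can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' implementation: `label_logits` is assumed to come from a learned label approximator, and `style_pred`/`style_target` are assumed to be hard categorical style indices obtained by thresholding the labeling functions.

```python
import torch
import torch.nn.functional as F

def label_loss(label_logits, labels):
    # L_label: cross-entropy between predicted label logits and the
    # categorical labels produced by the programmatic labeling functions.
    return F.cross_entropy(label_logits, labels)

def style_consistency_01(style_pred, style_target):
    # L_style as the 0/1 loss: the fraction of rollouts whose realized
    # style label disagrees with the requested style label.
    return (style_pred != style_target).float().mean()
```

Because the 0/1 loss is non-differentiable, it serves as the evaluation notion of style-consistency; the differentiable cross-entropy on the label approximator is what provides a training signal.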