Behavior Alignment via Reward Function Optimization

Authors: Dhawal Gupta, Yash Chandak, Scott Jordan, Philip S. Thomas, Bruno C. da Silva

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges. ... 6 Empirical Analyses ... Table 1: Summary of the performance of various reward combination methods and types of r_aux
Researcher Affiliation | Academia | Dhawal Gupta, University of Massachusetts; Yash Chandak, Stanford University; Scott M. Jordan, University of Alberta; Philip S. Thomas, University of Massachusetts; Bruno Castro da Silva, University of Massachusetts
Pseudocode | Yes | C.4 Pseudo Code (Algorithm 5) ... Algorithm 5: BARFI: Behavior Alignment Reward Function's Implicit optimization (an illustrative bi-level sketch is given after this table)
Open Source Code | No | The paper does not provide an explicit statement or a link to its own open-source code for the described methodology.
Open Datasets | Yes | Mountain Car (MC) [58], ... Cart Pole (CP) [16] ... Half Cheetah-v4 from the MuJoCo (MJ) suite of OpenAI Gym [9]
Dataset Splits | No | The paper describes data collection via agent interaction and uses 'batches of trajectories' for policy updates, but it does not specify explicit training, validation, and test dataset splits in the traditional sense, as is common in offline learning.
Hardware Specification | Yes | Experiments were run on a compute cluster whose CPU class is Intel Xeon Gold 6240 @ 2.60 GHz.
Software Dependencies | No | The paper mentions using 'PyTorch [46]' and 'OpenAI Gym [9]', but does not provide explicit version numbers for these or other software dependencies used in their implementation.
Experiment Setup | Yes | Table 3: Hyper-parameters for Grid World; Table 4: Hyper-parameters for Mountain Car; Table 5: Hyper-parameters for Cart Pole; Table 6: Hyper-parameters for MuJoCo
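
To make the bi-level structure referenced by the Pseudocode row (Algorithm 5, BARFI) concrete, below is a minimal sketch, not the paper's pseudocode: it assumes a toy two-armed bandit, a softmax policy, and a central finite-difference outer update in place of the implicit-gradient machinery described in the paper's appendix. All names (R_PRIMARY, R_AUX, inner_policy_optimization, etc.) are illustrative, not taken from the paper's code.

```python
# Hedged sketch of a bi-level reward-optimization loop in the spirit of BARFI.
# NOT the authors' Algorithm 5: the bandit, the exact softmax policy gradient,
# and the finite-difference outer update are illustrative assumptions.
import numpy as np

# Toy two-armed bandit: a primary reward and a misaligned auxiliary reward.
R_PRIMARY = np.array([1.0, 0.0])   # the designer's true objective prefers arm 0
R_AUX     = np.array([0.0, 1.0])   # the auxiliary shaping signal prefers arm 1

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

def inner_policy_optimization(w, steps=50, lr=0.5):
    """Inner loop: policy gradient on the combined reward r_primary + w * r_aux."""
    theta = np.zeros(2)
    r_combined = R_PRIMARY + w * R_AUX
    for _ in range(steps):
        probs = softmax(theta)
        # Exact gradient of E_pi[r_combined] for a softmax policy over two arms.
        theta += lr * probs * (r_combined - probs @ r_combined)
    return theta

def primary_return(theta):
    """Outer objective: expected primary reward of the policy from the inner loop."""
    return float(softmax(theta) @ R_PRIMARY)

# Outer loop: adjust the reward-combination weight w so that the policy trained on
# the combined reward does well on the primary objective. A central finite
# difference stands in for the paper's implicit-gradient machinery.
w, outer_lr, eps = 1.5, 3.0, 0.25
for step in range(10):
    j_plus = primary_return(inner_policy_optimization(w + eps))
    j_minus = primary_return(inner_policy_optimization(w - eps))
    w += outer_lr * (j_plus - j_minus) / (2 * eps)
    print(f"outer step {step}: w = {w:+.3f}, primary return = "
          f"{primary_return(inner_policy_optimization(w)):.3f}")
```

In this sketch the outer loop drives the auxiliary weight w down once the policy it induces hurts the primary return; the paper's method instead differentiates through the inner optimization implicitly rather than via finite differences.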