Behavior Alignment via Reward Function Optimization
Authors: Dhawal Gupta, Yash Chandak, Scott Jordan, Philip S. Thomas, Bruno C. da Silva
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges. ... 6 Empirical Analyses ... Table 1: Summary of the performance of various reward combination methods and types of r_aux |
| Researcher Affiliation | Academia | Dhawal Gupta (University of Massachusetts); Yash Chandak (Stanford University); Scott M. Jordan (University of Alberta); Philip S. Thomas (University of Massachusetts); Bruno Castro da Silva (University of Massachusetts) |
| Pseudocode | Yes | C.4 Pseudo Code (Algorithm 5) ... Algorithm 5: BARFI: Behavior Alignment Reward Function's Implicit optimization |
| Open Source Code | No | The paper does not provide an explicit statement or a link to its own open-source code for the described methodology. |
| Open Datasets | Yes | Mountain Car (MC) [58], ... Cart Pole (CP) [16] ... HalfCheetah-v4 from MuJoCo (MJ) suite of OpenAI Gym [9] |
| Dataset Splits | No | The paper describes data collection via agent interaction and uses 'batches of trajectories' for policy updates, but it does not specify explicit training, validation, and test *dataset splits* of the kind common in offline learning. |
| Hardware Specification | Yes | The computer is used for a cluster where the CPU class is Intel Xeon Gold 6240 CPU @2.60GHz. |
| Software Dependencies | No | The paper mentions using 'PyTorch [46]' and 'OpenAI Gym [9]', but does not provide explicit version numbers for these or other software dependencies used in its implementation. |
| Experiment Setup | Yes | Table 3: Hyper-parameters for Grid World, Table 4: Hyper-parameters for Mountain Car, Table 5: Hyper-parameters for Cart Pole, Table 6: Hyper-parameters for MuJoCo |