Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Behavior Alignment via Reward Function Optimization
Authors: Dhawal Gupta, Yash Chandak, Scott Jordan, Philip S. Thomas, Bruno C. da Silva
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method's efficacy on a diverse set of tasks, from small-scale experiments to high-dimensional control challenges. ... 6 Empirical Analyses ... Table 1: Summary of the performance of various reward combination methods and types of r_aux |
| Researcher Affiliation | Academia | Dhawal Gupta, University of Massachusetts; Yash Chandak, Stanford University; Scott M. Jordan, University of Alberta; Philip S. Thomas, University of Massachusetts; Bruno Castro da Silva, University of Massachusetts |
| Pseudocode | Yes | C.4 Pseudo Code (Algorithm 5) ... Algorithm 5: BARFI: Behavior Alignment Reward Function's Implicit optimization |
| Open Source Code | No | The paper does not provide an explicit statement or a link to its own open-source code for the described methodology. |
| Open Datasets | Yes | Mountain Car (MC) [58], ... Cart Pole (CP) [16] ... HalfCheetah-v4 from Mujoco (MJ) suite of OpenAI Gym [9] |
| Dataset Splits | No | The paper describes data collection via agent interaction and uses 'batches of trajectories' for policy updates, but it does not specify explicit training, validation, and testing *dataset splits* in the traditional sense, as is common in offline learning. |
| Hardware Specification | Yes | The computer is used for a cluster where the CPU class is Intel Xeon Gold 6240 CPU @2.60GHz. |
| Software Dependencies | No | The paper mentions using 'PyTorch [46]' and 'OpenAI Gym [9]', but does not provide explicit version numbers for these or other software dependencies used in their implementation. |
| Experiment Setup | Yes | Table 3: Hyper-parameters for Grid World, Table 4: Hyper-parameters for Mountain Car, Table 5: Hyper-parameters for Cart Pole, Table 6: Hyper-parameters for MuJoCo |