Expectation Alignment: Handling Reward Misspecification in the Presence of Expectation Mismatch

Authors: Malek Mechergui, Sarath Sreedharan

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our method on a set of standard Markov Decision Process (MDP) benchmarks." and "We empirically demonstrate how the method compares against baseline methods for handling reward uncertainty in benchmark domains."
Researcher Affiliation | Academia | Malek Mechergui and Sarath Sreedharan, Colorado State University, Fort Collins, 80523; {Malek.Mechergui, Sarath.Sreedharan}@colostate.edu
Pseudocode | Yes | "Algorithm 1 provides the pseudo-code for the query procedure."
Open Source Code | Yes | "The code for our experiments can be found at https://github.com/Malek-Mechergui/codeMDP", "We have included a zip of the code along with instructions. There was no dataset.", and "We wrote all the codes included, and they will be released with an open-source license."
Open Datasets | Yes | "Most of these are standard benchmark tasks taken from the Simple RL library [Abel, 2019]." (a minimal usage sketch follows the table)
Dataset Splits | No | The paper does not explicitly provide training, validation, or test dataset splits in terms of percentages or sample counts. It describes using "five random instantiations of each grid size" for evaluation.
Hardware Specification | Yes | "All experiments were run on Alma Linux 8.9 with 32GB RAM and 16 Intel(R) Xeon(R) 2.60GHz CPUs."
Software Dependencies | No | "We used CPLEX [Bliek1ú et al., 2014] as our LP solver (no-cost edition)." The paper mentions CPLEX but does not provide a specific version number.
Experiment Setup | Yes | "All the baselines were run with a time-bound of 30 minutes per problem.", "We have specified the solver used. Given these are just using LP formulations of MDPs, we didn't have any hyperparameters to select.", and "For each of the tasks, the expectation set consists of reaching the goal state and avoiding some random states in the environment. The human models were generated by modifying the original task slightly." (an illustrative LP sketch follows the table)
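
The benchmark tasks cited in the Open Datasets row come from the Simple RL library. The sketch below is an illustration only, not the authors' exact configuration: it instantiates a small GridWorld task from simple_rl and runs a baseline agent on it. The grid dimensions, goal and lava locations, and the choice of agent are assumptions made for the example.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact configuration):
# instantiating a standard GridWorld benchmark task from the simple_rl library
# and running a baseline agent on it.
from simple_rl.tasks import GridWorldMDP
from simple_rl.agents import QLearningAgent
from simple_rl.run_experiments import run_agents_on_mdp

# 4x3 grid: start at (1, 1), goal at (4, 3); a lava cell stands in for a state to avoid.
mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1),
                   goal_locs=[(4, 3)], lava_locs=[(4, 2)], gamma=0.95)

agent = QLearningAgent(actions=mdp.get_actions())
run_agents_on_mdp([agent], mdp, instances=5, episodes=50, steps=100)
```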
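
The Experiment Setup row states that the methods rely on LP formulations of MDPs solved with CPLEX. The following sketch shows the standard primal linear program for a tabular MDP, using scipy.optimize.linprog purely as an illustrative stand-in for CPLEX; the transition tensor, reward matrix, and discount factor are toy values invented for the example, not values from the paper.

```python
# Minimal sketch (not the authors' code): solving a tabular MDP via its
# standard primal LP. The paper uses CPLEX; scipy's linprog is shown here
# only as an illustrative stand-in.
import numpy as np
from scipy.optimize import linprog

def solve_mdp_lp(P, R, gamma=0.95):
    """P: (S, A, S) transition tensor, R: (S, A) rewards. Returns optimal state values V."""
    S, A = R.shape
    # One constraint per (s, a): V(s) - gamma * sum_s' P(s'|s,a) V(s') >= R(s, a),
    # rewritten as (gamma * P[s, a] - e_s) @ V <= -R(s, a) for linprog's A_ub form.
    A_ub = np.zeros((S * A, S))
    b_ub = np.zeros(S * A)
    for s in range(S):
        for a in range(A):
            row = gamma * P[s, a].copy()
            row[s] -= 1.0
            A_ub[s * A + a] = row
            b_ub[s * A + a] = -R[s, a]
    # Objective: minimize the sum of state values; values are unbounded in sign.
    res = linprog(c=np.ones(S), A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * S)
    return res.x

# Toy two-state, two-action MDP (made-up values).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 0.0], [1.0, 1.0]])
print(solve_mdp_lp(P, R))
```

The LP has one variable per state and one constraint per state-action pair, which is consistent with the paper's remark that the LP-based setup required no hyperparameter selection.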