Lexicographic Multi-Objective Reinforcement Learning
Authors: Joar Skalse, Lewis Hammond, Charlie Griffin, Alessandro Abate
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we evaluate our algorithms empirically. We first show how the learning time of LRL scales with the number of reward functions. We then compare the performance of VB-LRL and PB-LRL against that of other algorithms for solving constrained RL problems. Further experimental details and additional experiments are described in the supplementary material, and documented in our codebase. |
| Researcher Affiliation | Academia | Department of Computer Science, University of Oxford {joar.skalse, lewis.hammond, charlie.griffin, aabate}@cs.ox.ac.uk |
| Pseudocode | Yes | Algorithm 1 Lexicographic ϵ-Greedy, Algorithm 2 Value-Based Lexicographic RL, Algorithm 3 Policy-Based Lexicographic RL |
| Open Source Code | Yes | Further experimental details and additional experiments are described in the supplementary material, and documented in our codebase. [Footnote 5: Available at https://github.com/lrhammond/lmorl.] |
| Open Datasets | Yes | The Cart Safe environment from gym-safety [Footnote 6: Available at https://github.com/jemaw/gym-safety.] is a version of the classic Cart Pole environment. The Grid Nav environment, again from gym-safety (based on an environment in [Chow et al., 2018]), is a large gridworld... Finally, in the Intersection environment from highway-env [Footnote 7: Available at https://github.com/eleurent/highway-env.] the agent must guide a car through an intersection with dense traffic. |
| Dataset Splits | No | The paper mentions training models and evaluating performance but does not specify any explicit train/validation/test dataset splits or their percentages/counts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU/CPU models, processor types, or memory amounts. |
| Software Dependencies | No | The paper mentions software environments like 'gym-safety' and 'highway-env' but does not specify any version numbers for these or other software dependencies. |
| Experiment Setup | No | The paper provides some high-level experimental details such as environment names and the number of runs, but it does not include concrete hyperparameter values, specific training configurations, or system-level settings within the main text. It defers further details to supplementary material. |
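The table only names Algorithm 1 (Lexicographic ϵ-Greedy). The sketch below is a hypothetical illustration of lexicographic action selection and is not the authors' pseudocode. It assumes one Q-value vector per reward function, ordered from highest to lowest priority. It also assumes a slack parameter `tau`: at each priority level, any action within `tau` of the best remaining value is kept. The function name and the parameters `tau` and `rng` are illustrative choices.

```python
import random
from typing import Optional, Sequence


def lexicographic_epsilon_greedy(
    q_values: Sequence[Sequence[float]],
    epsilon: float,
    tau: float = 0.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Return an action index (hypothetical sketch, not the paper's Algorithm 1).

    q_values[i][a] is the estimated value of action a under reward i,
    with i = 0 the highest-priority objective.
    """
    if rng is None:
        rng = random.Random()
    n_actions = len(q_values[0])

    # Explore: with probability epsilon, pick an action uniformly at random.
    if rng.random() < epsilon:
        return rng.randrange(n_actions)

    # Exploit: filter the candidate set one objective at a time, in priority order.
    candidates = list(range(n_actions))
    for q in q_values:
        best = max(q[a] for a in candidates)
        # Assumed slack: keep every action within tau of the best remaining value.
        candidates = [a for a in candidates if q[a] >= best - tau]

    # Break any ties that survive all objectives uniformly at random.
    return rng.choice(candidates)
```

For example, with `q_values = [[1.0, 1.0, 0.0], [0.0, 2.0, 5.0]]` and `epsilon = 0.0`, strict filtering (`tau = 0.0`) first keeps actions 0 and 1 and then selects action 1. A slack of `tau = 1.0` lets action 2 survive the first objective, so it wins on the second. This shows how the tolerance trades strict priority against lower-priority rewards.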