Scaling Pareto-Efficient Decision Making via Offline Multi-Objective RL
Authors: Baiting Zhu, Meihua Dang, Aditya Grover
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that PEDA closely approximates the behavioral policy on the D4MORL benchmark and provides an excellent approximation of the Pareto-front with appropriate conditioning, as measured by the hypervolume and sparsity metrics. |
| Researcher Affiliation | Academia | Baiting Zhu, Meihua Dang, Aditya Grover University of California, Los Angeles, CA, USA baitingzbt@g.ucla.edu, mhdang@cs.ucla.edu, adityag@cs.ucla.edu |
| Pseudocode | Yes | Algorithm 1 Data Collection in D4MORL |
| Open Source Code | Yes | Our code is available at: https://github.com/baitingzbt/PEDA. |
| Open Datasets | Yes | We introduce Datasets for Multi-Objective Reinforcement Learning (D4MORL), a large-scale benchmark for offline MORL. Our benchmark consists of offline trajectories from 6 multi-objective MuJoCo environments, including 5 environments with 2 objectives each (MO-Ant, MO-HalfCheetah, MO-Hopper, MO-Swimmer, MO-Walker2d) and one environment with three objectives (MO-Hopper-3obj). [...] Further details are described in Appendix C. |
| Dataset Splits | No | The paper describes splitting preferences for evaluation and mentions collecting 50K trajectories for each setting, but does not specify a reproducible training/validation/test split for the dataset itself. It states: "For every environment in D4MORL, we collect 50K trajectories of length T = 500 for both expert and amateur trajectory distributions under each of the 3 preference distributions." |
| Hardware Specification | No | The paper does not mention any specific hardware (GPU, CPU, etc.) used for running experiments. |
| Software Dependencies | No | The paper mentions using 'GPT (Radford et al., 2019)' and 'Scipy (Vasicek, 1976, Virtanen et al., 2020)' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | In this section, we list our hyper-parameters and model details. Specifically, we use the same hyperparameters for all algorithms, except for the learning rate scheduler and warm-up steps. [...] Hyperparameters (MODT / MORvS / BC): Context Length K = 20 / 1 / 20; Batch Size 64; Hidden Size 512; Learning Rate 1e-4; Weight Decay 1e-3; Dropout 0.1; n_layer 3; Optimizer AdamW; Loss Function MSE; LR Scheduler lambda / None / lambda; Warm-up Steps 10000 / N/A / 4000; Activation ReLU |
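The per-method hyperparameters reported in the Experiment Setup row can be collected into plain Python dicts for reference. This is a sketch of the reported settings, not the authors' actual config file, and the `warmup_lr` helper is a hypothetical reading of the "lambda" scheduler as linear warm-up:

```python
# Shared hyperparameters reported for all three methods (MODT, MORvS, BC).
COMMON = dict(batch_size=64, hidden_size=512, lr=1e-4, weight_decay=1e-3,
              dropout=0.1, n_layer=3, optimizer="AdamW", loss="MSE",
              activation="ReLU")

# Per-method differences: context length, LR scheduler, and warm-up steps.
PER_METHOD = {
    "MODT":  dict(COMMON, context_len=20, lr_scheduler="lambda", warmup_steps=10_000),
    "MORvS": dict(COMMON, context_len=1,  lr_scheduler=None,     warmup_steps=None),
    "BC":    dict(COMMON, context_len=20, lr_scheduler="lambda", warmup_steps=4_000),
}

def warmup_lr(step, base_lr=1e-4, warmup_steps=10_000):
    """Linear warm-up to base_lr (one common reading of a 'lambda' scheduler);
    hypothetical -- the paper does not spell out the lambda function."""
    return base_lr * min((step + 1) / warmup_steps, 1.0)
```

With these dicts, the only fields that differ across methods are `context_len`, `lr_scheduler`, and `warmup_steps`, matching the table above.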
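The Research Type row notes that Pareto-front quality is measured by hypervolume and sparsity. As a minimal sketch of how these metrics are commonly computed for a two-objective maximization problem (the paper's own implementation may differ in details such as the reference point):

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` (2-objective maximization) above reference `ref`."""
    # Keep points that strictly dominate the reference, sorted by ascending f1.
    pts = sorted(p for p in points if p[0] > ref[0] and p[1] > ref[1])
    # Filter to the non-dominated front: f2 must strictly decrease as f1 grows.
    front = []
    for p in pts:
        while front and front[-1][1] <= p[1]:
            front.pop()  # dominated by p (smaller f1, no better f2)
        front.append(p)
    # Sweep left to right, summing disjoint rectangles above the reference.
    hv, prev_f1 = 0.0, ref[0]
    for f1, f2 in front:
        hv += (f1 - prev_f1) * (f2 - ref[1])
        prev_f1 = f1
    return hv

def sparsity(front):
    """Mean squared gap between consecutive front solutions, summed per objective."""
    if len(front) < 2:
        return 0.0
    s = 0.0
    for j in range(len(front[0])):
        vals = sorted(p[j] for p in front)
        s += sum((b - a) ** 2 for a, b in zip(vals, vals[1:]))
    return s / (len(front) - 1)
```

For the toy front {(1, 3), (2, 2), (3, 1)} with reference (0, 0), `hypervolume_2d` returns 6.0 and `sparsity` returns 2.0; lower sparsity indicates a denser approximation of the Pareto front.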