Multi-Objective Reinforcement Learning: Convexity, Stationarity and Pareto Optimality

Authors: Haoye Lu, Daniel Herman, Yaoliang Yu

ICLR 2023

Reproducibility assessment. Each entry lists the variable, its result, and the supporting evidence from the LLM response.
Research Type: Experimental
Evidence: "Our algorithm achieves state-of-the-art performance on multiple MuJoCo tasks in the preference-agnostic setting. Furthermore, we empirically show that, in contrast to other LS-based algorithms, our approach is significantly more stable, achieving similar results across various random seeds. We test our algorithm over a multi-objective version of the MuJoCo environment. Fig. 8 plots the methods' trajectories on four MuJoCo benchmarks. We train each method five times with various random seeds and report the mean and standard deviation."
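A minimal sketch of the mean-and-standard-deviation aggregation implied by the five-seed protocol; the return values below are placeholders, not results from the paper:

    import numpy as np

    # One final evaluation return per random seed (placeholder numbers).
    returns_per_seed = np.array([312.4, 305.1, 318.9, 309.7, 314.2])

    mean = returns_per_seed.mean()
    std = returns_per_seed.std(ddof=1)  # sample standard deviation across seeds
    print(f"return: {mean:.1f} +/- {std:.1f} over {len(returns_per_seed)} seeds")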
Researcher Affiliation: Academia
Evidence: "Haoye Lu, Daniel Herman & Yaoliang Yu; School of Computer Science, University of Waterloo; Vector Institute. {haoye.lu,d2herman,yaoliang.yu}@uwaterloo.ca"
Pseudocode: Yes
Evidence: "Algorithm 1: The CAPQL implementation"
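Algorithm 1 itself is given in the paper; the sketch below only illustrates the general shape of a preference-conditioned, vector-valued critic of the kind SAC-based MORL methods use. All class names, shapes, and the loss are assumptions for illustration, not the authors' implementation:

    import torch
    import torch.nn as nn

    class VectorQ(nn.Module):
        """Q-network with one output per objective, conditioned on the
        state, action, and preference weight w (illustrative design)."""
        def __init__(self, state_dim, action_dim, n_obj, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + action_dim + n_obj, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, n_obj),
            )

        def forward(self, s, a, w):
            return self.net(torch.cat([s, a, w], dim=-1))  # (batch, n_obj)

    def td_loss(q, q_target, s, a, r_vec, s2, a2, w, done, gamma=0.99):
        """One TD step on the vector reward, bootstrapped per objective
        (a generic sketch, not CAPQL's exact update)."""
        with torch.no_grad():
            target = r_vec + gamma * (1.0 - done) * q_target(s2, a2, w)
        return ((q(s, a, w) - target) ** 2).mean()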
Open Source Code: Yes
Evidence: "The source code of our CAPQL implementation is available online: https://github.com/haoyelu/CAPQL.git"
Open Datasets: Yes
Evidence: "We test our algorithm over a multi-objective version of the MuJoCo environment. The reward vector was created by simply exposing the individual components that went into the regular scalar reward: adding them up recovers the default scalar reward. (See Appx. I, Table 4 for details.)"
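The quoted construction can be pictured as a gym wrapper that re-exposes the components already summed into the scalar reward. The info keys below are hypothetical (the real keys vary per MuJoCo task); this is a sketch, not the authors' wrapper:

    import gym
    import numpy as np

    class VectorRewardWrapper(gym.Wrapper):
        """Return the reward as a vector of the components that gym's
        MuJoCo tasks normally sum into one scalar (illustrative keys)."""
        def __init__(self, env, component_keys=("reward_run", "reward_ctrl")):
            super().__init__(env)
            self.component_keys = component_keys

        def step(self, action):
            obs, reward, done, info = self.env.step(action)  # gym-0.21.0 API
            r_vec = np.array([info[k] for k in self.component_keys],
                             dtype=np.float32)
            # Per the paper, summing the components recovers the default
            # scalar reward the environment would otherwise return.
            return obs, r_vec, done, info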
Dataset Splits: No
Evidence: The paper describes training and evaluation within the MuJoCo environments but does not specify dataset splits (e.g., percentages or sample counts) for training, validation, and testing, as one would find with static datasets. Evaluation during training is instead done on "randomly sampled weights".
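Sampling uniformly from the probability simplex is one standard way to realize "randomly sampled weights"; the paper does not pin down the distribution, so the Dirichlet choice below is an assumption:

    import numpy as np

    rng = np.random.default_rng(seed=0)

    def sample_preference(n_obj):
        """Draw a weight vector w >= 0 with sum(w) == 1, uniformly on
        the simplex via Dirichlet(1, ..., 1)."""
        return rng.dirichlet(np.ones(n_obj))

    w = sample_preference(2)  # e.g. array([0.37..., 0.62...])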
Hardware Specification: No
Evidence: The paper mentions "Training was done using pytorch-1.12.1 and NVIDIA's CUDA 11.6," which implies NVIDIA GPUs, but does not specify exact GPU models, CPU models, or other hardware details used for the experiments.
Software Dependencies: Yes
Evidence: "Python 3.10.4 was used as the primary programming language. We accessed MuJoCo 210 through gym-0.21.0's wrapper classes. Training was done using pytorch-1.12.1 and NVIDIA's CUDA 11.6."
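A quick sanity check of the reported software stack, using only standard version attributes:

    import sys
    import gym
    import torch

    # Compare the local stack against the versions reported in the paper.
    print(sys.version.split()[0])  # paper used 3.10.4
    print(gym.__version__)         # paper used 0.21.0
    print(torch.__version__)       # paper used 1.12.1
    print(torch.version.cuda)      # paper used 11.6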
Experiment Setup: Yes
Evidence: Table 1 lists the hyperparameters of CAPQL and QEnv-ctn: optimizer Adam; learning rate 3e-4; discount factor γ = 0.99; hidden dimension 256 (all networks); replay buffer size 10^6; minibatch size 256; nonlinearity ReLU; target smoothing coefficient τ = 0.005. Table 2 lists the augmentation strength α of CAPQL per environment.
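For reference, the Table 1 values collected as a plain Python dict, plus a soft-update helper matching the reported smoothing coefficient. The helper is standard Polyak averaging, an assumption about usage rather than the authors' code:

    import torch

    # Hyperparameters reported in Table 1 of the paper.
    HPARAMS = dict(
        lr=3e-4,            # Adam learning rate
        gamma=0.99,         # discount factor
        hidden_dim=256,     # all networks
        buffer_size=10**6,  # replay buffer size
        batch_size=256,     # minibatch size
        tau=0.005,          # target smoothing coefficient
    )

    def make_optimizer(params):
        return torch.optim.Adam(params, lr=HPARAMS["lr"])

    def soft_update(target_net, online_net, tau=HPARAMS["tau"]):
        """Polyak-average the target parameters: theta' <- (1 - tau) * theta' + tau * theta."""
        with torch.no_grad():
            for pt, p in zip(target_net.parameters(), online_net.parameters()):
                pt.mul_(1.0 - tau).add_(tau * p)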