An Offline Adaptation Framework for Constrained Multi-Objective Reinforcement Learning
Authors: Qian Lin, Zongkai Liu, Danying Mo, Chao Yu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on offline multi-objective and safe tasks demonstrate the capability of our framework to infer policies that align with real preferences while meeting the constraints implied by the provided demonstrations. |
| Researcher Affiliation | Collaboration | ¹Sun Yat-sen University, Guangzhou, China; ²Pengcheng Laboratory, Shenzhen, China; ³MoE Key Laboratory of Information Technology, Guangzhou, China |
| Pseudocode | Yes | Algorithm 1 Preference Distribution Offline Adaptation |
| Open Source Code | Yes | Codes and instructions are provided in supplemental material to generate the dataset used and reproduce the main results in the paper. |
| Open Datasets | Yes | We utilize the D4MORL dataset [Zhu et al., 2023] collected from multi-objective MuJoCo environments... We utilize the datasets in the DSRL benchmark [Liu et al., 2023b] that are collected by a set of behavior policies trained under various safe thresholds. |
| Dataset Splits | No | The unselected trajectories in the datasets constitute the training set. |
| Hardware Specification | Yes | The training and testing were conducted on 1 NVIDIA GeForce RTX 3090 GPU. |
| Software Dependencies | No | For MODF, we use the original implementation in https://github.com/qianlin04/PRMORL. |
| Experiment Setup | Yes | The weight η of the regularization term $(\|\mu\|_1 - 1)^2$ in Eq. (7) is set to 1.0. For each target, the number of gradient updates is set to 1000, with 64 preferences sampled from the distribution for each gradient update. All samples in the demonstration set are used for gradient updates within one batch. We use the Adam optimizer with a learning rate of 0.05. The conservatism weight α in Eq. (10) is set to 1.0 for MORL tasks and 0.7 for safe RL and CMORL tasks. The weight of the TD reward in Eq. (6) is set to 0.01 for PDOA [MODF]. |
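
For reference, the reported experiment-setup hyperparameters can be collected into a single configuration object and a skeleton of the adaptation loop they parameterize. The sketch below is illustrative only; the names `PDOAConfig`, `adaptation_model`, `sample_preferences`, and `loss` are hypothetical stand-ins and do not come from the paper, whose actual implementation is provided in the authors' supplemental material.

```python
from dataclasses import dataclass

import torch


@dataclass
class PDOAConfig:
    """Hyperparameters quoted in the experiment setup above."""
    eta: float = 1.0                # weight of the regularization term in Eq. (7)
    num_updates: int = 1000         # gradient updates per target
    prefs_per_update: int = 64      # preferences sampled from the distribution per update
    lr: float = 0.05                # Adam learning rate
    alpha_morl: float = 1.0         # conservatism weight in Eq. (10) for MORL tasks
    alpha_safe: float = 0.7         # conservatism weight for safe RL / CMORL tasks
    td_reward_weight: float = 0.01  # weight of the TD reward in Eq. (6) for PDOA [MODF]


def adapt_preference_distribution(adaptation_model, demonstrations, cfg: PDOAConfig):
    """Minimal adaptation-loop skeleton: all demonstration samples form one batch,
    and cfg.prefs_per_update preferences are drawn for every gradient step.
    `sample_preferences` and `loss` are hypothetical interfaces, not the paper's API."""
    optimizer = torch.optim.Adam(adaptation_model.parameters(), lr=cfg.lr)
    for _ in range(cfg.num_updates):
        prefs = adaptation_model.sample_preferences(cfg.prefs_per_update)
        loss = adaptation_model.loss(prefs, demonstrations, eta=cfg.eta)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return adaptation_model
```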