Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration
Authors: Runzhe Wu, Yufeng Zhang, Zhuoran Yang, Zhaoran Wang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also conduct numerical experiments to verify our theory. Please see Appendix I for details. The results show that the proposed algorithm is not only provably efficient but also applicable. |
| Researcher Affiliation | Academia | Runzhe Wu, Shanghai Jiao Tong University, runzhe@sjtu.edu.cn; Yufeng Zhang, Northwestern University, yufengzhang2023@u.northwestern.edu; Zhuoran Yang, Princeton University, zy6@princeton.edu; Zhaoran Wang, Northwestern University, zhaoranwang@gmail.com |
| Pseudocode | Yes | Algorithm 1 (Pessimistic planning) and Algorithm 2 (Pessimistic Dual Iteration, PEDI). |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the described methodology. |
| Open Datasets | No | The paper refers to 'a dataset $D = \{(s_h^\tau, a_h^\tau, c_h^\tau)\}_{h,\tau=1}^{H,N}$ with N trajectories collected a priori by an experimentor' but does not provide specific access information like a link, DOI, or formal citation for this dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | No | The main text of the paper does not contain specific experimental setup details (concrete hyperparameter values, training configurations, or system-level settings). It refers to 'Appendix I for details' for numerical experiments, implying these details are not in the main body. |