Offline RL with Discrete Proxy Representations for Generalizability in POMDPs

Authors: Pengjie Gu, Xinyu Cai, Dong Xing, Xinrun Wang, Mengchen Zhao, Bo An

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments to evaluate ORDER, showcasing its effectiveness in offline RL for diverse partially observable scenarios and highlighting the significance of discrete proxy representations in generalization performance. We conduct all experiments with five distinct random seeds, each consisting of ten separate runs.
Researcher Affiliation | Collaboration | Pengjie Gu (1), Xinyu Cai (1), Dong Xing (2), Xinrun Wang (1), Mengchen Zhao (3), Bo An (1). (1) School of Computer Science and Engineering, Nanyang Technological University, Singapore; (2) College of Computer Science and Technology, Zhejiang University; (3) Noah's Ark Lab, Huawei
Pseudocode | No | The paper describes the training stages and architectural components but does not provide a formal pseudocode block or algorithm figure.
Open Source Code | No | The paper does not include any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We evaluate ORDER and other baseline algorithms on gym locomotion tasks and maze navigation tasks in the D4RL benchmark [12] under different partial observation situations.
Dataset Splits | No | The paper mentions using the D4RL benchmark and conducting experiments, but it does not specify the exact training, validation, and test dataset splits (e.g., percentages or sample counts) used for reproducibility.
Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper mentions using specific algorithms like IQL [28] and VQ-VAE [45], but it does not provide specific version numbers for these or any other software dependencies (e.g., Python, PyTorch, TensorFlow, etc.) that would be necessary for reproducibility.
Experiment Setup | No | The paper states: