Behavior Prior Representation learning for Offline Reinforcement Learning

Authors: Hongyu Zang, Xin Li, Jie Yu, Chen Liu, Riashat Islam, Remi Tachet des Combes, Romain Laroche

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we show that BPR combined with existing state-of-the-art Offline RL algorithms leads to significant improvements across several offline control benchmarks. The code is available at https://github.com/bit1029public/offline_bpr.
Researcher Affiliation | Collaboration | Hongyu Zang¹, Xin Li¹, Jie Yu¹, Chen Liu¹, Riashat Islam², Rémi Tachet des Combes, Romain Laroche. ¹Beijing Institute of Technology, China; ²Mila, Quebec AI Institute, Canada. {zanghyu,xinli,yujie,chenliu}@bit.edu.cn, riashat.islam@mail.mcgill.ca, {remi.tachet,romain.laroche}@gmail.com. Work done while at Microsoft Research Montreal.
Pseudocode | Yes | We also provide the pseudocode of the pretraining process and the co-training process in Algorithm 1 and 2 in Appendix. (An illustrative sketch of such a pretraining loop appears after the table.)
Open Source Code | Yes | The code is available at https://github.com/bit1029public/offline_bpr.
Open Datasets | Yes | We analyze our proposed method BPR on the D4RL benchmark (Fu et al., 2020) of OpenAI Gym MuJoCo tasks (Todorov et al., 2012), which includes a variety of datasets that have been commonly used in the Offline RL community. (An example of loading a D4RL dataset appears after the table.)
Dataset Splits | No | The paper does not provide specific details on how the datasets (e.g., D4RL) are explicitly split into training, validation, and test sets by the authors themselves. It mentions evaluating models periodically during training, but this is not an explicit dataset split for validation purposes with specified percentages or sample counts.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as particular GPU models, CPU specifications, or memory sizes. It only mentions general terms like 'code platforms' (implying computational resources).
Software Dependencies | No | The paper mentions software platforms like 'PyTorch and TensorFlow' but does not specify their version numbers, which are needed for reproducible software dependency information.
Experiment Setup | Yes | We first pretrain the encoder during 100k timesteps... Further details on the experiment setup are included in Appendix G. In the experiment on D4RL tasks, all representation objectives use the same encoder architecture, i.e., a 4-layer MLP activated by ReLU, followed by another linear layer activated by Tanh, where the final output feature dimension of the encoder is 256. Besides, all representation objectives follow the same optimizer settings, pre-training data, and the number of pre-training epochs. (A sketch of this encoder appears below.)
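As context for the D4RL benchmark referenced in the Open Datasets row, offline datasets are typically loaded as shown below. The task name is chosen purely for illustration and this snippet is not taken from the paper or its released code.

```python
# Loading a D4RL offline dataset (the task name is only an example).
import gym
import d4rl  # importing d4rl registers the offline environments with gym

env = gym.make("halfcheetah-medium-v2")
data = d4rl.qlearning_dataset(env)  # dict of numpy arrays

print(data["observations"].shape)   # (N, obs_dim)
print(data["actions"].shape)        # (N, act_dim)
print(data["rewards"].shape)        # (N,)
print(data["terminals"].shape)      # (N,)
```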
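The encoder described in the Experiment Setup row (a 4-layer ReLU MLP followed by a Tanh-activated linear layer producing 256-dimensional features) could be sketched as follows. The hidden width (256 here) is an assumption; the quoted setup does not state it.

```python
# Sketch of the described encoder: 4 ReLU-activated linear layers, then a
# Tanh-activated linear layer with a 256-dimensional output feature.
# hidden_dim=256 is an assumed value, not stated in the quoted text.
import torch.nn as nn


def make_encoder(obs_dim: int, hidden_dim: int = 256, feature_dim: int = 256) -> nn.Module:
    return nn.Sequential(
        nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        nn.Linear(hidden_dim, feature_dim), nn.Tanh(),
    )
```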
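For the Pseudocode row, the following is a minimal sketch of what a BPR-style pretraining loop could look like, assuming a behavior-cloning objective in which a decoder regresses the dataset action from the encoded state. The names (`pretrain_bpr`, `encoder`, `action_decoder`) are illustrative and not taken from the authors' Algorithm 1; the co-training phase with the downstream offline RL algorithm (Algorithm 2) is not shown.

```python
# Hypothetical sketch of BPR-style encoder pretraining; not the authors' code.
import torch
import torch.nn as nn


def pretrain_bpr(encoder, action_decoder, dataloader, steps=100_000, lr=3e-4,
                 device="cpu"):
    """Pretrain a state encoder by predicting the behavior policy's actions."""
    params = list(encoder.parameters()) + list(action_decoder.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)
    it = iter(dataloader)
    for _ in range(steps):
        try:
            states, actions = next(it)
        except StopIteration:
            it = iter(dataloader)
            states, actions = next(it)
        states, actions = states.to(device), actions.to(device)
        # Behavior-prior objective: regress the dataset action from the encoded state.
        pred_actions = action_decoder(encoder(states))
        loss = nn.functional.mse_loss(pred_actions, actions)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return encoder
```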