Behavior Prior Representation learning for Offline Reinforcement Learning
Authors: Hongyu Zang, Xin Li, Jie Yu, Chen Liu, Riashat Islam, Remi Tachet des Combes, Romain Laroche
ICLR 2023 | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that BPR combined with existing state-of-the-art Offline RL algorithms leads to significant improvements across several offline control benchmarks. The code is available at https://github.com/bit1029public/offline_bpr. |
| Researcher Affiliation | Collaboration | Hongyu Zang1, Xin Li1, Jie Yu1, Chen Liu1, Riashat Islam2, Rémi Tachet des Combes, Romain Laroche 1 Beijing Institute of Technology, China 2 Mila, Quebec AI Institute, Canada {zanghyu,xinli,yujie,chenliu}@bit.edu.cn riashat.islam@mail.mcgill.ca {remi.tachet,romain.laroche}@gmail.com Work done while at Microsoft Research Montreal. |
| Pseudocode | Yes | We also provide the pseudocode of the pretraining process and the co-training process in Algorithms 1 and 2 in the Appendix. |
| Open Source Code | Yes | The code is available at https://github.com/bit1029public/offline_bpr. |
| Open Datasets | Yes | We analyze our proposed method BPR on the D4RL benchmark (Fu et al., 2020) of OpenAI gym MuJoCo tasks (Todorov et al., 2012), which includes a variety of datasets that have been commonly used in the Offline RL community. |
| Dataset Splits | No | The paper does not specify how the datasets (e.g., D4RL) are split into training, validation, and test sets. It mentions evaluating models periodically during training, but this is not an explicit validation split with stated percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as particular GPU models, CPU specifications, or memory sizes. It only mentions general terms like 'code platforms' (implying computational resources). |
| Software Dependencies | No | The paper mentions software platforms like 'PyTorch and TensorFlow' but does not specify their version numbers, which is necessary for reproducible software dependency information. |
| Experiment Setup | Yes | We first pretrain the encoder during 100k timesteps... Further details on the experiment setup are included in Appendix G. In the experiment on D4RL tasks, all representation objectives use the same encoder architecture, i.e., a 4-layer MLP activated by ReLU, followed by another linear layer activated by Tanh, where the final output feature dimension of the encoder is 256. Besides, all representation objectives follow the same optimizer settings, pre-training data, and number of pre-training epochs. |
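The encoder described in the Experiment Setup row can be sketched concretely. The following is a minimal NumPy illustration, not the authors' implementation: the paper fixes a 4-layer ReLU MLP followed by a Tanh-activated linear layer with output dimension 256, while the hidden width (here 256) and the He-style weight initialization are assumptions for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def make_encoder(state_dim, hidden_dim=256, feature_dim=256, seed=0):
    """Build weights for a 4-layer ReLU MLP plus a Tanh-activated linear
    head. Only feature_dim=256 is stated in the paper; hidden_dim and the
    initialization scheme are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    dims = [state_dim] + [hidden_dim] * 4 + [feature_dim]
    return [(rng.standard_normal((d_in, d_out)) * np.sqrt(2.0 / d_in),
             np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]

def encode(params, x):
    # Four ReLU-activated hidden layers...
    for W, b in params[:-1]:
        x = relu(x @ W + b)
    # ...then a final linear layer activated by Tanh (feature dim 256).
    W, b = params[-1]
    return np.tanh(x @ W + b)

# Hypothetical usage: state_dim=17 matches a typical MuJoCo observation.
params = make_encoder(state_dim=17)
z = encode(params, np.zeros((1, 17)))
assert z.shape == (1, 256)
```

In the paper's pipeline this encoder is first pretrained for 100k timesteps and then co-trained with the downstream Offline RL algorithm; the sketch above only covers the forward pass of the shared architecture.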