Goal-Conditioned Predictive Coding for Offline Reinforcement Learning
Authors: Zilai Zeng, Ce Zhang, Shijie Wang, Chen Sun
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive empirical evaluations on AntMaze, FrankaKitchen and Locomotion environments, we observe that sequence modeling can have a significant impact on challenging decision making tasks. |
| Researcher Affiliation | Academia | Zilai Zeng (Brown University), Ce Zhang (Brown University), Shijie Wang (Brown University), Chen Sun (Brown University) |
| Pseudocode | Yes | Appendix B: Pseudocode of GCPC. Algorithm 1: Goal-Conditioned Predictive Coding (GCPC) for RvS |
| Open Source Code | Yes | Our code is available at https://brown-palm.github.io/GCPC/. |
| Open Datasets | Yes | To answer the questions above, we conduct extensive experiments on three domains from the D4RL offline benchmark suite [16]: AntMaze, FrankaKitchen and Gym Locomotion. |
| Dataset Splits | Yes | During policy learning, we use the pre-trained TrajNet that achieves the lowest validation reconstruction loss to generate the bottleneck. (A hedged sketch of this selection step appears below the table.) |
| Hardware Specification | Yes | All experiments are performed on a single Nvidia RTX A5000. |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or CUDA versions). |
| Experiment Setup | Yes | Appendix A.3 Hyperparameters. We list hyperparameters for BC and RvS-G/R replication in Table A1 and the GCPC implementation in Table A2. For RvS-G/R and PolicyNet in GCPC, we use a two-layer feedforward MLP as the policy network, taking the current state and goal (state or return-to-go) as input; the only difference is that PolicyNet takes the bottleneck as an additional input. All experiments are performed on a single Nvidia RTX A5000. (A hedged sketch of this policy head appears below the table.) |
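
The Pseudocode and Dataset Splits rows describe a two-stage recipe: pre-train a TrajNet, keep the checkpoint with the lowest validation reconstruction loss, and use it (frozen) to generate the bottleneck for policy learning. Appendix B, Algorithm 1 of the paper is the authoritative description; the snippet below is only a minimal sketch of that selection-and-freeze step, assuming PyTorch. The class name `TrajNetStandIn`, the MSE reconstruction objective, and all dimensions are illustrative assumptions, not the authors' implementation.

```python
import copy

import torch
import torch.nn as nn


class TrajNetStandIn(nn.Module):
    """Toy bottleneck autoencoder over flattened state sub-trajectories.

    The paper's TrajNet is a sequence model (Appendix B, Algorithm 1); this
    stand-in only illustrates the rule quoted above: keep the weights with the
    lowest validation reconstruction loss, then freeze them to generate the
    bottleneck for policy learning.
    """

    def __init__(self, traj_dim, bottleneck_dim=16):
        super().__init__()
        self.encoder = nn.Linear(traj_dim, bottleneck_dim)
        self.decoder = nn.Linear(bottleneck_dim, traj_dim)

    def forward(self, traj):
        z = self.encoder(traj)   # trajectory bottleneck
        recon = self.decoder(z)  # reconstruction of the input trajectory
        return recon, z


def pretrain_trajnet(train_trajs, val_trajs, epochs=50, lr=1e-3):
    model = TrajNetStandIn(traj_dim=train_trajs.shape[-1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # Adam, as mentioned in the paper
    best_val, best_state = float("inf"), None
    for _ in range(epochs):
        model.train()
        recon, _ = model(train_trajs)
        loss = nn.functional.mse_loss(recon, train_trajs)
        opt.zero_grad()
        loss.backward()
        opt.step()

        model.eval()
        with torch.no_grad():
            val_recon, _ = model(val_trajs)
            val_loss = nn.functional.mse_loss(val_recon, val_trajs).item()
        if val_loss < best_val:  # keep the checkpoint with the lowest validation loss
            best_val, best_state = val_loss, copy.deepcopy(model.state_dict())

    model.load_state_dict(best_state)
    model.eval()  # frozen: only used to generate bottlenecks during policy learning
    return model


# Toy data: 64 training and 16 validation "trajectories" of flattened dimension 40.
trajnet = pretrain_trajnet(torch.randn(64, 40), torch.randn(16, 40))
with torch.no_grad():
    _, bottleneck = trajnet(torch.randn(4, 40))  # bottleneck later fed to PolicyNet
print(bottleneck.shape)  # torch.Size([4, 16])
```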
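
The Experiment Setup row describes the policy head as a two-layer feedforward MLP over the current state, the goal (goal state or return-to-go), and, for GCPC, the TrajNet bottleneck. Below is a minimal sketch of that interface, again assuming PyTorch; "two-layer" is interpreted here as two hidden layers, and the hidden width, activations, and dimensions are placeholders rather than the hyperparameters from Tables A1/A2.

```python
import torch
import torch.nn as nn


class PolicyNet(nn.Module):
    """Feedforward policy head conditioned on (state, goal, bottleneck).

    The RvS-G/R baseline described in the paper conditions only on
    (state, goal); GCPC's PolicyNet additionally takes the bottleneck
    produced by the frozen, pre-trained TrajNet.
    """

    def __init__(self, state_dim, goal_dim, bottleneck_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim + bottleneck_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
        )

    def forward(self, state, goal, bottleneck):
        # Concatenate current state, goal (goal state or return-to-go),
        # and the trajectory bottleneck into one conditioning vector.
        return self.net(torch.cat([state, goal, bottleneck], dim=-1))


# Toy usage with made-up dimensions (batch of 4).
policy = PolicyNet(state_dim=17, goal_dim=2, bottleneck_dim=16, action_dim=6)
state, goal = torch.randn(4, 17), torch.randn(4, 2)
bottleneck = torch.randn(4, 16)  # would come from the frozen TrajNet in practice
print(policy(state, goal, bottleneck).shape)  # torch.Size([4, 6])
```

A behavior-cloning loss against dataset actions would complete the RvS-style policy learning stage; the exact objective and hyperparameters should be taken from Tables A1/A2 and the released code at https://brown-palm.github.io/GCPC/.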