Contrastive Difference Predictive Coding

Authors: Chongyi Zheng, Ruslan Salakhutdinov, Benjamin Eysenbach

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that, compared with prior RL methods, ours achieves a 2× median improvement in success rates and can better cope with stochastic environments. In tabular settings, we show that our method is about 20× more sample efficient than the successor representation and 1500× more sample efficient than the standard (Monte Carlo) version of contrastive predictive coding.
Researcher Affiliation | Academia | Chongyi Zheng (Carnegie Mellon University, chongyiz@andrew.cmu.edu), Ruslan Salakhutdinov (Carnegie Mellon University), Benjamin Eysenbach (Princeton University)
Pseudocode | Yes | Algorithm 1: Temporal Difference InfoNCE. We use CE to denote the cross-entropy loss, taken across the rows of a matrix of logits and labels. We use F as a matrix of logits, where F[i, j] = ϕ(s_t^(i), a_t^(i), g^(i))^⊤ ψ(s_{t+}^(j)). See Appendix D.1 for details. (A minimal sketch of this loss appears after the table.)
Open Source Code | Yes | Code: https://github.com/chongyi-zheng/td_infonce; Website: https://chongyi-zheng.github.io/td_infonce
Open Datasets | Yes | We compare TD InfoNCE to four baselines on an online GCRL benchmark (Plappert et al., 2018) containing four manipulation tasks for the Fetch robot. We evaluate on AntMaze tasks from the D4RL benchmark (Fu et al., 2020).
Dataset Splits | No | The paper uses standard benchmarks (the Fetch robotics tasks and D4RL) that typically come with predefined splits, but it does not explicitly state train/validation/test percentages or sample counts, nor does it cite a specific source for these splits beyond the benchmarks themselves.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models or CPU types) used to run the experiments.
Software Dependencies | No | The paper mentions using JAX (Bradbury et al., 2018) but does not provide version numbers for JAX or for other software dependencies such as Python or PyTorch.
Experiment Setup | Yes | We summarize hyperparameters for TD InfoNCE in Table 2: actor learning rate 5 × 10⁻⁵, critic learning rate 3 × 10⁻⁴, ℓ2-normalized representations (yes), hidden layer sizes (512, 512, 512, 512) for both actor and representations, contrastive representation dimension 16. For offline RL experiments, we make some changes to these hyperparameters (Table 3). (These values are collected in a config sketch after the table.)
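
To make the Algorithm 1 objective quoted above concrete, here is a minimal JAX sketch, not the authors' released code: the logits matrix F is the pairwise inner product of the two representation heads, and the cross-entropy is taken across its rows with the positive pairs on the diagonal. The function and argument names (infonce_loss, phi_sa_g, psi_s_future) are hypothetical.

```python
# Minimal sketch of the Algorithm 1 loss, assuming both encoders have
# already produced batched embeddings. Names here are hypothetical.
import jax.numpy as jnp
import optax

def infonce_loss(phi_sa_g: jnp.ndarray, psi_s_future: jnp.ndarray) -> jnp.ndarray:
    # phi_sa_g: (batch, dim) embeddings of (s_t, a_t, g) triples.
    # psi_s_future: (batch, dim) embeddings of future states s_{t+}.
    # F[i, j] = phi(s_t^(i), a_t^(i), g^(i))^T psi(s_{t+}^(j))
    logits = phi_sa_g @ psi_s_future.T
    # Positive pairs sit on the diagonal: row i should be classified as column i.
    labels = jnp.arange(logits.shape[0])
    # CE across the rows of the logits matrix, averaged over the batch.
    return optax.softmax_cross_entropy_with_integer_labels(logits, labels).mean()
```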
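
The Table 2 values quoted in the Experiment Setup row can also be gathered into a single configuration object for reference; a sketch follows, with hypothetical key names (the released code may organize these differently).

```python
# Hyperparameters reported in Table 2 for TD InfoNCE, collected into one
# dict. Key names are hypothetical; values are taken from the paper.
td_infonce_config = {
    "actor_learning_rate": 5e-5,
    "critic_learning_rate": 3e-4,
    "l2_normalize_representations": True,        # unit-norm contrastive embeddings
    "hidden_layer_sizes": (512, 512, 512, 512),  # for both actor and representations
    "representation_dim": 16,                    # contrastive representation dimension
}
```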