Which Mutual-Information Representation Learning Objectives are Sufficient for Control?

Authors: Kate Rakelly, Abhishek Gupta, Carlos Florensa, Sergey Levine

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'We corroborate our theoretical results with empirical experiments on a simulated game environment with visual observations.'
Researcher Affiliation | Academia | Kate Rakelly, Abhishek Gupta, Carlos Florensa, Sergey Levine; University of California, Berkeley; {rakelly, abhigupta, florensacc, svlevine}@eecs.berkeley.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper references 'garage: A toolkit for reproducible reinforcement learning research. https://github.com/rlworkgroup/garage, 2019', but this is a general toolkit used by the authors, not a release of their own code for the methods described in this paper.
Open Datasets | No | The paper states: 'Our datasets consist of 50k transitions collected from a uniform random policy, which is sufficient to cover the state space in our environments.' However, it does not provide a link, DOI, or formal citation for public access to this generated dataset.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits, such as percentages or sample counts for each split.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for its experiments, only that they are deep RL experiments.
Software Dependencies | No | The paper mentions 'pygame [59]' and the 'Soft Actor-Critic algorithm [25]' but does not specify version numbers for these or any other software libraries or dependencies.
Experiment Setup | Yes | 'To separate representation learning from RL, we first optimize each representation learning objective on a dataset of offline data, similar to the protocol in Stooke et al. [64]. Our datasets consist of 50k transitions collected from a uniform random policy, which is sufficient to cover the state space in our environments. We then freeze the weights of the state encoder learned in the first phase and train RL agents with the representation as state input. ... For the RL algorithm, we use the Soft Actor-Critic algorithm [25], modified slightly for the discrete action distribution. Please see Appendix A.2 for full experimental details.' (A minimal sketch of this two-phase protocol follows the table.)
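
The Experiment Setup row describes a two-phase protocol: pretrain a state encoder with a mutual-information objective on offline random-policy transitions, then freeze the encoder and train an RL agent on the fixed representation. The following is a minimal sketch of that protocol, assuming PyTorch, random tensors in place of the paper's 50k-transition dataset, an InfoNCE-style contrastive objective as one representative MI objective, and an arbitrary encoder architecture; the discrete SAC update itself is omitted. This is not the authors' code, only an illustration of the setup described above.

# Minimal sketch of the two-phase protocol (assumed PyTorch; toy data; not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Small CNN mapping 64x64 RGB observations to a latent vector (architecture assumed)."""
    def __init__(self, z_dim=50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            flat_dim = self.conv(torch.zeros(1, 3, 64, 64)).shape[1]
        self.fc = nn.Linear(flat_dim, z_dim)

    def forward(self, obs):
        return self.fc(self.conv(obs))

def info_nce_loss(z, z_next, W):
    # Contrastive lower bound on I(s; s'): the true next-state embedding is the
    # positive; other next states in the batch act as negatives.
    logits = z @ W @ z_next.t()           # (B, B) similarity matrix
    labels = torch.arange(z.shape[0])     # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Phase 1: optimize the representation objective on offline transitions.
encoder = Encoder()
W = nn.Parameter(torch.eye(50))
opt = torch.optim.Adam(list(encoder.parameters()) + [W], lr=3e-4)
# Random tensors stand in for the 50k transitions collected by a uniform random policy.
obs = torch.rand(256, 3, 64, 64)
next_obs = torch.rand(256, 3, 64, 64)
for _ in range(10):
    loss = info_nce_loss(encoder(obs), encoder(next_obs), W)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Phase 2: freeze the encoder and train an RL agent on the fixed representation.
for p in encoder.parameters():
    p.requires_grad_(False)               # representation stays fixed during RL
policy = nn.Sequential(nn.Linear(50, 256), nn.ReLU(), nn.Linear(256, 4))  # 4 discrete actions assumed
with torch.no_grad():
    z = encoder(obs)                      # encoded states fed to the agent as state input
action_logits = policy(z)                 # the discrete SAC update itself is omitted here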