Which Mutual-Information Representation Learning Objectives are Sufficient for Control?

Authors: Kate Rakelly, Abhishek Gupta, Carlos Florensa, Sergey Levine

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'We corroborate our theoretical results with empirical experiments on a simulated game environment with visual observations.'
Researcher Affiliation | Academia | Kate Rakelly, Abhishek Gupta, Carlos Florensa, Sergey Levine; University of California, Berkeley; {rakelly, abhigupta, florensacc, svlevine}@eecs.berkeley.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper references 'garage: A toolkit for reproducible reinforcement learning research. https://github.com/rlworkgroup/garage, 2019', but this is a general toolkit used by the authors, not a release of their own code for the methods described in this paper.
Open Datasets | No | The paper states: 'Our datasets consist of 50k transitions collected from a uniform random policy, which is sufficient to cover the state space in our environments.' However, it does not provide a link, DOI, or formal citation for public access to this generated dataset.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits, such as percentages or sample counts for each split.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for its experiments, only that they are deep RL experiments.
Software Dependencies | No | The paper mentions 'pygame [59]' and the 'Soft Actor-Critic algorithm [25]' but does not specify version numbers for these or any other software libraries or dependencies.
Experiment Setup | Yes | 'To separate representation learning from RL, we first optimize each representation learning objective on a dataset of offline data, similar to the protocol in Stooke et al. [64]. Our datasets consist of 50k transitions collected from a uniform random policy, which is sufficient to cover the state space in our environments. We then freeze the weights of the state encoder learned in the first phase and train RL agents with the representation as state input. ... For the RL algorithm, we use the Soft Actor-Critic algorithm [25], modified slightly for the discrete action distribution. Please see Appendix A.2 for full experimental details.' (A minimal sketch of this two-phase protocol follows the table.)
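
The Experiment Setup row describes a two-phase protocol: pretrain a state encoder with a mutual-information objective on offline random-policy transitions, then freeze the encoder and train an RL agent on the fixed representation. The following is a minimal sketch of that protocol, assuming PyTorch, random tensors in place of the paper's 50k-transition dataset, an InfoNCE-style contrastive objective as one representative MI objective, and an arbitrary encoder architecture; the discrete SAC update itself is omitted. This is not the authors' code, only an illustration of the setup described above.

# Minimal sketch of the two-phase protocol (assumed PyTorch; toy data; not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Small CNN mapping 64x64 RGB observations to a latent vector (architecture assumed)."""
    def __init__(self, z_dim=50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            flat_dim = self.conv(torch.zeros(1, 3, 64, 64)).shape[1]
        self.fc = nn.Linear(flat_dim, z_dim)

    def forward(self, obs):
        return self.fc(self.conv(obs))

def info_nce_loss(z, z_next, W):
    # Contrastive lower bound on I(s; s'): the true next-state embedding is the
    # positive; other next states in the batch act as negatives.
    logits = z @ W @ z_next.t()           # (B, B) similarity matrix
    labels = torch.arange(z.shape[0])     # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Phase 1: optimize the representation objective on offline transitions.
encoder = Encoder()
W = nn.Parameter(torch.eye(50))
opt = torch.optim.Adam(list(encoder.parameters()) + [W], lr=3e-4)
# Random tensors stand in for the 50k transitions collected by a uniform random policy.
obs = torch.rand(256, 3, 64, 64)
next_obs = torch.rand(256, 3, 64, 64)
for _ in range(10):
    loss = info_nce_loss(encoder(obs), encoder(next_obs), W)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Phase 2: freeze the encoder and train an RL agent on the fixed representation.
for p in encoder.parameters():
    p.requires_grad_(False)               # representation stays fixed during RL
policy = nn.Sequential(nn.Linear(50, 256), nn.ReLU(), nn.Linear(256, 4))  # 4 discrete actions assumed
with torch.no_grad():
    z = encoder(obs)                      # encoded states fed to the agent as state input
action_logits = policy(z)                 # the discrete SAC update itself is omitted here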