Towards Principled Representation Learning from Videos for Reinforcement Learning

Authors: Dipendra Misra, Akanksha Saran, Tengyang Xie, Alex Lamb, John Langford

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically test our theoretical results in three visual domains, yielding results that are consistent with our theoretical findings.
Researcher Affiliation | Industry | Dipendra Misra (1), Akanksha Saran (2), Tengyang Xie (1), Alex Lamb (1), John Langford (1); (1) Microsoft Research, NY; (2) Sony Research, CA
Pseudocode | No | The paper describes methods textually and mathematically in sections like "3 REPRESENTATION LEARNING FOR RL USING VIDEO DATASET" and "B PROOFS OF THEORETICAL STATEMENTS", but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code for all experiments is available as part of the Intrepid codebase at https://github.com/microsoft/Intrepid.
Open Datasets | Yes | We empirically test our theoretical results in three visual domains: Grid World (a navigation domain), ViZDoom basic (a first-person 3D shooting game), and ViZDoom Defend The Center (a more challenging first-person 3D shooting game).
Dataset Splits | No | The paper discusses training, validation, and testing phases for machine learning models and experiments in general. However, it does not specify how the datasets used in its experiments (Grid World, ViZDoom) were split into training, validation, and test sets (e.g., percentages, absolute counts, or citations of predefined splits).
Hardware Specification | Yes | All the code for this work was run on A100, V100, P40 GPUs, with a compute time of approx. 12 hours for grid world experiments and 6 hours for ViZDoom experiments.
Software Dependencies | No | The paper mentions software components like 'PPO' (Proximal Policy Optimization) and uses environments such as 'Minigrid' and 'ViZDoom'. However, it does not provide version numbers for these or for any ancillary software dependencies (e.g., Python, PyTorch, TensorFlow, CUDA) that would be necessary for replication.
Experiment Setup | Yes | In Table 2, we report the hyperparameter values used for experiments in this work with the Grid World and ViZDoom environments.