Towards Principled Representation Learning from Videos for Reinforcement Learning
Authors: Dipendra Misra, Akanksha Saran, Tengyang Xie, Alex Lamb, John Langford
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically test our theoretical results in three visual domains, yielding results that are consistent with our theoretical findings. |
| Researcher Affiliation | Industry | Dipendra Misra¹, Akanksha Saran², Tengyang Xie¹, Alex Lamb¹, John Langford¹ (¹Microsoft Research, NY; ²Sony Research, CA) |
| Pseudocode | No | The paper describes methods textually and mathematically in sections like "3 REPRESENTATION LEARNING FOR RL USING VIDEO DATASET" and "B PROOFS OF THEORETICAL STATEMENTS", but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for all experiments is available as part of the Intrepid codebase at https://github.com/microsoft/Intrepid. |
| Open Datasets | Yes | We empirically test our theoretical results in three visual domains: Grid World (a navigation domain), ViZDoom basic (a first-person 3D shooting game), and ViZDoom Defend The Center (a more challenging first-person 3D shooting game). |
| Dataset Splits | No | The paper discusses training, validation, and testing phases only in general terms. It does not specify how the datasets used in its experiments (Grid World, ViZDoom) were split into training, validation, and test sets (e.g., percentages, absolute counts, or citations to predefined splits). |
| Hardware Specification | Yes | All the code for this work was run on A100, V100, P40 GPUs, with a compute time of approx. 12 hours for grid world experiments and 6 hours for ViZDoom experiments. |
| Software Dependencies | No | The paper mentions software components like 'PPO' (Proximal Policy Optimization) and uses environments such as 'Minigrid' and 'ViZDoom'. However, it does not provide specific version numbers for these or any other ancillary software dependencies (e.g., Python, PyTorch, TensorFlow, CUDA versions) that would be necessary for replication. |
| Experiment Setup | Yes | In Table 2, we report the hyperparameter values used for experiments in this work with the Grid World and ViZDoom environments. |