Reinforcement Learning with Neural Radiance Fields

Authors: Danny Driess, Ingmar Schubert, Pete Florence, Yunzhu Li, Marc Toussaint

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments indicate that NeRF as supervision leads to a latent space better suited for the downstream RL tasks involving robotic object manipulations like hanging mugs on hooks, pushing objects, or opening doors.
Researcher Affiliation | Collaboration | Danny Driess (TU Berlin), Ingmar Schubert (TU Berlin), Pete Florence (Google), Yunzhu Li (MIT), Marc Toussaint (TU Berlin)
Pseudocode | No | The paper describes its methods in prose and with mathematical formulas, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See website.
Open Datasets | No | The paper states that the environments are custom and data is collected by random interactions, rather than using a publicly available or open dataset with access information provided.
Dataset Splits | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See appendix.
Hardware Specification | Yes | Additionally of particular relevance, various methods have developed latent-conditioned [42, 43, 44] or compositional/object-oriented approaches for NeRFs [45, 46, 47, 48, 49, 50, 51, 52, 53], although neither they, nor other NeRF-style methods to our knowledge, have been applied to RL. In our case, we are not constrained by inference-time computation issues, since we do not need to render images, and only have to run our latent-space encoder (with a runtime of approx. 7 ms on an RTX 3090).
Software Dependencies | No | The paper mentions using PPO as the RL algorithm and references Stable Baselines3, but it does not specify explicit version numbers for any software dependencies.
Experiment Setup | Yes | We use PPO [86] as the RL algorithm and four camera views in all experiments. Refer to the appendix for more details about our environments, parameter choices, network architectures, and training times.
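The Software Dependencies and Experiment Setup rows above indicate that policies are trained with PPO via Stable Baselines3 on a latent state computed by an encoder over four camera views. The sketch below shows one way such a pipeline could be wired up in Stable Baselines3; it is not the paper's implementation. The `MultiViewLatentExtractor` class, the 128-dimensional latent size, the CNN layout, and the `MultiViewManipulationEnv-v0` environment id are illustrative assumptions — the actual NeRF-supervised encoder, environments, and hyperparameters are described in the paper's appendix and released code.

```python
# Hedged sketch: PPO on a latent observation derived from multiple camera views,
# in the spirit of the paper's setup. All names, dimensions, and the environment
# id are illustrative assumptions, not the paper's actual architecture.
import gymnasium as gym
import torch
import torch.nn as nn
from stable_baselines3 import PPO
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor


class MultiViewLatentExtractor(BaseFeaturesExtractor):
    """Encodes stacked camera views into a single latent vector.

    In the paper, the encoder is pretrained with NeRF supervision; here it is
    only a small CNN placeholder standing in for that encoder.
    """

    def __init__(self, observation_space: gym.spaces.Box, latent_dim: int = 128):
        super().__init__(observation_space, features_dim=latent_dim)
        n_channels = observation_space.shape[0]  # e.g. 4 views x 3 RGB channels
        self.cnn = nn.Sequential(
            nn.Conv2d(n_channels, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, observations: torch.Tensor) -> torch.Tensor:
        return self.cnn(observations)


# Placeholder environment id; the paper's environments (mug-on-hook, pushing,
# door opening) are custom and distributed via its website.
env = gym.make("MultiViewManipulationEnv-v0")

model = PPO(
    "CnnPolicy",
    env,
    policy_kwargs=dict(
        features_extractor_class=MultiViewLatentExtractor,
        features_extractor_kwargs=dict(latent_dim=128),
    ),
    verbose=1,
)
model.learn(total_timesteps=1_000_000)
```

In the paper's pipeline, the features extractor would instead be the encoder pretrained with a NeRF decoder on data collected from random interactions (per the Open Datasets row), and the resulting latent then serves as the state representation for PPO.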