Reinforcement Learning with Neural Radiance Fields
Authors: Danny Driess, Ingmar Schubert, Pete Florence, Yunzhu Li, Marc Toussaint
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments indicate that NeRF as supervision leads to a latent space better suited for the downstream RL tasks involving robotic object manipulations like hanging mugs on hooks, pushing objects, or opening doors. |
| Researcher Affiliation | Collaboration | Danny Driess (TU Berlin), Ingmar Schubert (TU Berlin), Pete Florence (Google), Yunzhu Li (MIT), Marc Toussaint (TU Berlin) |
| Pseudocode | No | The paper describes its methods in prose and with mathematical formulas, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See website. |
| Open Datasets | No | The paper states that the environments are custom and data is collected by random interactions, rather than using a publicly available or open dataset with access information provided. |
| Dataset Splits | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See appendix. |
| Hardware Specification | Yes | Additionally of particular relevance, various methods have developed latent-conditioned [42, 43, 44] or compositional/object-oriented approaches for NeRFs [45, 46, 47, 48, 49, 50, 51, 52, 53], although neither they nor other NeRF-style methods have, to our knowledge, been applied to RL. In our case, we are not constrained by inference-time computation issues, since we do not need to render images, and only have to run our latent-space encoder (with a runtime of approx. 7 ms on an RTX3090). |
| Software Dependencies | No | The paper mentions using PPO as the RL algorithm and references Stable Baselines3, but it does not specify explicit version numbers for any software dependencies. |
| Experiment Setup | Yes | We use PPO [86] as the RL algorithm and four camera views in all experiments. Refer to the appendix for more details about our environments, parameter choices, network architectures, and training times. |
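
The idea flagged in the Research Type row, using NeRF reconstruction as supervision to learn a latent state for RL, can be sketched roughly as follows. This is a minimal, hypothetical PyTorch illustration rather than the authors' released code: the layer sizes, the single-pass volume rendering, and the omission of view directions and positional encoding are all simplifying assumptions.

```python
# Hypothetical sketch: a latent-conditioned NeRF decoder supervises an image
# encoder; the resulting latent z is what the RL policy later observes.
import torch
import torch.nn as nn


class MultiViewEncoder(nn.Module):
    """Encodes a set of RGB camera views into a single latent vector z."""

    def __init__(self, num_views: int = 4, latent_dim: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(num_views * 128, latent_dim)

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        b, v, c, h, w = views.shape            # (B, num_views, 3, H, W)
        feats = self.conv(views.reshape(b * v, c, h, w)).reshape(b, v * 128)
        return self.fc(feats)


class LatentConditionedNeRF(nn.Module):
    """Maps (3D point, latent z) to density and colour; view direction omitted for brevity."""

    def __init__(self, latent_dim: int = 256, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),              # 1 density + 3 RGB channels
        )

    def forward(self, points: torch.Tensor, z: torch.Tensor):
        z = z.unsqueeze(1).expand(-1, points.shape[1], -1)
        out = self.mlp(torch.cat([points, z], dim=-1))
        return torch.relu(out[..., :1]), torch.sigmoid(out[..., 1:])


def render_rays(nerf, z, origins, dirs, near=0.5, far=3.0, n_samples=32):
    """Simplified volume rendering: uniform samples, no hierarchical sampling."""
    r = origins.shape[0]
    t = torch.linspace(near, far, n_samples, device=origins.device)
    points = origins[:, None, :] + dirs[:, None, :] * t[None, :, None]   # (R, S, 3)
    sigma, rgb = nerf(points.reshape(1, r * n_samples, 3), z)
    sigma, rgb = sigma.reshape(r, n_samples), rgb.reshape(r, n_samples, 3)
    alpha = 1.0 - torch.exp(-sigma * (far - near) / n_samples)
    trans = torch.cumprod(
        torch.cat([torch.ones(r, 1, device=alpha.device), 1.0 - alpha + 1e-10], dim=1),
        dim=1)[:, :-1]
    return ((alpha * trans)[..., None] * rgb).sum(dim=1)                  # (R, 3)


# Training signal: reconstruct pixel colours of the camera views from z via the
# NeRF decoder (photometric loss); the RL policy later consumes only z.
encoder, nerf = MultiViewEncoder(), LatentConditionedNeRF()
views = torch.rand(1, 4, 3, 64, 64)                                      # dummy multi-view input
origins = torch.zeros(1024, 3)                                           # dummy ray origins
dirs = nn.functional.normalize(torch.randn(1024, 3), dim=-1)             # dummy ray directions
target_rgb = torch.rand(1024, 3)                                         # dummy ground-truth pixels
z = encoder(views)
loss = ((render_rays(nerf, z, origins, dirs) - target_rgb) ** 2).mean()
loss.backward()
```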
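
On the RL side (Software Dependencies and Experiment Setup rows), the paper trains PPO on the latent produced by the frozen encoder. Below is a hedged sketch using Stable Baselines3: the environment id "MugHanging-v0", the observation wrapper, and all hyperparameters are assumptions, and since the paper does not pin library versions, current gymnasium and Stable Baselines3 APIs are assumed.

```python
# Hypothetical sketch: a frozen, NeRF-pretrained encoder turns multi-view image
# observations into a latent vector, and Stable Baselines3 PPO trains on it.
import gymnasium as gym
import numpy as np
import torch
from stable_baselines3 import PPO


class LatentObservationWrapper(gym.ObservationWrapper):
    """Replaces image observations of shape (num_views, 3, H, W) with the encoder latent."""

    def __init__(self, env, encoder, latent_dim=256):
        super().__init__(env)
        self.encoder = encoder.eval()
        self.observation_space = gym.spaces.Box(
            low=-np.inf, high=np.inf, shape=(latent_dim,), dtype=np.float32)

    def observation(self, obs):
        with torch.no_grad():                  # encoder stays frozen during RL
            views = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
            return self.encoder(views).squeeze(0).cpu().numpy()


encoder = MultiViewEncoder()                   # NeRF-pretrained encoder from the sketch above
env = LatentObservationWrapper(gym.make("MugHanging-v0"), encoder)   # hypothetical env id
model = PPO("MlpPolicy", env, verbose=1)       # PPO as in the paper; SB3 defaults otherwise
model.learn(total_timesteps=1_000_000)
```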