Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Reinforcement Learning with Neural Radiance Fields
Authors: Danny Driess, Ingmar Schubert, Pete Florence, Yunzhu Li, Marc Toussaint
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments indicate that NeRF as supervision leads to a latent space better suited for the downstream RL tasks involving robotic object manipulations like hanging mugs on hooks, pushing objects, or opening doors. |
| Researcher Affiliation | Collaboration | Danny Driess (TU Berlin), Ingmar Schubert (TU Berlin), Pete Florence (Google), Yunzhu Li (MIT), Marc Toussaint (TU Berlin) |
| Pseudocode | No | The paper describes its methods in prose and with mathematical formulas, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See website. |
| Open Datasets | No | The paper states that the environments are custom and data is collected by random interactions, rather than using a publicly available or open dataset with access information provided. |
| Dataset Splits | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] See appendix. |
| Hardware Specification | Yes | Additionally of particular relevance, various methods have developed latent-conditioned [42, 43, 44] or compositional/object-oriented approaches for NeRFs [45, 46, 47, 48, 49, 50, 51, 52, 53], although they, nor other NeRF-style methods to our knowledge, have been applied to RL. In our case, we are not constrained by inference-time computation issues, since we do not need to render images, and only have to run our latent-space encoder (with a runtime of approx. 7 ms on an RTX3090). |
| Software Dependencies | No | The paper mentions using PPO as the RL algorithm and references Stable Baselines3, but it does not specify explicit version numbers for any software dependencies. |
| Experiment Setup | Yes | We use PPO [86] as the RL algorithm and four camera views in all experiments. Refer to the appendix for more details about our environments, parameter choices, network architectures, and training times. |