Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Real-World Reinforcement Learning of Active Perception Behaviors
Authors: Edward Hu, Jie Wang, Xingfang Yuan, Fiona Luo, Muyao Li, Gaspard Lambrechts, Oleh Rybkin, Dinesh Jayaraman
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In evaluations on 8 manipulation tasks on 3 robots spanning varying degrees of partial observability, AAWR synthesizes reliable active perception behaviors that outperform all prior approaches. |
| Researcher Affiliation | Academia | ¹University of Pennsylvania, ²University of Liège, ³UC Berkeley |
| Pseudocode | Yes | Algorithm 1: AAWR Offline-to-Online Training; Algorithm 2: Deployment |
| Open Source Code | No | We will release code for the algorithm and environments. |
| Open Datasets | Yes | We used the DROID robot setup [56], which consists of a 7-DoF Franka Emika Panda Robot Arm, a Robotiq 2F-85 parallel-jaw gripper, a wrist-mounted ZED Mini RGB-D camera and two side-mounted ZED 2 stereo cameras. The DROID set-up enables the usage of the generalist VLA policy π0 [57], specifically the FAST-DROID checkpoint. |
| Dataset Splits | No | We initially collect up to 250 demonstrations per task, but then we curate the dataset, dropping out trajectories with mislabeled object detections, noisy/faulty sensor readings, etc. After filtering, we end up with 152 demonstrations for Bookshelf-P, 109 for Bookshelf-D, 35 for Shelf-Cabinet, and 195 for Complex. ... We sample an equal number of transitions from both buffers to form a batch, following best practice from prior work [47, 51]. |
| Hardware Specification | No | It used computing resources from the National Artificial Intelligence Research Resource Pilot (NAIRR 240077). |
| Software Dependencies | No | The wrist image is first fed into a frozen DINO-V2 [58] encoder (ViT-S14) ... To obtain object detection and segmentation of the target object, we used the DINO-X [59] API and the Grounded SAM [60] Model for Open-World Object Detection and segmentation. ... We query the Gemini-2.5-Flash [53] model with a task prompt template... |
| Experiment Setup | Yes | We train all models with a batch size of 256, learning rate of 0.0001, and the Adam optimizer. For online finetuning following [47], we use an update-to-data ratio of 1, performing gradient updates after every episode. For AWR and AAWR, we use an advantage temperature of 10. |
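
The Dataset Splits and Experiment Setup rows quote two mechanics that are easier to see in code: drawing an equal number of transitions from the offline and online buffers, and weighting samples by their advantage with a temperature of 10. The following is a minimal sketch, not the authors' implementation. The function names, the `max_weight` clip, and the convention `weight = exp(advantage / temperature)` are assumptions made for illustration; some codebases multiply by the temperature instead of dividing, and the quoted text does not say which convention the paper uses.

```python
import math
import random

# Values quoted in the Experiment Setup row.
BATCH_SIZE = 256
LEARNING_RATE = 1e-4
ADVANTAGE_TEMPERATURE = 10.0


def sample_balanced_batch(offline_buffer, online_buffer,
                          batch_size=BATCH_SIZE, rng=random):
    """Draw half of the batch from each buffer, sampling with replacement."""
    half = batch_size // 2
    offline = [rng.choice(offline_buffer) for _ in range(half)]
    online = [rng.choice(online_buffer) for _ in range(batch_size - half)]
    return offline + online


def awr_weights(advantages, temperature=ADVANTAGE_TEMPERATURE,
                max_weight=100.0):
    """Exponentiated-advantage weights.

    Assumed convention: weight = exp(advantage / temperature).
    max_weight is a hypothetical clip, not a value from the paper.
    """
    log_cap = math.log(max_weight)
    # Clip the exponent first so very large advantages cannot overflow.
    return [min(math.exp(min(a / temperature, log_cap)), max_weight)
            for a in advantages]
```

Under this assumed convention, a transition with zero advantage receives weight 1, and larger temperatures flatten the weighting toward plain behavior cloning.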