Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Focus-Then-Reuse: Fast Adaptation in Visual Perturbation Environments

Authors: Jiahui Wang, Chao Chen, Jiacheng Xu, Zongzhang Zhang, Yang Yu

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on challenging tasks based on DeepMind Control Suite and Franka Emika Robotics demonstrate that FTR enables rapid adaptation in visual perturbation environments and achieves state-of-the-art performance.
Researcher Affiliation | Academia | (1) National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; (2) School of Artificial Intelligence, Nanjing University, Nanjing, China; (3) Nanyang Technological University, Singapore
Pseudocode | Yes | Algorithm 1 Focus-Then-Reuse
Open Source Code | Yes | The source code is available at https://github.com/LAMDA-RL/FTR.
Open Datasets | Yes | In this section, we present the experimental results of FTR on 11 tasks, including 8 tasks of DeepMind Control Suite (DMC) [67] and 3 tasks of Franka Emika Robotics [68, 69]. ... To simulate real-world visual disturbances, we use five diverse videos from the DMC-Generalization Benchmark [17], as shown in Fig. 3.
Dataset Splits | No | Policies are trained using the DrQ-v2 algorithm. We perform three independent runs and choose the policy with the best performance as the original policy π_ori. ... Generalization method (SimGRL): Policies are trained in the source domain for 500k steps across three runs. ... Adaptation methods (FTR, PAD): ... adapted to each of the five target domains using three different seeds for 200k steps.
Hardware Specification | Yes | Most experiments are conducted on a server outfitted with 2 AMD EPYC 7542 32-Core Processor CPUs, 504GB of RAM, and 8 GPUs, each with a performance of over 35 TFLOPS, running Ubuntu 22.04.
Software Dependencies | No | The source code is available at https://github.com/LAMDA-RL/FTR. The code is modified from DMC-Generalization Benchmark [17] and FTD [10]. The PPO algorithm used in FTR is implemented based on https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/. The DrQ-v2 algorithm is implemented based on https://github.com/facebookresearch/drqv2. ... Segment Anything Model 2 (SAM 2) serves as the default segmentation model and tracking model in FTR. ... The default VLM used in FTR is Qwen-VL-Max [33].
Experiment Setup | Yes | Table 3: Hyperparameters. Hyperparameters of environments: frame size = 168 × 168 (franka-push, franka-door), 84 × 84 (otherwise); frame stack = 3; episode length = 200 (franka-push, franka-door), 1000 (otherwise); action repeat = 2 (finger-spin, pendulum-swingup), 4 (otherwise). Hyperparameters of DrQ-v2: train steps = 5 × 10^5; replay buffer size = 1 × 10^5; exploration steps = 1 × 10^4; n-step returns = 3; batch size = 256; optimizer = Adam; actor & critic learning rate = 1 × 10^-4; discount factor = 0.99; critic Q-function soft-update rate τ = 0.01; exploration stddev. clip = 0.3; exploration stddev. schedule = linear(1.0,0.1,100000). Hyperparameters of focus stage: SAM 2 checkpoint = sam2_hiera_tiny; adapt steps = 2 × 10^5; number of segments k = 9; selection interval T_sel = 20; SL-to-RL transition timestep T_1 = 5000; transition end timestep T_2 = 10000; policy stddev. σ_h = 0.1; optimizer = Adam; batch size = 128; learning rate = 3 × 10^-4; clip ratio of PPO = 0.2; discount factor = 0.5; GAE lambda = 0.95; L_SL objective margin δ = 0.1.
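As an illustration of the exploration stddev. schedule entry in Table 3, linear(1.0,0.1,100000), the following is a minimal sketch of one common reading of such a schedule: the standard deviation is interpolated linearly from 1.0 to 0.1 over the first 100000 steps and held at 0.1 afterwards. The function name `linear_schedule` and its argument names are hypothetical and chosen for this example; the sketch is not taken from the FTR or DrQ-v2 code, and the actual implementation may differ.

```python
def linear_schedule(init: float, final: float, duration: int, step: int) -> float:
    """Linearly interpolate from `init` to `final` over `duration` steps.

    Hypothetical helper (an assumption, not the authors' code): before step 0
    the value is `init`; after `duration` steps it stays at `final`.
    """
    # Fraction of the schedule completed, clamped to [0, 1].
    mix = min(max(step / duration, 0.0), 1.0)
    return (1.0 - mix) * init + mix * final


if __name__ == "__main__":
    # Values from the Table 3 entry linear(1.0,0.1,100000).
    for step in (0, 50_000, 100_000, 200_000):
        print(step, linear_schedule(1.0, 0.1, 100_000, step))
```

Under this reading, the exploration noise reaches its floor of 0.1 after one fifth of the 5 × 10^5 DrQ-v2 training steps listed in the same table.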