Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Diffusion Guided Adversarial State Perturbations in Reinforcement Learning

Authors: Xiaolin Sun, Feidi Liu, Zhengming Ding, Zizhan Zheng

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluations show that our attack effectively breaks existing defenses, including the most sophisticated ones, significantly outperforming existing attacks while being more perceptually stealthy. The results highlight the vulnerability of RL agents to semantics-aware adversarial perturbations, indicating the importance of developing more robust policies. Our code can be found at this GitHub Repo. ... Comprehensive evaluations show that it can break all known defenses, lower agents' cumulative reward in various environments by more than 50%, while being stealthier than prior attacks, as shown by lower reconstruction error, Wasserstein-1 distance, and LPIPS [62], and higher SSIM [50]. ... We evaluate SHIFT using four Atari environments [7], Doom game [52] and AirSim [43] autonomous driving simulator.
Researcher Affiliation | Academia | Xiaolin Sun (1), Feidi Liu (2), Zhengming Ding (1), and Zizhan Zheng (1); (1) Department of Computer Science, Tulane University; (2) Shanghai Center for Mathematical Science, Fudan University; EMAIL, EMAIL
Pseudocode | Yes | C.6 Training and testing stage algorithms for SHIFT; Algorithm 1: History-Aligned Conditional Diffusion Model Training; Algorithm 2: Testing Stage Sampling with History, Policy and Realism Guidance
Open Source Code | Yes | Our code can be found at this GitHub Repo.
Open Datasets | Yes | We evaluate SHIFT using four Atari environments [7], Doom game [52] and AirSim [43] autonomous driving simulator.
Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits. It mentions running experiments and reporting the mean and standard deviation over 10 runs, but not how the underlying environment data is split between training and evaluation.
Hardware Specification | Yes | We conduct all our experiments on a workstation equipped with an Intel I9-12900KF CPU, an RTX 3090 GPU, and 64GB system RAM.
Software Dependencies | No | The paper implicitly mentions using Python and deep learning frameworks, but does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | We set the history length k = 4 for the two history-based defenses and our attacks. All other hyper-parameters are given in Appendix C.7. ... We schedule the classifier-free guidance scale as Γ1(i) = max((T − i) / T, 0.3), where T is the number of reverse steps and i is the current reverse step. We set the policy guidance strength Γ2 differently in each environment under each defense. In the Pong environment, we set Γ2 = 3.5 for DQN, DP-DQN, and DMBP and Γ2 = 2 for all other defenses. In the Freeway environment, we set Γ2 = 6 for DQN, DP-DQN, and DMBP and Γ2 = 4.5 for all other defenses. In the Bank Heist environment, we set Γ2 = 4 for all defenses. In the Road Runner environment, we set Γ2 = 6 for all defenses. In the Doom environment, we set Γ2 = 4.5 for DQN, DP-DQN, and DMBP and Γ2 = 2.5 for all other defenses. For the AirSim autonomous driving simulator, we utilize CITY, a complex environment that simulates real-world city traffic situations. We set Γ2 = 0.5 for all defenses in AirSim.
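
The guidance settings quoted in the Experiment Setup row can be written down as a small lookup, which may help when trying to reproduce the reported configuration. The sketch below is illustrative only and is not taken from the authors' repository. It assumes the extracted schedule reads Γ1(i) = max((T − i) / T, 0.3) (the minus sign appears to have been lost in extraction) and that i counts reverse steps already taken, starting from 0. The names cfg_scale, policy_guidance, POLICY_GUIDANCE and GROUPED_DEFENSES are hypothetical.

```python
# Illustrative sketch, not the authors' code. Assumptions: the schedule is
# max((T - i) / T, 0.3) and i starts at 0; all names here are hypothetical.

def cfg_scale(i: int, T: int, floor: float = 0.3) -> float:
    """Classifier-free guidance scale Gamma_1 at reverse step i of T."""
    if T <= 0:
        raise ValueError("T must be positive")
    return max((T - i) / T, floor)

# Defenses that share a dedicated Gamma_2 value in Pong, Freeway and Doom.
GROUPED_DEFENSES = {"DQN", "DP-DQN", "DMBP"}

# Policy guidance strength Gamma_2 per environment, as quoted in the row above.
# "grouped" applies to GROUPED_DEFENSES; "default" applies to all other defenses
# (or to every defense when no "grouped" value is listed).
POLICY_GUIDANCE = {
    "Pong": {"grouped": 3.5, "default": 2.0},
    "Freeway": {"grouped": 6.0, "default": 4.5},
    "Bank Heist": {"default": 4.0},
    "Road Runner": {"default": 6.0},
    "Doom": {"grouped": 4.5, "default": 2.5},
    "AirSim": {"default": 0.5},
}

def policy_guidance(env: str, defense: str) -> float:
    """Look up Gamma_2 for an environment/defense pair."""
    entry = POLICY_GUIDANCE[env]
    if defense in GROUPED_DEFENSES and "grouped" in entry:
        return entry["grouped"]
    return entry["default"]
```

Under these assumptions, cfg_scale decays linearly from 1.0 and is clamped at 0.3 for the final reverse steps, while policy_guidance("Pong", "DMBP") returns 3.5 and any defense outside the grouped set falls back to the environment's default value.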