Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Bilevel Optimization for Adversarial Learning Problems: Sharpness, Generation, and Beyond

Authors: Risheng Liu, Zhu Liu, Weihao Mao, Wei Yao, Jin Zhang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments show that our method improves generation quality of GANs, and consistently achieves higher accuracy for SAM under label noise and across various backbones, while promoting flatter loss landscapes. Overall, this work provides a practical and theoretically grounded framework for solving adversarial learning tasks through bilevel optimization.
Researcher Affiliation	Collaboration	Risheng Liu, Zhu Liu, Weihao Mao, Wei Yao, Jin Zhang; School of Software Technology, Dalian University of Technology; Mathematical Department, Southern University of Science and Technology; National Center for Applied Mathematics Shenzhen; Detection Institute for Advanced Technology Longhua-Shenzhen (DIATLHSZ); EMAIL, EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode	Yes	Algorithm for SAM. In particular, we consider the SAM problem given in (3). For a fixed pair $(\omega, \delta)$, the Moreau envelope reformulates the lower-level problem as the following smooth optimization problem: $\min_{\theta \in C} L_\ell(\omega, \theta) + \frac{1}{2\gamma}\|\theta - \delta\|^2$ (19), which is typically convex in $\theta$. In this case, any Karush-Kuhn-Tucker (KKT) point corresponds to a global minimizer. The global minimizer $\theta^*_\gamma(\omega, \delta)$ of (19) satisfies the optimality condition: $0 \in \nabla_\theta L_\ell(\omega, \theta^*) + \frac{1}{\gamma}(\theta^* - \delta) + N(\theta^*, C)$,
Open Source Code	Yes	The source codes will be released at https://github.com/LiuZhu-CV/BLOAL.
Open Datasets	Yes	We conduct comparison with Stacked MNIST, a challenging dataset with 1000 modes and two-dimensional simulation experiments based on Gaussian distribution, generating eight distribution of 2D wheels. ... We conduct image classification experiments using the standard open-source CIFAR-10 benchmark
Dataset Splits	Yes	We conduct image classification experiments using the standard open-source CIFAR-10 benchmark, which consists of 50,000 training and 10,000 testing image-label pairs.
Hardware Specification	Yes	We conducted the experiments on a PC with Intel i5-13600KF CPU (3.5 GHz), 32GB RAM and NVIDIA RTX 4090 GPU.
Software Dependencies	No	We leveraged the PyTorch framework on the 64-bit Linux system.
Experiment Setup	Yes	As for the first case, we set η, β, α, γ, µ, and p as 0.001, 0.01, 0.0001, 20, and 0.1 and leverage $\|\omega^{k+1} - \omega^{k}\| \le 10^{-4}$ as the stop criterion. SGD optimizer is used for the update of ω. We set the maximum steps of optimization as 1000 uniformly. ... The hyperparameters η, β, α, γ, µ, and p are set to 0.005, 0.005, 0.01, 100, 5 and 0.1, respectively. ... The hyperparameters α, γ, µ, Q, and p are set to 0.05, $1 \times 10^{-4}$, 0.75, 1 and 0.01, respectively. Following the setup in [2], we apply basic augmentation during training, including horizontal flipping, four-pixel padding, and cropping. Models are trained from scratch for 200 epochs using a batch size of 128 and a cosine learning rate schedule.
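
Illustrative note on the Pseudocode row: the Moreau-envelope subproblem (19) adds a quadratic proximal term to the lower-level loss and minimizes over the constraint set C. The sketch below is not the authors' algorithm. It is a minimal projected-gradient solver under stated assumptions: C is taken to be a box, the loss is a toy quadratic, and the names `project_box`, `solve_proximal_subproblem`, and `grad_loss` are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def project_box(theta: Vector, lo: float, hi: float) -> Vector:
    """Euclidean projection onto the box [lo, hi]^n (assumed stand-in for C)."""
    return [min(max(t, lo), hi) for t in theta]


def solve_proximal_subproblem(
    grad_loss: Callable[[Vector], Vector],
    delta: Vector,
    gamma: float,
    step: float = 0.1,
    iters: int = 500,
    lo: float = -1.0,
    hi: float = 1.0,
) -> Vector:
    """Projected gradient descent on L(theta) + ||theta - delta||^2 / (2 * gamma)."""
    theta = project_box(list(delta), lo, hi)
    for _ in range(iters):
        g = grad_loss(theta)
        # Gradient of the full objective: loss gradient plus proximal term.
        theta = project_box(
            [t - step * (gi + (t - d) / gamma) for t, gi, d in zip(theta, g, delta)],
            lo,
            hi,
        )
    return theta


# Toy loss L(theta) = 0.5 * ||theta - a||^2, chosen only for illustration.
a = [0.5, 3.0]


def toy_grad(theta: Vector) -> Vector:
    return [t - ai for t, ai in zip(theta, a)]


theta_star = solve_proximal_subproblem(toy_grad, delta=[0.0, 0.0], gamma=1.0)
```

For this toy problem the unconstrained minimizer is (γa + δ)/(γ + 1) = (0.25, 1.5). The second coordinate is clipped to the box boundary at 1.0, which is where the normal-cone term in the optimality condition becomes active.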
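
Illustrative note on the Experiment Setup row: the quoted setup trains for 200 epochs with a cosine learning rate schedule. The sketch below shows one common form of such a schedule (half-cosine decay from a base rate to a minimum rate). The function name `cosine_lr` and the values `base_lr=0.1` and `min_lr=0.0` are assumptions for illustration; the quoted text does not state them for this experiment.

```python
import math


def cosine_lr(
    epoch: int,
    total_epochs: int = 200,  # from the quoted setup
    base_lr: float = 0.1,     # assumed, not given in the quoted text
    min_lr: float = 0.0,      # assumed, not given in the quoted text
) -> float:
    """Half-cosine decay from base_lr at epoch 0 to min_lr at total_epochs."""
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Learning rate at the start, midpoint, and end of training.
schedule = [cosine_lr(e) for e in (0, 100, 200)]
```

The rate starts at `base_lr`, passes through the midpoint of `base_lr` and `min_lr` halfway through training, and reaches `min_lr` at the final epoch.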