Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Robust Satisficing Gaussian Process Bandits Under Adversarial Attacks

Authors: Artun Saday, Yaşar Cahit Yıldırım, Cem Tekin

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through extensive experiments, we demonstrate that our approach outperforms the established robust optimization methods in achieving the satisficing objective, particularly when the ambiguity set of the robust optimization framework is inaccurately specified. ... Finally, we present experiments, including in-silico experiments on an FDA-approved simulator for diabetes management with realistic adversarial perturbations, to demonstrate the strengths of our algorithms addressing the shortcomings of other approaches known in the literature.
Researcher Affiliation	Academia	Artun Saday, Bilkent University, Ankara, Türkiye, EMAIL; Yaşar Cahit Yıldırım, Bilkent University, Ankara, Türkiye, EMAIL; Cem Tekin, Bilkent University, Ankara, Türkiye, EMAIL
Pseudocode	Yes	Algorithm 1 AdveRS-1 1: Input: Kernel function k, X, τ, R, B, confidence parameter ζ, time horizon T 2: Initialize: Set D_0 = ∅ (empty dataset), µ_0(x) = 0, σ_0(x) = 1 for all x 3: for t = 1 to T do 4: Compute ucb_t(x) = µ_{t−1}(x) + β_t σ_{t−1}(x), ∀x ∈ X 5: Compute κ_{τ,t}(x) as in (7), ∀x ∈ X 6: Select point x̃_t = arg min_{x ∈ X} κ_{τ,t}(x) 7: Adversary selects perturbation δ_t ∈ Δ_{ϵ_t}(x̃_t) 8: Sample x_t = x̃_t + δ_t, observe y_t = f(x_t) + η_t 9: Update dataset D_t = D_{t−1} ∪ {(x_t, y_t)} 10: Update GP posterior as in (5) 11: end for
Open Source Code	Yes	Our code is available at: https://github.com/Bilkent-CYBORG/AdveRS.
Open Datasets	No	We apply this to an insulin dosage selection problem for Type 1 Diabetes Mellitus (T1DM) patients using the UVA/PADOVA T1DM simulator [40]. ... In the first experiment, we use the proof-of-concept function in Figure 2, a modified version of the synthetic function from [22].
Dataset Splits	No	In the first experiment, we use the proof-of-concept function in Figure 2, a modified version of the synthetic function from [22]. We set the threshold τ = 10 and conduct two experiments: (a) ϵ_t = 0.5 (Assumption 3.8 holds) and (b) ϵ_t = 1.5 (Assumption 3.8 fails to hold). ... We discretize the parameter space into 4096 points and compute the cost for each parameter by simulating it over 2000 seconds, using the same cost function as in [44]. ... The results of all experiments are averaged over 100 runs, with error bars representing std/2.
Hardware Specification	No	We used a modern computer with no special hardware for the experiments, as they do not require intensive computations. Therefore we did not see it necessary to disclose our compute sources.
Software Dependencies	No	The observation noise follows η_t ∼ N(0, 1), and the GP kernel is a polynomial kernel trained on 500 samples from the function. ... We implement a static feedback controller similar to the one in [43], defined as u_k = F s_k, where F ∈ R^{1×4} is the gain matrix. ... For the GP kernel, we use a Radial Basis Function (RBF) kernel with Automatic Relevance Determination (ARD), with hyperparameters selected using 400 samples prior to the experiment.
Experiment Setup	Yes	In the first experiment, we use the proof-of-concept function in Figure 2, a modified version of the synthetic function from [22]. We set the threshold τ = 10 and conduct two experiments: (a) ϵ_t = 0.5 (Assumption 3.8 holds) and (b) ϵ_t = 1.5 (Assumption 3.8 fails to hold). For the RO representative, we run STABLEOPT [22] with radius parameters r = ϵ_t, r = 4ϵ_t, and r = 0.5ϵ_t. The observation noise follows η_t ∼ N(0, 1), and the GP kernel is a polynomial kernel trained on 500 samples from the function. ... The action space is units of insulin in the range [0, 15], and the context is carbohydrate intake, perturbed by δ_t ∼ N(0, σ_s^2) with σ_s = 32 ... The parameterization... is given by F = dlqr(A, B, W_s(θ), W_u(θ)) with the following parameterization: W_s(θ) = diag(10^{θ_1}, 10^{θ_2}, 10^{θ_3}, 0.1), θ_{1,2,3} ∈ [−3, 2], W_u(θ) = 10^{−θ_4}, θ_4 ∈ [1, 5]. We discretize the parameter space into 4096 points and compute the cost for each parameter by simulating it over 2000 seconds, using the same cost function as in [44]. After the simulation, the cost function is standardized and clipped to the range [−2, 2]. For the GP kernel, we use a Radial Basis Function (RBF) kernel with Automatic Relevance Determination (ARD), with hyperparameters selected using 400 samples prior to the experiment.
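
The controller-tuning setup quoted in the Experiment Setup row can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code. It assumes that the signs lost in extraction are minus signs (θ_1, θ_2, θ_3 in [−3, 2], W_u = 10^(−θ_4), clipping range [−2, 2]) and that the 4096 points come from a uniform grid with 8 values per dimension (8^4 = 4096), which the quoted text does not state. The function names are hypothetical, and the simulation that produces the costs is not reproduced here.

```python
import itertools

import numpy as np


def weight_matrices(theta):
    """Map theta = (t1, t2, t3, t4) to the LQR weights from the quoted setup.

    ASSUMPTION: W_u uses a negative exponent, 10**(-t4); the sign is
    reconstructed from garbled text.
    """
    t1, t2, t3, t4 = theta
    W_s = np.diag([10.0**t1, 10.0**t2, 10.0**t3, 0.1])
    W_u = np.array([[10.0 ** (-t4)]])
    return W_s, W_u


def parameter_grid(points_per_dim=8):
    """Discretize the 4-D parameter space.

    ASSUMPTION: a uniform grid with 8 points per dimension (8**4 = 4096);
    the source only gives the total number of points.
    """
    g123 = np.linspace(-3.0, 2.0, points_per_dim)  # range of t1, t2, t3
    g4 = np.linspace(1.0, 5.0, points_per_dim)     # range of t4
    return np.array(list(itertools.product(g123, g123, g123, g4)))


def standardize_and_clip(costs, low=-2.0, high=2.0):
    """Standardize costs to zero mean and unit variance, then clip."""
    costs = np.asarray(costs, dtype=float)
    z = (costs - costs.mean()) / costs.std()
    return np.clip(z, low, high)


grid = parameter_grid()
W_s, W_u = weight_matrices(grid[0])  # weights at the first grid point
```

In a full pipeline, each row of `grid` would be passed to a discrete-time LQR solver and simulated to obtain a cost, and `standardize_and_clip` would then be applied to the resulting cost vector.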