Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

LORE: Lagrangian-Optimized Robust Embeddings for Visual Encoders

Authors: Borna Khodabandeh, Amirabbas Afzali, Amirhossein Afsharrad, Shahab Mousavi, Sanjay Lall, Sajjad Amini, Seyed-Mohsen Moosavi-Dezfooli

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments show that LORE stabilizes training and significantly improves zero-shot adversarial robustness with minimal degradation in clean data accuracy. Furthermore, we demonstrate the effectiveness of the adversarially fine-tuned image encoder in out-of-distribution generalization and enhancing the interpretability of image embeddings.
Researcher Affiliation Collaboration 1Stanford University 2Aktus AI 3University of Massachusetts Amherst 4Apple
Pseudocode Yes Algorithm 1 Lagrangian-Optimized Robust Embeddings (LORE)
    for each epoch do
        for batch x ∼ D do
            δ ← ATTACKALG(d(ϕθt(x + δ), ϕθ0(x)))
            for i = 1 to K do
                Lrobust ← d(ϕθt(x + δ), ϕθ0(x))
                Lclean ← d(ϕθt(x), ϕθ0(x)) − ρ m(x)
                Ltotal ← Lrobust + λω(x) Lclean
                θ ← θ − ηθ ∇θ Ltotal
            end for
            ω ← ω + ηω Lclean ∇ω λω(x)
        end for
    end for
    return θ, ω
Open Source Code Yes The code is available on GitHub.
Open Datasets Yes We use both ImageNet and ImageNet-100, a curated subset of ImageNet, as training datasets throughout our experiments. [...] We evaluate adversarial robustness of the ViT-B/32 CLIP vision encoder on 13 zero-shot benchmarks from CLIP-benchmark, all originally trained on ImageNet.
Dataset Splits Yes We use both ImageNet and ImageNet-100, a curated subset of ImageNet, as training datasets throughout our experiments. Additional details for each experiment, including figures and tables, are provided in Appendix C. [...] We evaluate adversarial robustness of the ViT-B/32 CLIP vision encoder on 13 zero-shot benchmarks from CLIP-benchmark, all originally trained on ImageNet.
Hardware Specification Yes Experiments were conducted using 8 NVIDIA HGX H100 80GB GPUs.
Software Dependencies No Unless otherwise noted, all models were trained using AdamW with a weight decay of 1 × 10⁻⁴, a cosine learning rate scheduler, and adversarial training with PGD (10 iterations, step size ε/4) under an ℓ∞ constraint.
Experiment Setup Yes Training hyperparameters. We report below the training settings used across all experiments. Unless otherwise noted, all models were trained using AdamW with a weight decay of 1 × 10⁻⁴, a cosine learning rate scheduler, and adversarial training with PGD (10 iterations, step size ε/4) under an ℓ∞ constraint. Each λ network used the 2-layer linear_mlp architecture, with a hidden dimension of 512, and was optimized via K = 5 inner primal updates with learning rate 5 × 10⁻⁴. More experimental details are provided in Table 6.
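
The primal-dual loop quoted from Algorithm 1, together with the PGD settings in the Experiment Setup row (10 iterations, step size ε/4, ℓ∞ constraint, K = 5 inner updates), can be illustrated with a toy sketch in plain Python. This is not the authors' implementation, and several pieces are assumptions made only for illustration: a linear dot-product "encoder" stands in for the CLIP vision encoder, d is taken to be a squared difference, a single non-negative scalar multiplier replaces the λ network, m(x) is taken to be |ϕθ0(x)|, the clean term is read as the distance minus the tolerance ρ·m(x), and the values of ρ, ε, and both learning rates are arbitrary. Plain gradient descent is used in place of AdamW.

```python
import random

def sign(g):
    """Sign of a float as -1, 0, or 1."""
    return (g > 0) - (g < 0)

def encode(theta, x):
    """Toy linear 'encoder' (assumption): a dot product standing in for phi_theta(x)."""
    return sum(t * xi for t, xi in zip(theta, x))

def pgd_linf(theta, theta0, x, eps, steps=10):
    """PGD under an l-infinity budget: `steps` signed-gradient ascent steps of size eps/4."""
    target = encode(theta0, x)           # embedding of the clean input under the frozen encoder
    step = eps / 4
    delta = [random.uniform(-eps, eps) for _ in x]
    for _ in range(steps):
        err = encode(theta, [xi + di for xi, di in zip(x, delta)]) - target
        grad = [2.0 * err * t for t in theta]   # d(err**2)/d(delta_j)
        # Ascent step followed by projection back onto the box [-eps, eps].
        delta = [max(-eps, min(eps, di + step * sign(g))) for di, g in zip(delta, grad)]
    return delta

def train(data, theta0, eps=0.1, rho=0.05, K=5,
          lr_theta=0.05, lr_lam=0.5, epochs=50, seed=0):
    """Toy primal-dual loop: K primal steps on theta, then one dual step on lam."""
    random.seed(seed)
    theta = list(theta0)                 # fine-tuned weights start at the frozen ones
    lam = 0.0                            # scalar multiplier (assumption: replaces the lambda network)
    for _ in range(epochs):
        for x in data:
            delta = pgd_linf(theta, theta0, x, eps)
            x_adv = [xi + di for xi, di in zip(x, delta)]
            target = encode(theta0, x)
            budget = rho * abs(target)   # rho * m(x), with m(x) assumed to be |phi_theta0(x)|
            for _ in range(K):
                err_adv = encode(theta, x_adv) - target
                err_cln = encode(theta, x) - target
                # Gradient of L_robust + lam * L_clean with respect to theta.
                grad = [2.0 * err_adv * xa + lam * 2.0 * err_cln * xc
                        for xa, xc in zip(x_adv, x)]
                theta = [t - lr_theta * g for t, g in zip(theta, grad)]
            l_clean = (encode(theta, x) - target) ** 2 - budget
            # Dual ascent, clipped at zero (assumption) so the multiplier stays non-negative.
            lam = max(0.0, lam + lr_lam * l_clean)
    return theta, lam
```

Calling `train([[1.0, 0.1], [0.5, -0.3]], theta0=[1.0, 1.0])` returns the adapted weights and the final multiplier. The multiplier grows only while the clean-embedding drift exceeds the tolerance, which is the mechanism the Lagrangian formulation relies on to trade robustness against clean fidelity.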