Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Achieving Domain-Independent Certified Robustness via Knowledge Continuity
Authors: Alan Sun, Chiyu Ma, Kenneth Ge, Soroush Vosoughi
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, to complement our theoretical results, we present several applications of knowledge continuity such as regularization, a certification algorithm, and show that knowledge continuity can be used to localize vulnerable components of a neural network. Unless otherwise specified, we run all of our experiments on the IMDB dataset [48] (a sentiment classification task) using a host of language models from different model families (encoder, decoder, encoder-decoder). We also present additional experiments on vision tasks. |
| Researcher Affiliation | Academia | ¹Carnegie Mellon University, ²Dartmouth College |
| Pseudocode | Yes | Algorithm 1: A Monte-Carlo algorithm for estimating k-volatility of some metric decomposable function f with n hidden layers (left). Augmenting any loss function to regularize k-volatility (right), given some Beta distribution parameterized by α, β and regularization strength λ ≥ 0. |
| Open Source Code | Yes | Codebase for our experiments can be found at https://github.com/alansun17904/kc. The rest of our codebase including implementations of the algorithms and figures described in the manuscript can be found at https://github.com/alansun17904/kc. |
| Open Datasets | Yes | Unless otherwise specified, we run all of our experiments on the IMDB dataset [48] (a sentiment classification task). The IMDB dataset consists of 50,000 examples with 25,000 for training and 25,000 for testing. |
| Dataset Splits | Yes | We split the test set 40%-60% to create a validation and test set of 10,000 and 15,000 examples, respectively. |
| Hardware Specification | Yes | All of our experiments were conducted on four NVIDIA RTX A6000 GPUs as well as four NVIDIA Quadro RTX 6000 GPUs. |
| Software Dependencies | No | The paper does not provide specific version numbers for software libraries or dependencies, only general mentions of tools or methods. |
| Experiment Setup | Yes | We train all models using the hyperparameter and optimizer configurations shown in Table 4. Optimizer: Adam; Adam β1: 0.9; Adam β2: 0.999; Adam ε: 1 × 10⁻⁸; Max Gradient Norm: 1.0; Learning Rate Scheduler: Linear; Epochs: 20; Batch Size: 32; Learning Rate: 5 × 10⁻⁵; Weight Decay: 1 × 10⁻⁹. |
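
The Dataset Splits and Experiment Setup rows can be read together as a small configuration sketch. The Python below is illustrative only and is not taken from the authors' codebase: the names `TrainConfig` and `split_test_set`, the random shuffle, and the seed are all hypothetical. The source states only the 40%-60% proportions and the Table 4 values, and the exponents are taken as negative (for example `5e-5` for the learning rate).

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the values listed in the Experiment Setup row."""
    optimizer: str = "Adam"
    adam_beta1: float = 0.9
    adam_beta2: float = 0.999
    adam_epsilon: float = 1e-8   # assumed negative exponent
    max_grad_norm: float = 1.0
    lr_scheduler: str = "linear"
    epochs: int = 20
    batch_size: int = 32
    learning_rate: float = 5e-5  # assumed negative exponent
    weight_decay: float = 1e-9   # assumed negative exponent


def split_test_set(n_test=25_000, val_fraction=0.4, seed=0):
    """Partition test-set indices into validation and test subsets.

    Shuffling and the seed are assumptions; the source only reports
    a 40%-60% split of the 25,000 IMDB test examples.
    """
    indices = list(range(n_test))
    random.Random(seed).shuffle(indices)
    n_val = int(round(n_test * val_fraction))
    return indices[:n_val], indices[n_val:]


config = TrainConfig()
val_idx, test_idx = split_test_set()
print(len(val_idx), len(test_idx))  # 10000 15000
```

The split sizes match the 10,000 validation and 15,000 test examples reported in the table; how the authors actually selected the examples may differ.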