Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

Obliviator Reveals the Cost of Nonlinear Guardedness in Concept Erasure

Authors: Ramin Akbari, Milad Afshari, Vishnu Boddeti

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through these trade-off curves, across various experiments, we demonstrate Obliviator outperforms baselines while guarding against nonlinear adversaries (see Figure 1). Moreover, its erasure becomes more utility-preserving when applied to the better-disentangled representations learned by more capable PLMs.
Researcher Affiliation | Academia | Michigan State University
Pseudocode | Yes | Algorithm 1 Obliviator Training Procedure
Open Source Code | No | The code is not currently open-source but will be made public when the paper is published.
Open Datasets | Yes | We use three datasets in our experiments: Bias in Bios [8] (Y: Profession, S: Gender), DIAL-Sentiment [4] (Y: Sentiment, S: Race), and DIAL-Mention [4] (Y: Mention, S: Race).
Dataset Splits | Yes | To ensure a fair comparison with FaRM, we used the same dataset split as FaRM [6] for the fine-tuned BERT representations. For the frozen representations, we followed the dataset split used by [26].
Hardware Specification | Yes | Training is conducted on a single NVIDIA RTX A6000 GPU.
Software Dependencies | No | SVM with an RBF kernel (implemented via cuML [25]) and an MLP with two hidden layers of 128 neurons (implemented in PyTorch [21]).
Experiment Setup | Yes | We use a multilayer perceptron (MLP) as our encoder, consisting of a single hidden layer with 256 units and the SiLU activation function. Optimization is performed using the AdamW optimizer with default hyperparameters. We set the learning rate to 5 × 10⁻⁴ and apply a weight decay of 0.001. For the Bias in Bios dataset, the encoder is trained for 30 iterations in the first step and 25 iterations in subsequent steps. ... The hyper-parameters used for training can be found in Table 2.
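The encoder and optimizer settings quoted in the Experiment Setup row can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' code (which is not yet public): the input width of 768 (a BERT-base hidden size) and the output width are assumptions, the helpers `make_encoder`, `make_optimizer` and `iterations_for_step` are hypothetical names, and what counts as one "iteration" is not defined in the row. The Obliviator erasure objective itself is not reproduced, since the row does not describe it.

```python
import torch
from torch import nn

# ASSUMPTION: 768 matches a BERT-base hidden state; the encoder's output
# width is not stated in the source and is set equal to the input here.
IN_DIM = 768
OUT_DIM = 768


def make_encoder(in_dim: int = IN_DIM, out_dim: int = OUT_DIM) -> nn.Module:
    """MLP encoder: one hidden layer of 256 units with SiLU activation."""
    return nn.Sequential(
        nn.Linear(in_dim, 256),
        nn.SiLU(),
        nn.Linear(256, out_dim),
    )


def make_optimizer(model: nn.Module) -> torch.optim.AdamW:
    """AdamW with default betas/eps, learning rate 5e-4, weight decay 1e-3."""
    return torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=1e-3)


def iterations_for_step(step: int) -> int:
    """Reported Bias in Bios budget: 30 iterations in the first step
    (step index 0), 25 in every subsequent step."""
    return 30 if step == 0 else 25


if __name__ == "__main__":
    torch.manual_seed(0)
    encoder = make_encoder()
    optimizer = make_optimizer(encoder)
    z = encoder(torch.randn(4, IN_DIM))  # a batch of 4 dummy embeddings
    print(z.shape, [iterations_for_step(s) for s in range(3)])
```

Per-dataset values for the other settings are listed in the paper's Table 2; only the Bias in Bios iteration counts are encoded here.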
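The MLP adversary named in the Software Dependencies row (two hidden layers of 128 neurons, in PyTorch) can be sketched as a probe that tries to predict the protected attribute S from erased representations. This is a hedged sketch: ReLU activations, full-batch Adam, 200 epochs and a learning rate of 1e-3 are assumptions the row does not specify, and `make_mlp_probe` and `probe_accuracy` are hypothetical names. The RBF-kernel SVM adversary is not shown; scikit-learn's `sklearn.svm.SVC(kernel="rbf")` is a CPU counterpart to cuML's GPU `SVC`.

```python
import torch
from torch import nn


def make_mlp_probe(in_dim: int, n_classes: int) -> nn.Module:
    """Two hidden layers of 128 units (as reported); ReLU is an ASSUMPTION."""
    return nn.Sequential(
        nn.Linear(in_dim, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, n_classes),
    )


def probe_accuracy(z_train: torch.Tensor, s_train: torch.Tensor,
                   z_test: torch.Tensor, s_test: torch.Tensor,
                   n_classes: int, epochs: int = 200, lr: float = 1e-3) -> float:
    """Train the probe to predict S from representations z and return its
    test accuracy. Optimizer, epochs and lr are ASSUMPTIONS."""
    probe = make_mlp_probe(z_train.shape[1], n_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):  # full-batch training
        opt.zero_grad()
        loss = loss_fn(probe(z_train), s_train)
        loss.backward()
        opt.step()
    with torch.no_grad():
        pred = probe(z_test).argmax(dim=1)
    return (pred == s_test).float().mean().item()


if __name__ == "__main__":
    torch.manual_seed(0)
    # Synthetic stand-in for erased representations: S leaks through dim 0.
    z_tr, z_te = torch.randn(512, 8), torch.randn(256, 8)
    s_tr, s_te = (z_tr[:, 0] > 0).long(), (z_te[:, 0] > 0).long()
    print(probe_accuracy(z_tr, s_tr, z_te, s_te, n_classes=2))
```

In this kind of evaluation, probe accuracy close to the majority-class rate of S suggests that this particular adversary cannot recover the concept, while accuracy well above it indicates leakage.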