Robust Concept Erasure via Kernelized Rate-Distortion Maximization

Authors: Somnath Basu Roy Chowdhury, Nicholas Monath, Kumar Avinava Dubey, Amr Ahmed, Snigdha Chaturvedi

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments to demonstrate that KRaM is capable of erasing various types of concepts (categorical, continuous, and vector-valued variables) from data representations across a wide range of domains.
Researcher Affiliation | Collaboration | Somnath Basu Roy Chowdhury (UNC Chapel Hill), Nicholas Monath (Google DeepMind), Avinava Dubey (Google Research), Amr Ahmed (Google Research), Snigdha Chaturvedi (UNC Chapel Hill); {somnath, snigdha}@cs.unc.edu, {nmonath, avinavadubey, amra}@google.com
Pseudocode | Yes | Algorithm 1: Correlation Computation Routine
Open Source Code | Yes | The implementation of KRaM is publicly available at https://github.com/brcsomnath/KRaM.
Open Datasets | Yes | Jigsaw toxicity detection dataset [1], UCI Crimes [36], DIAL dataset [7], GloVe embeddings [46], CelebA [37], Colored MNIST [6]. These are all standard, publicly available datasets with proper citations.
Dataset Splits | No | This resulted in a dataset with a train/test split of (72k, 18k) for the religion concept and (106k, 26k) for the gender concept.
Hardware Specification | Yes | All networks were trained using a single 22GB NVIDIA Quadro RTX 6000 GPU and experiments were executed in the PyTorch [44] framework.
Software Dependencies | No | All networks were trained using a single 22GB NVIDIA Quadro RTX 6000 GPU and experiments were executed in the PyTorch [44] framework. We set these parameters by performing a grid search on the development set using Weights & Biases [11]. We use a scikit-learn MLP classifier (non-linear) [45].
Experiment Setup | Yes | In our experiments, we primarily deal with two hyperparameters: the regularization constant, λ (in Equation 4), and σ, associated with the standard deviation of a Gaussian kernel (k(x, y) = exp(-‖x - y‖² / σ²)).
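To make the role of σ in the Experiment Setup entry concrete, the following is a minimal sketch of a Gaussian kernel, assuming the form k(x, y) = exp(-‖x - y‖² / σ²) (squared Euclidean distance divided by σ²). The function names `gaussian_kernel` and `kernel_matrix` and the example points are illustrative assumptions and are not taken from the KRaM code base, which may compute the kernel differently (for example, batched in PyTorch).

```python
import math


def gaussian_kernel(x, y, sigma):
    """Return exp(-||x - y||^2 / sigma^2) for two equal-length vectors.

    Assumed kernel form; the helper name is hypothetical.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Squared Euclidean distance between the two vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / sigma ** 2)


def kernel_matrix(points, sigma):
    """Pairwise kernel (Gram) matrix with K[i][j] = k(points[i], points[j])."""
    return [[gaussian_kernel(p, q, sigma) for q in points] for p in points]


# Illustrative data only: three 2-D points.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = kernel_matrix(points, sigma=1.0)
```

Under this form, every diagonal entry of `K` equals 1, and a larger σ pushes off-diagonal entries toward 1, so distant points are treated as more similar. This sensitivity is one reason σ would be tuned alongside λ.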