Robust Concept Erasure via Kernelized Rate-Distortion Maximization
Authors: Somnath Basu Roy Chowdhury, Nicholas Monath, Kumar Avinava Dubey, Amr Ahmed, Snigdha Chaturvedi
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to demonstrate that KRaM is capable of erasing various types of concepts (categorical, continuous, and vector-valued variables) from data representations across a wide range of domains. |
| Researcher Affiliation | Collaboration | Somnath Basu Roy Chowdhury (UNC Chapel Hill), Nicholas Monath (Google DeepMind), Avinava Dubey (Google Research), Amr Ahmed (Google Research), Snigdha Chaturvedi (UNC Chapel Hill). Emails: {somnath, snigdha}@cs.unc.edu, {nmonath, avinavadubey, amra}@google.com |
| Pseudocode | Yes | Algorithm 1 Correlation Computation Routine |
| Open Source Code | Yes | The implementation of KRaM is publicly available at https://github.com/brcsomnath/KRaM. |
| Open Datasets | Yes | Jigsaw toxicity detection dataset [1], UCI Crimes [36], DIAL dataset [7], GloVe embeddings [46], CelebA [37], Colored MNIST [6]. These are all standard, publicly available datasets with proper citations. |
| Dataset Splits | No | This resulted in a dataset with a train/test split of (72k, 18k) for the religion concept and (106k, 26k) for the gender concept. |
| Hardware Specification | Yes | All networks were trained using a single 22GB NVIDIA Quadro RTX 6000 GPU and experiments were executed in the PyTorch [44] framework. |
| Software Dependencies | No | All networks were trained using a single 22GB NVIDIA Quadro RTX 6000 GPU and experiments were executed in the PyTorch [44] framework. We set these parameters by performing a grid search on the development set using Weights & Biases [11]. We use a scikit-learn MLP classifier (non-linear) [45]. |
| Experiment Setup | Yes | In our experiments, we primarily deal with two hyperparameters: the regularization constant, λ (in Equation 4), and σ, associated with the standard deviation of a Gaussian kernel ($k(x, y) = e^{-\lVert x - y \rVert^2 / \sigma^2}$). |
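
The Gaussian kernel named in the Experiment Setup row can be illustrated with a minimal sketch. It assumes the kernel has the form k(x, y) = exp(−‖x − y‖² / σ²), which is a reading of the summary above rather than something confirmed against the released code. The function names `gaussian_kernel` and `kernel_matrix` and the sample points are hypothetical and do not come from the KRaM repository.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Assumed form: k(x, y) = exp(-||x - y||^2 / sigma^2)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension")
    # Squared Euclidean distance between the two vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / sigma ** 2)


def kernel_matrix(points, sigma):
    """Gram matrix with K[i][j] = k(points[i], points[j])."""
    return [[gaussian_kernel(p, q, sigma) for q in points] for p in points]


# Hypothetical toy representations, for illustration only.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = kernel_matrix(points, sigma=1.0)
```

Under this form, a larger σ makes distant points look more similar (kernel values move toward 1), while a smaller σ makes the kernel more local. That sensitivity is presumably why σ is tuned by grid search alongside λ.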