Probing Classifiers are Unreliable for Concept Removal and Detection

Authors: Abhinav Kumar, Chenhao Tan, Amit Sharma

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 'Empirical results on four datasets (natural language inference, sentiment analysis, tweet-mention detection, and a synthetic task) confirm our claims.' |
| Researcher Affiliation | Collaboration | Abhinav Kumar (Microsoft Research, t-abkumar@microsoft.com); Chenhao Tan (University of Chicago, chenhao@uchicago.edu); Amit Sharma (Microsoft Research, amshar@microsoft.com) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper's checklist states that code is included ('See E'), but Appendix E does not provide a URL or an explicit statement that the source code for the paper's methodology is released. |
| Open Datasets | Yes | 'We use three datasets: Multi NLI [46], Twitter-PAN16 [31] and Twitter-AAE [6].' |
| Dataset Splits | Yes | 'For Multi NLI, we use standard validation/test splits provided in the dataset.' |
| Hardware Specification | No | The paper states 'All experiments run on a single NVIDIA GPU' but does not give specific hardware details such as the GPU model, CPU type, or memory. |
| Software Dependencies | No | The paper mentions RoBERTa, GloVe embeddings, and the AdamW optimizer, but does not provide version numbers for these or other software dependencies. |
| Experiment Setup | Yes | 'We train for 20 epochs for all datasets, except for Synthetic-Text and Multi NLI, for which we train for 40 epochs. We use AdamW optimizer with a learning rate of 1e-5. We use a batch size of 32 for Multi NLI, 16 for Twitter-PAN16, 8 for Twitter-AAE, and 64 for Synthetic-Text.' |