Probing Classifiers are Unreliable for Concept Removal and Detection
Authors: Abhinav Kumar, Chenhao Tan, Amit Sharma
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on four datasets (natural language inference, sentiment analysis, tweet-mention detection, and a synthetic task) confirm our claims. |
| Researcher Affiliation | Collaboration | Abhinav Kumar (Microsoft Research, t-abkumar@microsoft.com); Chenhao Tan (University of Chicago, chenhao@uchicago.edu); Amit Sharma (Microsoft Research, amshar@microsoft.com) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper's checklist states that code is included ("See E"), but Appendix E provides neither a URL nor an explicit statement that the source code for the paper's methodology is released. |
| Open Datasets | Yes | We use three datasets: Multi NLI [46], Twitter-PAN16 [31] and Twitter-AAE [6]. |
| Dataset Splits | Yes | For Multi NLI, we use standard validation/test splits provided in the dataset. (One way to load these splits is sketched after the table.) |
| Hardware Specification | No | The paper states 'All experiments run on a single NVIDIA GPU' but does not provide specific hardware details such as the GPU model, CPU type, or memory specifications. |
| Software Dependencies | No | The paper mentions using RoBERTa, GloVe embeddings, and the AdamW optimizer, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We train for 20 epochs for all datasets, except for Synthetic-Text and Multi NLI, for which we train for 40 epochs. We use AdamW optimizer with a learning rate of 1e-5. We use a batch size of 32 for Multi NLI, 16 for Twitter-PAN16, 8 for Twitter-AAE, and 64 for Synthetic-Text. (A training-configuration sketch also follows the table.) |
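The Dataset Splits row says the paper uses Multi NLI's standard splits but does not name a loading pipeline. As one hedged possibility, the Hugging Face `datasets` hub hosts MultiNLI with its standard train and matched/mismatched validation splits; the snippet below is a sketch under that assumption, not the authors' setup.

```python
# Hypothetical sketch: obtaining Multi NLI's standard splits via the
# Hugging Face `datasets` library. The paper does not state how the
# dataset was loaded, so treat this as one plausible pipeline.
from datasets import load_dataset

mnli = load_dataset("multi_nli")
train = mnli["train"]
# MultiNLI ships matched and mismatched validation sets; the paper refers
# only to "standard validation/test splits provided in the dataset".
val_matched = mnli["validation_matched"]
val_mismatched = mnli["validation_mismatched"]
print(len(train), len(val_matched), len(val_mismatched))
```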
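The Experiment Setup row quotes the paper's optimizer, learning rate, epoch counts, and per-dataset batch sizes. The sketch below collects those reported values in one place; it is a minimal illustration, not the authors' code. The checkpoint name `roberta-base` (the paper says only RoBERTa, without a size), the Hugging Face `transformers` API, and the `make_trainer` helper are assumptions.

```python
# Minimal sketch of the reported training configuration. The optimizer
# (AdamW), learning rate (1e-5), epoch counts, and batch sizes come from
# the paper; the checkpoint name and this helper are assumptions.
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification

BATCH_SIZES = {"MultiNLI": 32, "Twitter-PAN16": 16, "Twitter-AAE": 8, "Synthetic-Text": 64}
# 20 epochs for all datasets, except 40 for Synthetic-Text and MultiNLI.
EPOCHS = {"MultiNLI": 40, "Twitter-PAN16": 20, "Twitter-AAE": 20, "Synthetic-Text": 40}

def make_trainer(dataset: str, num_labels: int):
    """Build a model and optimizer with the paper's reported hyperparameters."""
    model = AutoModelForSequenceClassification.from_pretrained(
        "roberta-base", num_labels=num_labels  # RoBERTa size is not specified
    )
    optimizer = AdamW(model.parameters(), lr=1e-5)
    return model, optimizer, BATCH_SIZES[dataset], EPOCHS[dataset]

model, optimizer, batch_size, n_epochs = make_trainer("MultiNLI", num_labels=3)
```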