Unsupervised Detection of Contextualized Embedding Bias with Application to Ideology
Authors: Valentin Hofmann, Janet Pierrehumbert, Hinrich Schütze
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments suggest that the ideological subspace encodes abstract evaluative semantics and reflects changes in the political left-right spectrum during the presidency of Donald Trump. Table 2: Performance on link prediction (MAUC). |
| Researcher Affiliation | Academia | Valentin Hofmann (1,2), Janet B. Pierrehumbert (3,1), Hinrich Schütze (2,1); (1) Faculty of Linguistics, University of Oxford; (2) Center for Information and Language Processing, LMU Munich; (3) Department of Engineering Science, University of Oxford. |
| Pseudocode | No | The paper describes its method using textual descriptions and mathematical equations but does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | We make our code available at https://github.com/valentinhofmann/unsupervised_bias. |
| Open Datasets | Yes | We base our study on the Reddit Politosphere (Hofmann et al., 2022b), a dataset covering the political discourse on the social media platform Reddit from 2008 to 2019. |
| Dataset Splits | Yes | We split concepts and edges for each year into train (60%), dev (20%), and test (20%). |
| Hardware Specification | Yes | Experiments are performed on a GeForce GTX 1080 Ti GPU (11 GB). |
| Software Dependencies | No | The paper mentions pretrained BERT and the Adam optimizer, but does not provide specific version numbers for software dependencies such as Python, PyTorch, TensorFlow, or other libraries. |
| Experiment Setup | Yes | We perform grid search for the learning rate r ∈ {1×10⁻⁴, 3×10⁻⁴, 1×10⁻³}. For the model used to find X, we further perform grid search for the orthogonality constant λ_o ∈ {1×10⁻³, 3×10⁻³, 1×10⁻²} as well as the sparsity constant λ_s ∈ {1×10⁻², 3×10⁻², 1×10⁻¹}. In total, there are 3 hyperparameter search trials for X and 27 for X per year. We use Adam (Kingma & Ba, 2015) as the optimizer. Both hidden layers of the graph auto-encoder have 10 dimensions. |
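
The quoted setup fully specifies the grid sizes (3 learning rates, 3 orthogonality constants, 3 sparsity constants, hence 3 and 27 trials per year), the optimizer, and the 10-dimensional hidden layers. Below is a minimal sketch of that grid search, assuming a PyTorch-style graph auto-encoder; the class and function names (`GraphAutoEncoder`, `run_grid_search`), the epoch count, and the plain reconstruction loss are illustrative assumptions, not the authors' released code.

```python
# A minimal sketch of the reported hyperparameter grid, assuming PyTorch.
from itertools import product

import torch
import torch.nn as nn

LEARNING_RATES = [1e-4, 3e-4, 1e-3]           # r, as quoted above
ORTHOGONALITY_CONSTANTS = [1e-3, 3e-3, 1e-2]  # lambda_o
SPARSITY_CONSTANTS = [1e-2, 3e-2, 1e-1]       # lambda_s


class GraphAutoEncoder(nn.Module):
    """Two-layer encoder with 10-dimensional hidden layers and an
    inner-product decoder. The paper only states the hidden size; the
    exact layer structure here is an assumption."""

    def __init__(self, in_dim: int, hidden_dim: int = 10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encoder(x)            # node embeddings
        return torch.sigmoid(z @ z.T)  # predicted adjacency


def run_grid_search(features: torch.Tensor, adjacency: torch.Tensor) -> None:
    """Enumerate the 3 x 3 x 3 = 27 trials per year for the constrained model."""
    for lr, lambda_o, lambda_s in product(
        LEARNING_RATES, ORTHOGONALITY_CONSTANTS, SPARSITY_CONSTANTS
    ):
        model = GraphAutoEncoder(in_dim=features.size(1))
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # Adam, as reported
        for _ in range(10):  # epoch count is an assumption
            optimizer.zero_grad()
            recon = model(features)
            # The actual objective additionally applies orthogonality (lambda_o)
            # and sparsity (lambda_s) penalties on the learned subspace; those
            # terms are omitted in this sketch.
            loss = nn.functional.binary_cross_entropy(recon, adjacency)
            loss.backward()
            optimizer.step()


if __name__ == "__main__":
    n, d = 50, 768  # toy sizes for illustration (d matches BERT base embeddings)
    feats = torch.randn(n, d)
    adj = (torch.rand(n, n) > 0.9).float()
    run_grid_search(feats, adj)
```

The unconstrained model, which only varies the learning rate, would use the same loop over `LEARNING_RATES` alone, giving the 3 trials per year mentioned in the setup.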