Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Unsupervised Detection of Contextualized Embedding Bias with Application to Ideology
Authors: Valentin Hofmann, Janet Pierrehumbert, Hinrich Schütze
ICML 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments suggest that the ideological subspace encodes abstract evaluative semantics and reflects changes in the political left-right spectrum during the presidency of Donald Trump. Table 2: Performance on link prediction (MAUC). |
| Researcher Affiliation | Academia | Valentin Hofmann (1, 2), Janet B. Pierrehumbert (3, 1), Hinrich Schütze (2). (1) Faculty of Linguistics, University of Oxford; (2) Center for Information and Language Processing, LMU Munich; (3) Department of Engineering Science, University of Oxford. |
| Pseudocode | No | The paper describes its method using textual descriptions and mathematical equations but does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | We make our code available at https://github.com/valentinhofmann/unsupervised_bias. |
| Open Datasets | Yes | We base our study on the Reddit Politosphere (Hofmann et al., 2022b), a dataset covering the political discourse on the social media platform Reddit from 2008 to 2019. |
| Dataset Splits | Yes | We split concepts and edges for each year into train (60%), dev (20%), and test (20%). |
| Hardware Specification | Yes | Experiments are performed on a GeForce GTX 1080 Ti GPU (11GB). |
| Software Dependencies | No | The paper mentions pretrained BERT and Adam optimizer, but does not provide specific version numbers for software dependencies like Python, PyTorch, TensorFlow, or other libraries. |
| Experiment Setup | Yes | We perform grid search for the learning rate $r \in \{1 \times 10^{-4}, 3 \times 10^{-4}, 1 \times 10^{-3}\}$. For the model used to find X, we further perform grid search for the orthogonality constant $\lambda_o \in \{1 \times 10^{-3}, 3 \times 10^{-3}, 1 \times 10^{-2}\}$ as well as the sparsity constant $\lambda_s \in \{1 \times 10^{-2}, 3 \times 10^{-2}, 1 \times 10^{-1}\}$. In total, there are 3 hyperparameter search trials for X and 27 for X per year. We use Adam (Kingma & Ba, 2015) as the optimizer. Both hidden layers of the graph auto-encoder have 10 dimensions. |
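
The reported trial counts and split proportions can be checked with a small sketch. This is not the authors' code: the function and key names (`learning_rate_grid`, `full_grid`, `split_60_20_20`, `lambda_o`, `lambda_s`) are hypothetical, and the shuffle-then-slice split with a fixed seed is an assumption, since the table only reports the 60/20/20 proportions. The grid values are taken from the Experiment Setup row.

```python
import itertools
import random

# Grid values as reported in the Experiment Setup row.
LEARNING_RATES = [1e-4, 3e-4, 1e-3]
ORTHOGONALITY = [1e-3, 3e-3, 1e-2]  # lambda_o
SPARSITY = [1e-2, 3e-2, 1e-1]       # lambda_s


def learning_rate_grid():
    """Trials for the model that is tuned over the learning rate only."""
    return [{"lr": lr} for lr in LEARNING_RATES]


def full_grid():
    """Trials for the model that also tunes lambda_o and lambda_s."""
    return [
        {"lr": lr, "lambda_o": lo, "lambda_s": ls}
        for lr, lo, ls in itertools.product(LEARNING_RATES, ORTHOGONALITY, SPARSITY)
    ]


def split_60_20_20(items, seed=0):
    """Shuffle and slice into train/dev/test (assumed procedure, not from the paper)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 6 // 10  # integer arithmetic avoids float rounding
    n_dev = len(items) * 2 // 10
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]
    return train, dev, test


if __name__ == "__main__":
    print(len(learning_rate_grid()), len(full_grid()))
    train, dev, test = split_60_20_20(range(100))
    print(len(train), len(dev), len(test))
```

The Cartesian product of three values per hyperparameter yields the 27 trials reported per year, while the learning-rate-only search yields 3.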