Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Algorithmic Collective Action in Machine Learning
Authors: Moritz Hardt, Eric Mazumdar, Celestine Mendler-Dünner, Tijana Zrnic
ICML 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Complementing our theory, we conduct systematic experiments on a skill classification task involving tens of thousands of resumes from a gig platform for freelancers. Through more than two thousand model training runs of a BERT-like language model, we see a striking correspondence emerge between our empirical observations and the predictions made by our theory. |
| Researcher Affiliation | Academia | 1Max Planck Institute for Intelligent Systems, Tübingen, and Tübingen AI Center 2Caltech 3UC Berkeley. |
| Pseudocode | No | The paper describes various strategies (e.g., Feature label signal strategy, Erasure strategy) in text and mathematical formulas, but it does not present them in pseudocode blocks or algorithms. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-sourcing of the code for its methodology. |
| Open Datasets | Yes | The resume dataset consists of nearly 30000 resumes of freelancers on a major gig labor platform, introduced by (Jiechieu & Tsopze, 2021). [...] The dataset introduced by Jiechieu & Tsopze (2021) is available at https://github.com/florex/resume_corpus. [...] The dataset contains 29783 resumes, of which we use 25000 for training and 4783 for testing. |
| Dataset Splits | No | The paper specifies using 25000 resumes for training and 4783 for testing, but it does not mention a separate validation set. |
| Hardware Specification | No | The paper mentions training a BERT-like language model and performing over two thousand model training runs, but it does not specify any hardware details like GPU/CPU models, memory, or cloud computing resources. |
| Software Dependencies | No | We used the Hugging Face transformers open-source Python library (Wolf et al., 2020). We used the distilbert-base-uncased model from the library corresponding to the DistilBERT transformer model (Sanh et al., 2019). We used the Hugging Face Trainer module for training with its default settings. |
| Experiment Setup | Yes | We fine-tune it on the dataset for 5 epochs with standard hyperparameters. [...] We picked token 1240 corresponding to a small dash, which we inserted in the resume every 20 words. |
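
The signal-planting step quoted under Experiment Setup (a dash token inserted in the resume every 20 words) can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' code: the function name `plant_signal`, the use of whitespace splitting to count words, the choice to insert the signal after each 20th word, and the ASCII hyphen used as a stand-in for the token with vocabulary id 1240 are all hypothetical.

```python
def plant_signal(text: str, signal: str = "-", every: int = 20) -> str:
    """Insert `signal` after every `every`-th whitespace-separated word.

    Assumptions (not taken from the paper): words are counted by
    whitespace splitting, the signal follows each 20th word, and "-"
    stands in for the dash token the paper identifies as token 1240.
    """
    if every <= 0:
        raise ValueError("every must be a positive integer")

    modified = []
    for position, word in enumerate(text.split(), start=1):
        modified.append(word)
        # Plant the signal once a full block of `every` words has passed.
        if position % every == 0:
            modified.append(signal)
    return " ".join(modified)


if __name__ == "__main__":
    resume = " ".join(f"w{i}" for i in range(1, 46))  # 45 dummy words
    print(plant_signal(resume))
```

With a 45-word input, the sketch plants the signal twice, after the 20th and 40th words. The actual experiments may operate on token ids rather than raw strings, which the quoted text does not specify.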