Algorithmic Collective Action in Machine Learning
Authors: Moritz Hardt, Eric Mazumdar, Celestine Mendler-Dünner, Tijana Zrnic
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Complementing our theory, we conduct systematic experiments on a skill classification task involving tens of thousands of resumes from a gig platform for freelancers. Through more than two thousand model training runs of a BERT-like language model, we see a striking correspondence emerge between our empirical observations and the predictions made by our theory. |
| Researcher Affiliation | Academia | (1) Max Planck Institute for Intelligent Systems, Tübingen, and Tübingen AI Center; (2) Caltech; (3) UC Berkeley. |
| Pseudocode | No | The paper describes various strategies (e.g., Feature label signal strategy, Erasure strategy) in text and mathematical formulas, but it does not present them in pseudocode blocks or algorithms. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-sourcing of the code for its methodology. |
| Open Datasets | Yes | The resume dataset consists of nearly 30000 resumes of freelancers on a major gig labor platform, introduced by Jiechieu & Tsopze (2021). [...] The dataset introduced by Jiechieu & Tsopze (2021) is available at https://github.com/florex/resume_corpus. [...] The dataset contains 29783 resumes, of which we use 25000 for training and 4783 for testing. (The reported split is sketched in code after the table.) |
| Dataset Splits | No | The paper specifies using 25000 resumes for training and 4783 for testing, but it does not mention a separate validation set. |
| Hardware Specification | No | The paper mentions training a BERT-like language model and performing over two thousand model training runs, but it does not specify any hardware details like GPU/CPU models, memory, or cloud computing resources. |
| Software Dependencies | No | The paper names its software stack but does not report version numbers: We used the Hugging Face transformers open-source Python library (Wolf et al., 2020). We used the distilbert-base-uncased model from the library corresponding to the DistilBERT transformer model (Sanh et al., 2019). We used the Hugging Face Trainer module for training with its default settings. (A minimal sketch of this setup follows the table.) |
| Experiment Setup | Yes | We fine-tune it on the dataset for 5 epochs with standard hyperparameters. [...] We picked token 1240 corresponding to a small dash, which we inserted in the resume every 20 words. (The insertion step is sketched in code after the table.) |
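
The following is a minimal sketch of the train/test split reported in the Open Datasets row: 25000 of the 29783 resumes for training and the remaining 4783 for testing. The toy corpus and the shuffling step are assumptions; the paper excerpt does not say how the split was drawn, and the real data lives at https://github.com/florex/resume_corpus.

```python
import random

# Toy stand-in for the resume corpus: 29783 (text, label) records.
corpus = [(f"resume text {i}", i % 10) for i in range(29783)]

random.seed(0)
random.shuffle(corpus)      # assumption: the paper does not state how the split is drawn

train_set = corpus[:25000]  # 25000 resumes for training
test_set = corpus[25000:]   # 4783 resumes for testing

assert len(train_set) == 25000 and len(test_set) == 4783
```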
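
Below is a minimal sketch of the training setup quoted in the Software Dependencies and Experiment Setup rows: the distilbert-base-uncased checkpoint fine-tuned with the Hugging Face Trainer at default settings for 5 epochs. The toy dataset, label count, sequence length, and output directory are placeholders, not details taken from the paper.

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy resumes standing in for the real skill-classification corpus.
toy = Dataset.from_dict({
    "text": [
        "python developer experienced in sql",
        "graphic designer skilled in illustrator",
    ],
    "label": [0, 1],
})
toy = toy.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length", max_length=64),
    batched=True,
)

# 5 epochs as quoted; every other hyperparameter is left at the Trainer default.
args = TrainingArguments(output_dir="skill-classifier", num_train_epochs=5)

Trainer(model=model, args=args, train_dataset=toy).train()
```

Passing only `num_train_epochs` keeps the remaining Trainer hyperparameters at their library defaults, which matches the quoted description of the setup.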
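
Finally, a sketch of the signal-planting step from the Experiment Setup row: inserting the token with id 1240 in the distilbert-base-uncased vocabulary (described in the paper as a small dash) after every 20 words of a resume. The example resume text is made up, and how the collective applies this transformation across the full experiments is not specified here.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
trigger = tokenizer.convert_ids_to_tokens(1240)  # token id 1240, reported as a small dash

def plant_signal(text: str, every: int = 20) -> str:
    """Insert the trigger token after every `every` words of the input text."""
    words = text.split()
    out = []
    for i, word in enumerate(words, start=1):
        out.append(word)
        if i % every == 0:
            out.append(trigger)
    return " ".join(out)

# Toy resume of 60 placeholder words, so the trigger appears three times.
resume = " ".join(f"word{i}" for i in range(60))
print(plant_signal(resume))
```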