On The Fairness Impacts of Hardware Selection in Machine Learning
Authors: Sree Harsha Nelaturu, Nishaanth Kanna Ravichandran, Cuong Tran, Sara Hooker, Ferdinando Fioretto
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through both theoretical and empirical analysis, the paper not only identifies the underlying factors but also proposes an effective strategy for mitigating hardware-induced performance imbalances. Our study stands out for its breadth, conducting experiments that cover a range of hardware architectures, datasets, and model types, and the reported results highlight the critical influence of hardware on both performance and ethical dimensions of machine learning models. |
| Researcher Affiliation | Collaboration | ¹Cohere For AI Community, ²Saarland University, ³Dyania Health, ⁴University of Virginia, ⁵Cohere For AI. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | No | The paper does not include an unambiguous statement from the authors that they are releasing their code for the work described, nor does it provide a direct link to a source-code repository containing their implementation. |
| Open Datasets | Yes | Our experiments were conducted using three key datasets: CIFAR-10 (Krizhevsky, 2009), CelebA (Liu et al., 2015), and UTKFace (Zhang et al., 2017). |
| Dataset Splits | No | For CIFAR-10, there are 60,000 images (50,000 train, 5,000 per class; 10,000 test, 1,000 per class). The paper does not explicitly provide training/validation/test dataset splits that include a validation set, nor a clear methodology for its creation. (The standard CIFAR-10 split is sketched after the table.) |
| Hardware Specification | Yes | The experiments use a variety of GPUs: Tesla T4 (NVIDIA, 2018a), Tesla V100 (NVIDIA, 2017a), Ampere A100 (NVIDIA, 2021), and Ada L4 GPU (NVIDIA, 2023). Table 1: Comparison of the feature design and system specifications of the hardware evaluated across all experiments. |
| Software Dependencies | Yes | We ensure determinism by fixing the random seed for all Python libraries, including PyTorch (Paszke et al., 2019) 2.0, ensuring consistency. (A seeding sketch follows the table.) |
| Experiment Setup | Yes | For all experiments, we used SGD with momentum 0.99 and weight decay of 5e-4, and a three-phase one-cycle LR scheduler (Leslie, 2015) with a starting learning rate of 0.1. The batch size for CIFAR-10, UTKFace, and CelebA is set to 512, 128 (32 for ResNet50), and 200, respectively. Models trained on CIFAR-10 and CelebA were trained for 15 epochs, and those trained on UTKFace for 20 epochs. (A training-setup sketch follows the table.) |
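
For the CIFAR-10 division referenced under *Dataset Splits*, the standard `torchvision` loaders reproduce the 50,000/10,000 train/test split. This is a minimal sketch of the standard split, not the authors' data pipeline; the `root` path and the bare `ToTensor` transform are placeholder assumptions.

```python
from torchvision import datasets, transforms

# Standard CIFAR-10 train/test split (50,000 / 10,000 images).
# The transform is a minimal placeholder, not the authors' augmentation.
transform = transforms.ToTensor()
train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)

# 50,000 training images (5,000 per class), 10,000 test images (1,000 per class).
assert len(train_set) == 50_000 and len(test_set) == 10_000
```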
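For the seeding procedure referenced under *Software Dependencies*, a minimal deterministic PyTorch 2.0 setup might look as follows. The seed value and the exact set of seeded libraries are assumptions; the paper only states that a fixed seed was used for all Python libraries.

```python
import os
import random

import numpy as np
import torch


def set_determinism(seed: int = 42) -> None:
    """Fix random seeds across Python libraries for reproducible runs.

    The seed value (42) is an assumption; the paper only states that a
    fixed seed was used for all Python libraries, including PyTorch 2.0.
    """
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy RNG
    torch.manual_seed(seed)           # PyTorch CPU RNG
    torch.cuda.manual_seed_all(seed)  # PyTorch RNGs on every visible GPU
    # Force deterministic kernels where PyTorch supports them.
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False
    # Required by some deterministic CUDA ops (e.g. cuBLAS workspaces).
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"


set_determinism()
```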
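For the configuration referenced under *Experiment Setup*, the reported hyperparameters map onto PyTorch's built-ins roughly as below. The stand-in model, the `max_lr=0.1` reading of the paper's "starting learning rate of 0.1", and `cycle_momentum=False` (to keep momentum fixed at 0.99) are assumptions.

```python
import torch
from torch import nn
from torch.optim import SGD
from torch.optim.lr_scheduler import OneCycleLR

# Hyperparameters as reported above, shown for the CIFAR-10 setting.
EPOCHS = 15                              # 20 for UTKFace
BATCH_SIZE = 512                         # 128 for UTKFace (32 for ResNet50), 200 for CelebA
STEPS_PER_EPOCH = 50_000 // BATCH_SIZE   # CIFAR-10 training-set size

model = nn.Linear(3 * 32 * 32, 10)  # stand-in model; the paper trains CNNs

optimizer = SGD(
    model.parameters(),
    lr=0.1,            # starting learning rate from the paper
    momentum=0.99,
    weight_decay=5e-4,
)

# three_phase=True gives the warm-up / anneal / final-decay shape the paper
# calls a "three-phase one-cycle" schedule (Leslie, 2015).
scheduler = OneCycleLR(
    optimizer,
    max_lr=0.1,
    epochs=EPOCHS,
    steps_per_epoch=STEPS_PER_EPOCH,
    three_phase=True,
    cycle_momentum=False,  # keep momentum fixed at 0.99
)

for epoch in range(EPOCHS):
    for _ in range(STEPS_PER_EPOCH):
        optimizer.step()   # forward/backward pass would precede this
        scheduler.step()   # OneCycleLR advances once per batch, not per epoch
```

Note that `OneCycleLR` is stepped once per batch rather than once per epoch, which is why the scheduler needs `steps_per_epoch` at construction.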