Fair and Accurate Decision Making through Group-Aware Learning
Authors: Ramtin Hosseini, Li Zhang, Bhanu Garg, Pengtao Xie
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform experiments on CelebA, ISIC-18, CIFAR-10, CIFAR-100, and ImageNet datasets to showcase the effectiveness of our proposed method in both fairness and accuracy aspects. Additionally, we apply our proposed LBG to language understanding tasks by conducting experiments on GLUE datasets, which can be found in the Supplements. |
| Researcher Affiliation | Academia | 1Department of Electrical and Computer Engineering, UCSD, San Diego, USA. |
| Pseudocode | Yes | This algorithm is summarized in Algorithm 1. Algorithm 1 Optimization algorithm for Learning by Grouping |
| Open Source Code | Yes | The LBG implementation can be found in the Skillearn repository. |
| Open Datasets | Yes | Various experiments are conducted on five datasets: ISIC18, CelebA, CIFAR-10, CIFAR-100, and ImageNet (Deng et al., 2009) for image classification. The CelebA dataset, consisting of 200k images of human faces with 40 features per image (Liu et al., 2015), is used in this study. The Skin ISIC 2018 dataset (Codella et al., 2019; Tschandl et al., 2018) consists of a total of 11,720 dermatological images... The CIFAR-10 dataset contains 10 distinct classes, while the CIFAR-100 dataset encompasses 100 classes. ImageNet contains 1.2M training images and 50K test images with 1000 classes. We conducted experiments on the various tasks of the General Language Understanding Evaluation (GLUE) benchmark (Wang et al., 2018). |
| Dataset Splits | Yes | From the dataset, we select a sample of 10,000 images, with 70% allocated for training, 15% for validation, and 15% for testing. The training set comprises 10,015 images, the validation set contains 1,512 images, and the testing set consists of 193 images, collectively representing the entirety of the dataset. For each of the datasets, during grouping and architecture search processes, we use 25K images as the training set, 25K images as the validation set, and the rest of the 10K images as the test set. We randomly choose 10% and 2.5% of the 1.2M images to create a new training set and validation set, respectively, for the architecture search phase. |
| Hardware Specification | Yes | We train the networks with a batch size of 96 and 600 epochs on a single Tesla V100 GPU. ...trained using four Tesla V100 GPUs |
| Software Dependencies | No | The paper mentions using the 'Betty library (Choe et al., 2022)' for implementation and 'Adam optimizer (Paszke et al., 2017)' for optimization, but it does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | The initial learning rate is set to 0.1 with momentum 0.9 and is reduced using a cosine decay scheduler with a weight decay of 3e-4. The batch size for CIFAR-10 and CIFAR-100 is set to 128, while for ImageNet we use a batch size of 1024. In this study, the Adam optimizer has been employed to train all models on the CelebA dataset, utilizing a learning rate of 5e-4 and a batch size of 64. For the ISIC-18 experiments, we have set the learning rate to 1e-3 and the batch size to 32. The search algorithm was based on SGD with a batch size of 64, an initial learning rate of 0.025 (reduced in later epochs using a cosine decay scheduler), 50 epochs, a weight decay of 3e-4, and momentum of 0.9. |
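The cosine decay scheduler quoted in the setup cell can be sketched as follows — a minimal sketch assuming standard cosine annealing (no warm restarts, final learning rate of zero); the function name is illustrative, not from the paper:

```python
import math

def cosine_decay_lr(epoch, total_epochs, initial_lr):
    """Standard cosine annealing: decays initial_lr smoothly to 0 over total_epochs."""
    return 0.5 * initial_lr * (1 + math.cos(math.pi * epoch / total_epochs))

# Search-phase settings quoted in the table: initial lr 0.025 over 50 epochs.
schedule = [cosine_decay_lr(e, 50, 0.025) for e in range(51)]
```

Under these assumptions the rate starts at 0.025, falls to half (0.0125) at the midpoint, and reaches zero at epoch 50; in a PyTorch setup this corresponds to `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max=50` wrapped around an SGD optimizer configured with the quoted momentum and weight decay.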