reproducibilityindex.ai

Do Machine Learning Models Learn Statistical Rules Inferred from Data?

Authors: Aaditya Naik, Yinjun Wu, Mayur Naik, Eric Wong

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our approach by applying it to datasets and models from five different domains: tabular classification on a cardiovascular disease dataset (Ulianova), image classification on ImageNet (Deng et al., 2009), object detection on the Cityscapes (Cordts et al., 2016) and KITTI (Geiger et al., 2013) datasets, time-series data imputation over the Physionet dataset (Silva et al., 2012), and sentiment analysis over the Financial PhraseBank dataset (Malo et al., 2014).
Researcher Affiliation	Academia	Aaditya Naik¹, Yinjun Wu¹, Mayur Naik¹, Eric Wong¹ ... ¹Department of Computer and Information Science, University of Pennsylvania, PA, USA. Correspondence to: Aaditya Naik <asnaik@seas.upenn.edu>.
Pseudocode	Yes	"Algorithm 1 Computing statistics for abstract rules" and "Algorithm 2 Rule-based test-time adaptation algorithm"
Open Source Code	Yes	SQRL is available at https://github.com/DebugML/sqrl.
Open Datasets	Yes	For tabular classification, we consider the data from the Cardiovascular Disease dataset (Ulianova)... For image classification, we use a ResNet-34 model trained over the original ImageNet dataset (Deng et al., 2009)... For object detection, we consider the object detector component of the EfficientPS model (Mohan & Valada, 2021). Specifically, we leverage a version of the EfficientPS model which is pretrained over the KITTI self-driving dataset (Geiger et al., 2013). We evaluate the model over the validation splits of the KITTI, Cityscapes (Cordts et al., 2016), and Cityscapesrain (Tremblay et al., 2020)... For time series imputation, we trained one state-of-the-art time series imputation model, SAITS (Du et al., 2023) on the Physionet Challenge 2012 dataset (Silva et al., 2012) (Physionet for short)... For semantic analysis, we use the pretrained FinBERT model (Araci, 2019) on the Financial PhraseBank dataset (Malo et al., 2014) (PhraseBank for short).
Dataset Splits	Yes	We split the dataset by 65%/15%/20% for training, validation, and testing. ... We perform test-time adaptation over the validation samples of this dataset. ... We evaluate the model over the validation splits of the KITTI, Cityscapes (Cordts et al., 2016), and Cityscapesrain (Tremblay et al., 2020).
Hardware Specification	No	The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies	No	The paper mentions various models and frameworks (e.g., 'FT-Transformer', 'ResNet-34 model', 'EfficientPS model', 'SAITS', 'FinBERT model', 'DistilRoBERTa model', 'Roberta model'), but it does not list specific version numbers for general software dependencies such as programming languages, libraries, or operating systems (e.g., Python, PyTorch, CUDA versions).
Experiment Setup	Yes	For the baseline methods, we follow the default setups that perform test-time adaptation for a few epochs since overfitting can occur with more epochs. We use up to 60 epochs for SQRL. We follow the default setups of test-time adaptation by only fine-tuning the statistics of the batch normalization layers rather than the entire model. ... Sample Size 4096 (for Tabular Classification) ... Sample Size 256 (for Image Classification) ... Sample Size 1 (for Object Detection and Time Series Imputation) ... Sample Size 128 (for Sentiment Analysis)
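
The 65%/15%/20% training/validation/testing split quoted under Dataset Splits could be reproduced roughly as follows. This is a minimal sketch, not the authors' code: the function name `split_indices`, the shuffling step, the fixed seed, and the example size of 1000 are all assumptions made for illustration, since the quoted text gives only the proportions.

```python
import random


def split_indices(n, fractions=(0.65, 0.15, 0.20), seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists.

    Hypothetical helper: the paper reports only the 65/15/20 proportions,
    so the shuffling and the seed are assumptions.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG, global state untouched
    n_train = round(n * fractions[0])
    n_val = round(n * fractions[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Example with an assumed dataset size of 1000 rows.
train, val, test = split_indices(1000)
print(len(train), len(val), len(test))
```

Assigning the remainder to the test set keeps the three parts disjoint and exhaustive even when the proportions do not divide the dataset size evenly.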