Do Machine Learning Models Learn Statistical Rules Inferred from Data?
Authors: Aaditya Naik, Yinjun Wu, Mayur Naik, Eric Wong
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach by applying it to datasets and models from five different domains: tabular classification on a cardiovascular disease dataset (Ulianova), image classification on ImageNet (Deng et al., 2009), object detection on the Cityscapes (Cordts et al., 2016) and KITTI (Geiger et al., 2013) datasets, time-series data imputation over the Physionet dataset (Silva et al., 2012), and sentiment analysis over the Financial Phrase Bank dataset (Malo et al., 2014). |
| Researcher Affiliation | Academia | Aaditya Naik¹ Yinjun Wu¹ Mayur Naik¹ Eric Wong¹ ... ¹Department of Computer and Information Science, University of Pennsylvania, PA, USA. Correspondence to: Aaditya Naik <asnaik@seas.upenn.edu>. |
| Pseudocode | Yes | "Algorithm 1 Computing statistics for abstract rules" and "Algorithm 2 Rule-based test-time adaptation algorithm" |
| Open Source Code | Yes | SQRL is available at https://github.com/DebugML/sqrl. |
| Open Datasets | Yes | For tabular classification, we consider the data from the Cardiovascular Disease dataset (Ulianova)... For image classification, we use a ResNet-34 model trained over the original ImageNet dataset (Deng et al., 2009)... For object detection, we consider the object detector component of the EfficientPS model (Mohan & Valada, 2021). Specifically, we leverage a version of the EfficientPS model which is pretrained over the KITTI self-driving dataset (Geiger et al., 2013). We evaluate the model over the validation splits of the KITTI, Cityscapes (Cordts et al., 2016), and Cityscapesrain (Tremblay et al., 2020)... For time series imputation, we trained one state-of-the-art time series imputation model, SAITS (Du et al., 2023) on the Physionet Challenge 2012 dataset (Silva et al., 2012) (Physionet for short)... For semantic analysis, we use the pretrained FinBERT model (Araci, 2019) on the Financial Phrase Bank dataset (Malo et al., 2014) (Phrase Bank for short). |
| Dataset Splits | Yes | We split the dataset by 65%/15%/20% for training, validation, and testing. ... We perform test-time adaptation over the validation samples of this dataset. ... We evaluate the model over the validation splits of the KITTI, Cityscapes (Cordts et al., 2016), and Cityscapesrain (Tremblay et al., 2020). |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions various models and frameworks (e.g., 'FT-Transformer', 'ResNet-34 model', 'EfficientPS model', 'SAITS', 'FinBERT model', 'DistilRoBERTa model', 'Roberta model'), but it does not list specific version numbers for general software dependencies such as programming languages, libraries, or operating systems (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For the baseline methods, we follow the default setups that perform test-time adaptation for a few epochs since overfitting can occur with more epochs. We use up to 60 epochs for SQRL. We follow the default setups of test-time adaptation by only fine-tuning the statistics of the batch normalization layers rather than the entire model. ... Sample Size 4096 (for Tabular Classification) ... Sample Size 256 (for Image Classification) ... Sample Size 1 (for Object Detection and Time Series Imputation) ... Sample Size 128 (for Sentiment Analysis) |
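
The 65%/15%/20% train/validation/test split quoted in the Dataset Splits row can be illustrated with a minimal sketch. The function name `split_indices`, the seed, and the use of a uniform random shuffle are assumptions made for illustration; the excerpt does not say how the authors shuffled or whether they stratified the split.

```python
import random


def split_indices(n, fractions=(0.65, 0.15, 0.20), seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists.

    Hypothetical helper: the fractions match the split quoted above, but the
    shuffling procedure and seed are assumptions, not the paper's method.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder absorbs any rounding
    return train, val, test


train, val, test = split_indices(1000)
print(len(train), len(val), len(test))
```

Giving the remainder to the test set keeps the three parts disjoint and exhaustive even when `n` does not divide evenly.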