Connect Later: Improving Fine-tuning for Robustness with Targeted Augmentations
Authors: Helen Qu, Sang Michael Xie
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our framework on 4 real-world datasets: wildlife identification (IWILDCAM-WILDS, Beery et al., 2020; Sagawa et al., 2022), tumor detection (CAMELYON17-WILDS, Bandi et al., 2018; Sagawa et al., 2022) and 2 astronomical time series tasks, ASTROCLASSIFICATION and REDSHIFTS, which we curate from The PLAsTiCC team et al. (2018). In Section 5, we show that Connect Later improves OOD performance over standard fine-tuning or supervised learning with targeted augmentations across all datasets. |
| Researcher Affiliation | Academia | Helen Qu (1), Sang Michael Xie (2). (1) Department of Physics and Astronomy, University of Pennsylvania; (2) Department of Computer Science, Stanford University. Correspondence to: Helen Qu <helenqu@sas.upenn.edu>. |
| Pseudocode | No | The paper describes methods using prose and mathematical equations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | We evaluate our framework on 4 real-world datasets: wildlife identification (IWILDCAM-WILDS, Beery et al., 2020; Sagawa et al., 2022), tumor detection (CAMELYON17-WILDS, Bandi et al., 2018; Sagawa et al., 2022) and 2 astronomical time series tasks, ASTROCLASSIFICATION and REDSHIFTS, which we curate from The PLAsTiCC team et al. (2018). [Footnote 2] https://zenodo.org/record/2539456 |
| Dataset Splits | No | For IWILDCAM-WILDS, we use a ResNet-50 model pretrained on unlabeled ImageNet data with SwAV contrastive learning (Caron et al., 2020). ... train all models for 15 epochs with early stopping on OOD validation performance... However, no explicit details are given on how the validation set was created (e.g., its size or split percentage) for the main experiments across all datasets. The "80/10/10 train/validation/test split" mentioned in Appendix D refers to an auxiliary experiment for connectivity measures, not the main training splits. |
| Hardware Specification | No | The paper does not specify the hardware used for running experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software components such as the Informer model and the Adam optimizer but does not provide version numbers for the software dependencies (e.g., Python, PyTorch, CUDA) used in its experimental setup. |
| Experiment Setup | Yes | We perform pretraining with a batch size of 256 and learning rate 1e-4 (selected from the range 1e-3 to 1e-6) for 75,000 steps. We finetune the pretrained model with linear probing for 20,000 steps (for pretrained models only) and learning rate 1e-4, then fine-tuning for 10,000 steps at a learning rate of 4e-5. ... For IWILDCAM-WILDS, we train all models for 15 epochs with early stopping... We sample the following hyperparameters independently from the following distributions: the linear probe learning rate (10^Uniform[-3, -2]), fine-tuning learning rate (10^Uniform[-5, -2]), and probability of applying the augmentation (Uniform[0.5, 0.9]). A sketch of this sampling scheme appears below the table. |
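
The quoted setup samples learning rates log-uniformly and the augmentation probability uniformly. Below is a minimal sketch of that sampling scheme, assuming NumPy; the function and key names are illustrative and not taken from the authors' code.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def sample_hyperparameters():
    """Sample one fine-tuning configuration as quoted in the Experiment
    Setup row. Names are hypothetical, not the authors' actual code."""
    return {
        # Linear probe learning rate: 10^Uniform[-3, -2]
        "lp_lr": 10 ** rng.uniform(-3, -2),
        # Fine-tuning learning rate: 10^Uniform[-5, -2]
        "ft_lr": 10 ** rng.uniform(-5, -2),
        # Probability of applying the targeted augmentation: Uniform[0.5, 0.9]
        "aug_prob": rng.uniform(0.5, 0.9),
    }

# Example: draw a few candidate configurations for a random search sweep.
for trial in range(3):
    print(sample_hyperparameters())
```

Sampling the exponent uniformly and exponentiating gives a log-uniform distribution, which is the standard way to search learning rates spanning several orders of magnitude.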