Auto-Transfer: Learning to Route Transferable Representations

Authors: Keerthiram Murugesan, Vijay Sadashivaiah, Ronny Luss, Karthikeyan Shanmugam, Pin-Yu Chen, Amit Dhurandhar

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present experimental results to validate our Auto-Transfer methods. We first show the improvements in model accuracy that can be achieved over various baselines on six different datasets (Section A.3) and two network/task setups.
Researcher Affiliation | Collaboration | 1IBM Research, Yorktown Heights; 2Rensselaer Polytechnic Institute, New York
Pseudocode | Yes | Algorithm 1: AMAB Update Algorithm for Target Layer ℓ; Algorithm 2: TRAIN-TARGET (Train Target Network); Algorithm 3: EVALUATE (Evaluate Target Network). (A generic sketch of such a bandit update appears after the table.)
Open Source Code | Yes | Code available at https://github.com/IBM/auto-transfer
Open Datasets | Yes | We apply our method to four target tasks: Caltech-UCSD Bird 200 (Wah et al., 2011), MIT Indoor Scene Recognition (Quattoni & Torralba, 2009), Stanford 40 Actions (Yao et al., 2011), and Stanford Dogs (Khosla et al., 2011). For Tiny ImageNet based transfer, we apply our method on two target tasks: CIFAR100 (Krizhevsky et al., 2009) and STL-10 (Coates et al., 2011).
Dataset Splits | No | The bandit algorithm intervenes once every epoch of training to make choices using rewards from evaluation of the combined network on a hold-out set, while the latest choice made by the bandit is used by the training algorithm to update the target network parameters on the target task. (...) Reward function: The reward r_t for the selected routing choice is then computed by evaluating the gain in loss due to the chosen source-target combination as follows: the prediction gain is the difference between the target network's losses on a hold-out set D_v with and without the routing choice a_t, i.e., L(f_T^M(x)) - L(f̃_T^M(x)) for a given image x from the hold-out data. (A sketch of this reward computation appears after the table.)
Hardware Specification | Yes | The target models were trained in parallel on two machines with the specifications shown in Table 2. CPU: Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz; Memory: 128GB; GPUs: 1 x NVIDIA Tesla V100 16GB; Disk: 600GB; OS: Ubuntu 18.04-64 Minimal for VSI.
Software Dependencies | No | The paper mentions optimizers (SGD, ADAM) and a learning rate scheduler (Cosine Annealing) but does not provide version numbers for software libraries or frameworks such as Python, PyTorch, TensorFlow, or CUDA, which are necessary for full reproducibility.
Experiment Setup | Yes | For our experimental analysis in the main paper, we set the number of epochs for training to E = 200. The learning rate for SGD is set to 0.1 with momentum 0.9 and weight decay 0.001. The learning rate for ADAM is set to 0.001 with a weight decay of 0.001. We use a Cosine Annealing learning rate scheduler for both optimizers. The batch size for training is set to 64. (A configuration sketch appears after the table.)
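
The AMAB update referenced in the Pseudocode row is an adversarial multi-armed bandit over routing choices for one target layer. The paper's Algorithm 1 is not reproduced in this report, so the sketch below shows a generic EXP3-style update instead; the class name `EXP3Router`, the exploration rate `gamma`, and the assumption that rewards have been rescaled to [0, 1] are all illustrative, not taken from the paper.

```python
import numpy as np

class EXP3Router:
    """Generic EXP3-style adversarial bandit over K routing choices.

    Illustrative sketch only: the paper's AMAB update (Algorithm 1)
    may differ in its probability mixing and reward scaling.
    """

    def __init__(self, num_choices: int, gamma: float = 0.1):
        self.K = num_choices
        self.gamma = gamma                    # exploration rate (assumed)
        self.weights = np.ones(num_choices)   # one weight per routing choice

    def probabilities(self) -> np.ndarray:
        # Mix the normalized weights with uniform exploration.
        w = self.weights / self.weights.sum()
        return (1.0 - self.gamma) * w + self.gamma / self.K

    def select(self) -> int:
        # Sample the routing choice used for the next training epoch.
        return int(np.random.choice(self.K, p=self.probabilities()))

    def update(self, choice: int, reward: float) -> None:
        # Importance-weighted reward estimate keeps the update unbiased;
        # the chosen arm's weight is then exponentially reweighted.
        # Assumes `reward` has been rescaled into [0, 1].
        p = self.probabilities()[choice]
        estimate = reward / p
        self.weights[choice] *= np.exp(self.gamma * estimate / self.K)
```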
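The Dataset Splits row quotes the paper's reward function: the prediction gain of the routed target network over the unrouted one on a hold-out set. A minimal sketch of that computation follows; `routed_model` and `plain_model` are hypothetical stand-ins for the target network with and without the routing choice, and the sign convention (positive reward when routing lowers hold-out loss) is an assumption.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def prediction_gain(routed_model, plain_model, holdout_loader, device="cpu"):
    """Reward for the bandit: average loss difference on the hold-out set
    between the target network without and with the routing choice.

    `routed_model` / `plain_model` are hypothetical stand-ins, not the
    paper's actual modules.
    """
    gain, n = 0.0, 0
    for x, y in holdout_loader:
        x, y = x.to(device), y.to(device)
        loss_without = F.cross_entropy(plain_model(x), y, reduction="sum")
        loss_with = F.cross_entropy(routed_model(x), y, reduction="sum")
        gain += (loss_without - loss_with).item()
        n += y.size(0)
    return gain / n   # positive when the routing choice helps
```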
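The Experiment Setup row maps directly onto optimizer and scheduler configuration. The following PyTorch sketch wires up the reported hyperparameters (E = 200 epochs, batch size 64, SGD with lr 0.1 / momentum 0.9 / weight decay 0.001, ADAM with lr 0.001 / weight decay 0.001, cosine annealing for both); the placeholder model and the loop skeleton are assumptions.

```python
import torch
from torch.optim import SGD, Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

EPOCHS, BATCH_SIZE = 200, 64

model = torch.nn.Linear(512, 100)  # hypothetical placeholder network

# SGD setup as reported: lr 0.1, momentum 0.9, weight decay 0.001.
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=0.001)
# Alternative reported setup: ADAM with lr 0.001 and weight decay 0.001.
# optimizer = Adam(model.parameters(), lr=0.001, weight_decay=0.001)

# Cosine annealing over the full run, used with either optimizer.
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)

for epoch in range(EPOCHS):
    # ... one pass over the training set with batch size 64 ...
    scheduler.step()
```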