Exploiting a Zoo of Checkpoints for Unseen Tasks
Authors: Jiaji Huang, Qiang Qiu, Kenneth Church
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Experiments): In the previous sections, we have presented two key components: estimation of κ and MMI-based selection of checkpoints. In this section, we experiment with the two components combined. First, we apply Algorithm 1 to the κ estimated in Section 3.3 and show its effectiveness on multiple linguistic tasks. The baselines we compare against are random selection of checkpoints and a single commonly adopted checkpoint, e.g., bert-base-uncased. We then extend to image classification tasks. Again we observe consistent improvements over random picks and other straightforward alternatives. |
| Researcher Affiliation | Collaboration | Jiaji Huang, Baidu Research, Sunnyvale, CA 94089, huangjiaji@baidu.com; Qiang Qiu, School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907, qqiu@purdue.edu; Kenneth Church, Baidu Research, Sunnyvale, CA 94089, kennethchurch@baidu.com |
| Pseudocode | Yes | Algorithm 1: Maximum Mutual Information (MMI) based Selection of Checkpoints (a hedged sketch of greedy MMI selection appears after the table) |
| Open Source Code | Yes | All results can be reproduced using code at https://github.com/baidu-research/task_space |
| Open Datasets | Yes | We input the training set of wikitext2 as probing data and extract the contextualized word embeddings from the penultimate layer (see the extraction sketch after the table). In this section, we simulate an example using the cifar100 dataset. |
| Dataset Splits | Yes | Each task has 480 (= 500 − 20) training samples per class; 20 training samples are held out for each class. ... Finally, the remaining 10 holdouts per class are used as probing data to estimate κ (see the split sketch after the table). The performance of this task is measured by accuracy on the standard validation data (excluding classes not handled in this task), denoted as acc_t. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as GPU or CPU models used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers, such as Python or library versions. |
| Experiment Setup | Yes | Following [28], we train a softmax on top of the combined word representations (⊕_{i∈S} f_i) for each task; the gradients are not back-propagated through the checkpoints (see the linear-probe sketch after the table). Another design choice is that f_i is taken to be the feature at the top layer of the checkpoint. Each task has 480 (= 500 − 20) training samples per class. A resnet-50 is trained for each of these "seen" tasks and stored as a checkpoint. |
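The paper's Algorithm 1 selects checkpoints by maximizing mutual information under the estimated task kernel κ. As a hedged illustration only, the sketch below follows the classic greedy MMI rule for Gaussian processes (variance-ratio gain); the function names and the exact gain formula are assumptions, not the authors' released code.

```python
import numpy as np

def conditional_var(K, y, S, jitter=1e-8):
    """Posterior variance of index y given index set S under kernel K."""
    if not S:
        return K[y, y]
    K_SS = K[np.ix_(S, S)] + jitter * np.eye(len(S))  # jitter for stability
    K_yS = K[y, S]
    return K[y, y] - K_yS @ np.linalg.solve(K_SS, K_yS)

def mmi_select(K, k):
    """Greedily pick k task indices with maximal MMI gain.

    K: (n, n) task-similarity kernel (the estimated kappa matrix).
    The gain for a candidate y is the ratio of its variance given the
    already-selected set A to its variance given the unselected rest,
    i.e. the greedy surrogate for H(y|A) - H(y|rest).
    """
    n = K.shape[0]
    selected, remaining = [], list(range(n))
    for _ in range(k):
        def gain(y):
            rest = [j for j in remaining if j != y]
            return conditional_var(K, y, selected) / conditional_var(K, y, rest)
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected
```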
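For the linguistic tasks, the probing features are contextualized word embeddings taken from a checkpoint's penultimate layer on wikitext2 text. A minimal extraction sketch, assuming a Hugging Face checkpoint (the model name and the single-sentence batch are illustrative, not from the released code):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative checkpoint; the paper probes a zoo of such models.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

with torch.no_grad():
    batch = tok("some wikitext-2 sentence", return_tensors="pt")
    hidden = model(**batch).hidden_states  # tuple: embeddings + one per layer
    penultimate = hidden[-2]               # contextualized word embeddings
```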
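The per-class CIFAR-100 bookkeeping above (480 training samples after holding out 20, of which 10 serve as probing data for κ) can be sketched as follows; the random seed and split ordering are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption

def split_class(class_indices, n_holdout=20, n_probe=10):
    """Split the 500 per-class training indices into 480 train,
    10 probing, and 10 remaining holdout samples."""
    idx = rng.permutation(class_indices)
    holdout, train = idx[:n_holdout], idx[n_holdout:]    # 20 held out / 480 train
    probe, other = holdout[:n_probe], holdout[n_probe:]  # 10 probing / 10 other
    return train, probe, other
```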
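The transfer setup freezes the selected checkpoints and trains only a softmax over their combined features. A minimal linear-probe sketch, assuming PyTorch feature extractors (class and parameter names are illustrative):

```python
import torch
import torch.nn as nn

class FrozenProbe(nn.Module):
    """Softmax classifier over features concatenated from frozen checkpoints."""
    def __init__(self, extractors, feat_dims, num_classes):
        super().__init__()
        self.extractors = nn.ModuleList(extractors)
        for p in self.extractors.parameters():
            p.requires_grad = False  # no gradients through the checkpoints
        self.fc = nn.Linear(sum(feat_dims), num_classes)

    def forward(self, x):
        with torch.no_grad():  # checkpoints stay frozen
            feats = [f(x) for f in self.extractors]
        return self.fc(torch.cat(feats, dim=-1))
```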