Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation
Authors: Jixuan Wang, Kuan-Chieh Wang, Frank Rudzicz, Michael Brudno
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta-learning approaches on a collection of diverse few-shot tasks. We further conducted analysis and ablations to justify our design choices. |
| Researcher Affiliation | Academia | Jixuan Wang (1,2,3), Kuan-Chieh Wang (1,2), Frank Rudzicz (1,2,4), Michael Brudno (1,2,3). 1: University of Toronto, 2: Vector Institute, 3: University Health Network, 4: Unity Health Toronto. {jixuan, wangkua1, frank, brudno}@cs.toronto.edu |
| Pseudocode | Yes | Algorithm 1: Training the task embedding network and adaptation network for quick adaptation to new tasks. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/jixuan-wang/Grad2Task |
| Open Datasets | Yes | Following [5], we use tasks from the GLUE benchmark [52] for training. Specifically, we use WNLI (m/mm), SST-2, QQP, RTE, MRPC, QNLI, and the SNLI dataset [10], which we refer to as our meta-training datasets. |
| Dataset Splits | Yes | The validation set of each dataset is used for hyperparameter searching and model selection. We train our model and other meta-learning models by sampling episodes from the meta-training tasks. The sampling process first selects a dataset and then randomly selects k-shot examples for each class as the support set and another k-shot as the query set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or cluster specifications) used for running experiments. |
| Software Dependencies | No | The paper mentions using the "pretrained BERTBASE model" and the "Adam" optimizer (with a reference), but does not specify version numbers for any software libraries or dependencies (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | No | The paper discusses training stages and hyperparameter searching (e.g., "validation set of each dataset is used for hyperparameter searching"), but it does not explicitly provide concrete hyperparameter values (e.g., specific learning rates, batch sizes, number of epochs) in the main body of the paper. |
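The episode-sampling procedure quoted under Dataset Splits (pick a dataset, then draw k-shot support and query sets per class) can be sketched as follows. This is a minimal illustrative implementation, not the authors' released code; the function name, the `{dataset: {label: examples}}` layout, and the disjoint support/query split are assumptions for the sketch.

```python
import random

def sample_episode(datasets, k):
    """Sample one few-shot episode: first select a dataset, then randomly
    select k examples per class as the support set and another k per class
    as the query set. `datasets` maps dataset name -> {label: [examples]}.
    Illustrative sketch only; structure is assumed, not from the paper."""
    name = random.choice(sorted(datasets))
    by_class = datasets[name]
    support, query = [], []
    for label, examples in by_class.items():
        # Draw 2k distinct examples so support and query are disjoint.
        picked = random.sample(examples, 2 * k)
        support += [(x, label) for x in picked[:k]]
        query += [(x, label) for x in picked[k:]]
    return name, support, query
```

For a binary task such as SST-2 with k = 2, this yields a 4-example support set and a 4-example query set with no overlap between the two.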