Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation
Authors: Jixuan Wang, Kuan-Chieh Wang, Frank Rudzicz, Michael Brudno
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta-learning approaches on a collection of diverse few-shot tasks. We further conducted analysis and ablations to justify our design choices. |
| Researcher Affiliation | Academia | Jixuan Wang (1,2,3), Kuan-Chieh Wang (1,2), Frank Rudzicz (1,2,4), Michael Brudno (1,2,3). 1: University of Toronto, 2: Vector Institute, 3: University Health Network, 4: Unity Health Toronto. {jixuan, wangkua1, frank, brudno}@cs.toronto.edu |
| Pseudocode | Yes | Algorithm 1: Training the task embedding network and adaptation network for quick adaptation to new tasks. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/jixuan-wang/Grad2Task |
| Open Datasets | Yes | Following [5], we use tasks from the GLUE benchmark [52] for training. Specifically, we use WNLI (m/mm), SST-2, QQP, RTE, MRPC, QNLI, and the SNLI dataset [10], which we refer to as our meta-training datasets. |
| Dataset Splits | Yes | The validation set of each dataset is used for hyperparameter searching and model selection. We train our model and other meta-learning models by sampling episodes from the meta-training tasks. The sampling process first selects a dataset and then randomly selects k-shot examples for each class as the support set and another k-shot as the query set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or cluster specifications) used for running experiments. |
| Software Dependencies | No | The paper mentions using the "pretrained BERTBASE model" and the "Adam" optimizer (with a reference), but does not specify version numbers for any software libraries or dependencies (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | No | The paper discusses training stages and hyperparameter searching (e.g., "validation set of each dataset is used for hyperparameter searching"), but it does not explicitly provide concrete hyperparameter values (e.g., specific learning rates, batch sizes, number of epochs) in the main body of the paper. |
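The episode-sampling procedure quoted under Dataset Splits (pick a dataset, then draw k-shot support and query sets per class) can be sketched as follows. This is a minimal illustrative implementation, not the authors' released code; the function name, the `{dataset: {label: examples}}` layout, and the disjoint support/query split are assumptions for the sketch.

```python
import random

def sample_episode(datasets, k):
    """Sample one few-shot episode: first select a dataset, then randomly
    select k examples per class as the support set and another k per class
    as the query set. `datasets` maps dataset name -> {label: [examples]}.
    Illustrative sketch only; structure is assumed, not from the paper."""
    name = random.choice(sorted(datasets))
    by_class = datasets[name]
    support, query = [], []
    for label, examples in by_class.items():
        # Draw 2k distinct examples so support and query are disjoint.
        picked = random.sample(examples, 2 * k)
        support += [(x, label) for x in picked[:k]]
        query += [(x, label) for x in picked[k:]]
    return name, support, query
```

For a binary task such as SST-2 with k = 2, this yields a 4-example support set and a 4-example query set with no overlap between the two.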