Zero-Shot Task Adaptation with Relevant Feature Information

Authors: Atsutoshi Kumagai, Tomoharu Iwata, Yasuhiro Fujiwara

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that the proposed method outperforms existing methods with four real-world datasets.
Researcher Affiliation | Industry | Atsutoshi Kumagai (NTT Computer and Data Science Laboratories), Tomoharu Iwata (NTT Communication Science Laboratories), Yasuhiro Fujiwara (NTT Communication Science Laboratories); atsutoshi.kumagai@ntt.com, tomoharu.iwata@ntt.com, yasuhiro.fujiwara@ntt.com
Pseudocode | Yes | Algorithm 1: Meta-training procedure of our model. (A generic toy sketch of such a meta-training loop appears after this table.)
Open Source Code | No | The paper does not include any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | We used four real-world datasets: 20News, WoS, URL, and Mnistr. [Footnotes provide URLs to the public datasets: 20News: http://qwone.com/~jason/20Newsgroups/; WoS: https://github.com/kk7nc/HDLTex; URL: https://www.kaggle.com/datasets/shawon10/url-classification-dataset-dmoz; Mnistr: https://github.com/ghif/mtae]
Dataset Splits | Yes | For 20News, we randomly used 10 classes for training, 5 classes for validation, and 5 classes for testing. For WoS, we randomly used 69 classes for training, 5 classes for validation, and 5 classes for testing. For URL, we randomly used 7 classes for training, 4 classes for validation, and 4 classes for testing. For Mnistr, we randomly used 30 classes for training, 15 classes for validation, and 15 classes for testing. (A split-construction sketch appears after this table.)
Hardware Specification | Yes | All experiments were conducted on a Linux server with an A100 GPU and a 2.20GHz Intel Xeon CPU.
Software Dependencies | No | All neural network-based methods were implemented using PyTorch (Paszke et al. 2017). This mention of PyTorch does not include a specific version number required for reproducibility.
Experiment Setup | Yes | For the proposed method, LR, NN, and LRD2, the number of synthetic unlabeled data N was 50. For the proposed method, LRD2, LRD2U, and NPU, the step size of gradient descent α and the iteration number I in the inner problems were selected from {10, 1, 10^-1} and {2, 5, 10}, respectively. For the proposed method, regularization parameters λ and µ were selected from {1, 10^-1, 10^-2, 0} and {10, 1, 10^-1, 10^-2, 10^-3}, respectively. ... For all neural network-based methods, we used the Adam optimizer with a learning rate of 10^-3 (Kingma and Ba 2014). (Sketches of the inner-loop adaptation and the hyperparameter grid appear after this table.)
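
The class counts in the Dataset Splits row can be turned into concrete class partitions as follows. This is a minimal sketch, assuming classes are identified by consecutive integer labels and that every class is assigned to exactly one split; the paper's actual sampling code and random seed are not given.

```python
import random

# Class counts (train, val, test) per dataset, as reported in the Dataset Splits row.
split_sizes = {
    "20News": (10, 5, 5),
    "WoS":    (69, 5, 5),
    "URL":    (7, 4, 4),
    "Mnistr": (30, 15, 15),
}

def random_class_split(dataset, seed=0):
    """Randomly partition a dataset's class labels into train/val/test class sets."""
    n_train, n_val, n_test = split_sizes[dataset]
    labels = list(range(n_train + n_val + n_test))   # assumption: consecutive integer labels
    random.Random(seed).shuffle(labels)
    return (labels[:n_train],
            labels[n_train:n_train + n_val],
            labels[n_train + n_val:])

train_cls, val_cls, test_cls = random_class_split("20News")
```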
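The Pseudocode row refers to Algorithm 1, the paper's meta-training procedure, which is not reproduced here. The following is a self-contained toy sketch of an episodic meta-training loop with an inner gradient-descent adaptation step; it is not the authors' algorithm. Only the reported hyperparameters (N = 50 synthetic unlabeled data, inner step size α, inner iteration count I, Adam with learning rate 10^-3) come from the Experiment Setup row; the network sizes, the entropy-based inner objective, and the random data are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Reported hyperparameters (alpha and I picked from the grids {10, 1, 1e-1} and {2, 5, 10}).
N, alpha, I = 50, 1.0, 5
feature_dim, hidden_dim, num_classes = 20, 32, 5          # toy sizes (assumptions)

encoder = nn.Sequential(nn.Linear(feature_dim, hidden_dim), nn.ReLU())  # shared meta-parameters
meta_opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)               # Adam, lr 1e-3 as reported

def inner_adapt(z_unlabeled):
    """Run I steps of gradient descent on a toy unsupervised inner objective
    (prediction-entropy minimization) over task-specific classifier weights."""
    w = (0.01 * torch.randn(hidden_dim, num_classes)).requires_grad_(True)
    for _ in range(I):
        logits = z_unlabeled @ w
        entropy = -(logits.softmax(-1) * logits.log_softmax(-1)).sum(-1).mean()
        (grad_w,) = torch.autograd.grad(entropy, w, create_graph=True)
        w = w - alpha * grad_w                            # differentiable inner update
    return w

for step in range(100):                                   # toy number of meta-iterations
    x_unlabeled = torch.randn(N, feature_dim)             # stands in for synthetic unlabeled data
    x_query = torch.randn(16, feature_dim)                # stands in for labeled data of the task
    y_query = torch.randint(0, num_classes, (16,))
    w_task = inner_adapt(encoder(x_unlabeled))            # solve the inner problem
    outer_loss = nn.functional.cross_entropy(encoder(x_query) @ w_task, y_query)
    meta_opt.zero_grad()
    outer_loss.backward()                                 # outer gradient flows through the inner updates
    meta_opt.step()
```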
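The Experiment Setup row lists discrete search grids for α, I, λ, and µ. As a small illustration of how such a grid could be enumerated for validation-based selection; the `evaluate_on_validation` helper is a hypothetical stub, since the paper's selection criterion and training code are not reproduced here.

```python
from itertools import product

# Search grids as reported in the Experiment Setup row.
alphas  = [10, 1, 1e-1]                 # inner-problem step size alpha
iters   = [2, 5, 10]                    # inner-problem iteration number I
lambdas = [1, 1e-1, 1e-2, 0]            # regularization parameter lambda
mus     = [10, 1, 1e-1, 1e-2, 1e-3]     # regularization parameter mu

def evaluate_on_validation(alpha, I, lam, mu):
    # Hypothetical stand-in: train with this configuration and return a score
    # on the held-out validation classes. Stubbed here so the sketch runs.
    return 0.0

best_score, best_config = float("-inf"), None
for alpha, I, lam, mu in product(alphas, iters, lambdas, mus):
    score = evaluate_on_validation(alpha, I, lam, mu)
    if score > best_score:
        best_score, best_config = score, (alpha, I, lam, mu)
```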