Labeled Memory Networks for Online Model Adaptation

Authors: Shiv Shankar, Sunita Sarawagi

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate online model adaptation strategies on five sequence prediction tasks, an image classification task, and two language modeling tasks. We show that LMNs are better than other MANNs designed for meta-learning. We also found them to be more accurate and faster than state-of-the-art methods of retuning model parameters for adapting to domain-specific labeled data.
Researcher Affiliation | Academia | Shiv Shankar (shiv shankar@iitb.ac.in), IIT Bombay; Sunita Sarawagi (sunita@iitb.ac.in), IIT Bombay
Pseudocode | Yes | The overall algorithm is depicted in Figure 2.
Open Source Code | Yes | Code to be available at https://github.com/sshivs/LMN
Open Datasets | Yes | FSQNYC and FSQTokyo are Location Based Social Network data collected by (Yang et al. 2015) from Foursquare, recording user check-ins at various venues over a year. Brightkite (Cho, Myers, and Leskovec 2011) is a user check-in dataset made available as part of the Stanford Network Analysis Project (Leskovec and Krevl 2014). Geolife (Zheng et al. 2009) is the trajectory data of people collected over multiple days... The Yoochoose dataset (Ben-Shimon et al. 2015) is the click event sessions... We use the popular Omniglot dataset (Lake, Salakhutdinov, and Tenenbaum 2015). We compared on the common language datasets Wikitext2 and Text8 with memory sizes 100 and 2000, as used in previously published work.
Dataset Splits | Yes | In Table 1 we summarize the average length of each sequence, the number of tokens, and the number of sequences in the training and test sets.
Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.
Software Dependencies | No | The paper mentions software components like 'Adam optimizer' and 'GRU' but does not provide specific version numbers for these or any other libraries or frameworks used.
Experiment Setup | Yes | In all experiments we used the Adam optimizer (Kingma and Ba 2014). The PCN is a GRU, and its input is the embedding of the true observed token y_{t-1} from the previous time step. In our experiments we used a decay value of 0.99. The margin is a hyper-parameter.
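A minimal sketch of the experiment setup described above: a GRU-based prediction class network (PCN) that consumes the embedding of the previous true token y_{t-1} and is trained with Adam. PyTorch, the layer sizes, and the class/variable names (PCN, vocab_size, embed_dim, hidden_dim) are assumptions for illustration; the paper does not specify a framework, and the labeled-memory component, the 0.99 decay, and the margin hyper-parameter are not modeled here.

```python
# Hypothetical sketch of the paper's PCN setup (GRU over previous-token
# embeddings, trained with Adam); not the authors' released implementation.
import torch
import torch.nn as nn


class PCN(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_tokens, hidden=None):
        # prev_tokens: (batch, seq_len) holding the true tokens y_{t-1}
        emb = self.embed(prev_tokens)
        states, hidden = self.gru(emb, hidden)
        return self.out(states), hidden


model = PCN(vocab_size=10_000)  # vocab size is an illustrative placeholder
optimizer = torch.optim.Adam(model.parameters())  # Adam, as stated in the paper
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on dummy data.
prev_tokens = torch.randint(0, 10_000, (8, 20))
targets = torch.randint(0, 10_000, (8, 20))
logits, _ = model(prev_tokens)
loss = loss_fn(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The decay value (0.99) and the margin mentioned in the paper belong to the labeled-memory component, which the sketch omits.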