Progressive Memory Banks for Incremental Domain Adaptation

Authors: Nabiha Asghar, Lili Mou, Kira A. Selby, Kevin D. Pantasdo, Pascal Poupart, Xin Jiang

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our approach achieves significantly better performance than fine-tuning alone. Compared with expanding hidden states, our approach is more robust for old domains, as shown by both empirical and theoretical results. Our model also outperforms previous IDA approaches, including elastic weight consolidation and progressive neural networks, in the experiments.
Researcher Affiliation | Collaboration | Vector Institute for AI, Toronto, Canada; Cheriton School of Computer Science, University of Waterloo, Canada ({nasghar,kaselby,kevin.pantasdo,ppoupart}@uwaterloo.ca); Dept. of Computing Science, University of Alberta and Alberta Machine Intelligence Institute (AMII) (doublepower.mou@gmail.com); Noah’s Ark Lab, Huawei Technologies, Hong Kong (jiang.xin@huawei.com)
Pseudocode | Yes | Algorithm 1: Progressive Memory for IDA (see the code sketch following this table).
Open Source Code | Yes | Our IDA code is available at https://github.com/nabihach/IDA.
Open Datasets | Yes | The MultiNLI corpus (Williams et al., 2018) is particularly suitable for IDA... We use the Cornell Movie Dialogs Corpus (Danescu-Niculescu-Mizil & Lee, 2011) as the source... from the Ubuntu Dialogue Corpus (Lowe et al., 2015).
Dataset Splits | Yes | The corpus also contains held-out (non-training) labeled data in these domains. We split it into two parts for validation and test.
Hardware Specification | No | The paper describes the neural network architectures (e.g., bi-directional LSTM, GRUs) and their dimensions, but does not specify the hardware (e.g., GPU model, CPU type) used to run the experiments.
Software Dependencies | No | The paper mentions the Adam optimizer but does not provide version numbers for programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or other software libraries.
Experiment Setup | Yes | The details of network architecture, training, and hyper-parameter tuning are given in Appendix B. For the base model, we train a bi-directional LSTM (BiLSTM)... 300D RNN hidden states, 300D pretrained GloVe embeddings (Pennington et al., 2014) for initialization, batch size of 32, and the Adam optimizer for training. The initial learning rate for Adam is tuned over the set {0.3, 0.03, 0.003, 0.0003, 0.00003} and is set to 0.0003 based on validation performance. For the memory, we set each slot to be 300-dimensional... We thus choose to add 500 slots for each domain.
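To make the Pseudocode row concrete, below is a minimal sketch of the progressive-memory idea: an RNN encoder augmented with a learned memory bank that is read via attention, where new slots are appended for each new domain while existing slots are kept. This is not the authors' released implementation; the names (ProgressiveMemoryRNN, expand_memory) are illustrative, and the sketch attends once over a pooled sentence vector rather than at every RNN step as Algorithm 1 of the paper does.

```python
# A hedged sketch of a progressive memory bank for incremental domain
# adaptation (IDA), assuming PyTorch. Names are illustrative, not taken from
# the released code at https://github.com/nabihach/IDA.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProgressiveMemoryRNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hid_dim=300,
                 mem_slots=500, mem_dim=300, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        # Memory bank: one row per slot, starting with slots for the source domain.
        self.memory = nn.Parameter(0.01 * torch.randn(mem_slots, mem_dim))
        self.query = nn.Linear(2 * hid_dim, mem_dim)
        self.out = nn.Linear(2 * hid_dim + mem_dim, num_classes)

    def expand_memory(self, new_slots=500):
        # Append freshly initialized slots when a new target domain arrives;
        # existing slots (and all other weights) are kept and fine-tuned.
        new = 0.01 * torch.randn(new_slots, self.memory.size(1),
                                 device=self.memory.device)
        self.memory = nn.Parameter(torch.cat([self.memory.data, new], dim=0))
        # Re-create the optimizer afterwards so the new slots are trainable.

    def forward(self, tokens):                        # tokens: (batch, time)
        h, _ = self.rnn(self.embed(tokens))           # (batch, time, 2*hid_dim)
        sent = h.max(dim=1).values                    # max-pooled sentence vector
        attn = F.softmax(self.query(sent) @ self.memory.t(), dim=-1)
        read = attn @ self.memory                     # content read from memory
        return self.out(torch.cat([sent, read], dim=-1))
```

When adapting to a new domain, one would call expand_memory(500) and then fine-tune on the new domain's data; the claim in the paper is that this is more robust for old domains than expanding the hidden states themselves.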
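The Experiment Setup row above quotes the main hyper-parameters; the snippet below collects them in one place and sketches the learning-rate grid search over validation performance described there. The helper callables (build_model, train, evaluate) are assumed to be supplied by the user and are not part of the paper or its code release.

```python
# Hyper-parameters quoted in the Experiment Setup row, plus an illustrative
# learning-rate grid search. build_model / train / evaluate are user-supplied
# callables; none of this is taken verbatim from the released code.
CONFIG = {
    "hidden_dim": 300,           # Bi-LSTM hidden state size
    "embedding_dim": 300,        # pretrained GloVe vectors for initialization
    "batch_size": 32,
    "optimizer": "adam",
    "lr_grid": [0.3, 0.03, 0.003, 0.0003, 0.00003],
    "memory_slot_dim": 300,
    "slots_per_domain": 500,
}

def tune_learning_rate(build_model, train, evaluate, train_data, val_data,
                       config=CONFIG):
    """Return the learning rate with the best validation score
    (the paper settles on 0.0003)."""
    best_lr, best_score = None, float("-inf")
    for lr in config["lr_grid"]:
        model = build_model(config)
        train(model, train_data, lr=lr, batch_size=config["batch_size"])
        score = evaluate(model, val_data)
        if score > best_score:
            best_lr, best_score = lr, score
    return best_lr
```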