Progressive Memory Banks for Incremental Domain Adaptation

Authors: Nabiha Asghar, Lili Mou, Kira A. Selby, Kevin D. Pantasdo, Pascal Poupart, Xin Jiang

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our approach achieves significantly better performance than fine-tuning alone. Compared with expanding hidden states, our approach is more robust for old domains, as shown by both empirical and theoretical results. Our model also outperforms previous IDA approaches, including elastic weight consolidation and progressive neural networks, in the experiments.
Researcher Affiliation | Collaboration | Vector Institute for AI, Toronto, Canada; Cheriton School of Computer Science, University of Waterloo, Canada ({nasghar,kaselby,kevin.pantasdo,ppoupart}@uwaterloo.ca); Dept. of Computing Science, University of Alberta and Alberta Machine Intelligence Institute (AMII) (doublepower.mou@gmail.com); Noah’s Ark Lab, Huawei Technologies, Hong Kong (jiang.xin@huawei.com)
Pseudocode | Yes | Algorithm 1: Progressive Memory for IDA (see the code sketch following this table).
Open Source Code | Yes | Our IDA code is available at https://github.com/nabihach/IDA.
Open Datasets | Yes | The MultiNLI corpus (Williams et al., 2018) is particularly suitable for IDA... We use the Cornell Movie Dialogs Corpus (Danescu-Niculescu-Mizil & Lee, 2011) as the source... from the Ubuntu Dialogue Corpus (Lowe et al., 2015).
Dataset Splits | Yes | The corpus also contains held-out (non-training) labeled data in these domains. We split it into two parts for validation and test.
Hardware Specification | No | The paper describes the neural network architectures (e.g., bi-directional LSTM, GRUs) and their dimensions, but does not specify the hardware (e.g., GPU model, CPU type) used to run the experiments.
Software Dependencies | No | The paper mentions the Adam optimizer but does not provide version numbers for programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or other software libraries.
Experiment Setup | Yes | The details of network architecture, training, and hyper-parameter tuning are given in Appendix B. For the base model, we train a bi-directional LSTM (BiLSTM)... 300D RNN hidden states, 300D pretrained GloVe embeddings (Pennington et al., 2014) for initialization, batch size of 32, and the Adam optimizer for training. The initial learning rate for Adam is tuned over the set {0.3, 0.03, 0.003, 0.0003, 0.00003} and is set to 0.0003 based on validation performance. For the memory, we set each slot to be 300-dimensional... We thus choose to add 500 slots for each domain.
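To make the Pseudocode row concrete, below is a minimal sketch of the progressive-memory idea: an RNN encoder augmented with a learned memory bank that is read via attention, where new slots are appended for each new domain while existing slots are kept. This is not the authors' released implementation; the names (ProgressiveMemoryRNN, expand_memory) are illustrative, and the sketch attends once over a pooled sentence vector rather than at every RNN step as Algorithm 1 of the paper does.

```python
# A hedged sketch of a progressive memory bank for incremental domain
# adaptation (IDA), assuming PyTorch. Names are illustrative, not taken from
# the released code at https://github.com/nabihach/IDA.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProgressiveMemoryRNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hid_dim=300,
                 mem_slots=500, mem_dim=300, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hid_dim, batch_first=True, bidirectional=True)
        # Memory bank: one row per slot, starting with slots for the source domain.
        self.memory = nn.Parameter(0.01 * torch.randn(mem_slots, mem_dim))
        self.query = nn.Linear(2 * hid_dim, mem_dim)
        self.out = nn.Linear(2 * hid_dim + mem_dim, num_classes)

    def expand_memory(self, new_slots=500):
        # Append freshly initialized slots when a new target domain arrives;
        # existing slots (and all other weights) are kept and fine-tuned.
        new = 0.01 * torch.randn(new_slots, self.memory.size(1),
                                 device=self.memory.device)
        self.memory = nn.Parameter(torch.cat([self.memory.data, new], dim=0))
        # Re-create the optimizer afterwards so the new slots are trainable.

    def forward(self, tokens):                        # tokens: (batch, time)
        h, _ = self.rnn(self.embed(tokens))           # (batch, time, 2*hid_dim)
        sent = h.max(dim=1).values                    # max-pooled sentence vector
        attn = F.softmax(self.query(sent) @ self.memory.t(), dim=-1)
        read = attn @ self.memory                     # content read from memory
        return self.out(torch.cat([sent, read], dim=-1))
```

When adapting to a new domain, one would call expand_memory(500) and then fine-tune on the new domain's data; the claim in the paper is that this is more robust for old domains than expanding the hidden states themselves.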
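The Experiment Setup row above quotes the main hyper-parameters; the snippet below collects them in one place and sketches the learning-rate grid search over validation performance described there. The helper callables (build_model, train, evaluate) are assumed to be supplied by the user and are not part of the paper or its code release.

```python
# Hyper-parameters quoted in the Experiment Setup row, plus an illustrative
# learning-rate grid search. build_model / train / evaluate are user-supplied
# callables; none of this is taken verbatim from the released code.
CONFIG = {
    "hidden_dim": 300,           # Bi-LSTM hidden state size
    "embedding_dim": 300,        # pretrained GloVe vectors for initialization
    "batch_size": 32,
    "optimizer": "adam",
    "lr_grid": [0.3, 0.03, 0.003, 0.0003, 0.00003],
    "memory_slot_dim": 300,
    "slots_per_domain": 500,
}

def tune_learning_rate(build_model, train, evaluate, train_data, val_data,
                       config=CONFIG):
    """Return the learning rate with the best validation score
    (the paper settles on 0.0003)."""
    best_lr, best_score = None, float("-inf")
    for lr in config["lr_grid"]:
        model = build_model(config)
        train(model, train_data, lr=lr, batch_size=config["batch_size"])
        score = evaluate(model, val_data)
        if score > best_score:
            best_lr, best_score = lr, score
    return best_lr
```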