Progressive Memory Banks for Incremental Domain Adaptation
Authors: Nabiha Asghar, Lili Mou, Kira A. Selby, Kevin D. Pantasdo, Pascal Poupart, Xin Jiang
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our approach achieves significantly better performance than fine-tuning alone. Compared with expanding hidden states, our approach is more robust for old domains, shown by both empirical and theoretical results. Our model also outperforms previous work of IDA including elastic weight consolidation and progressive neural networks in the experiments. |
| Researcher Affiliation | Collaboration | Vector Institute for AI, Toronto, Canada; Cheriton School of Computer Science, University of Waterloo, Canada ({nasghar,kaselby,kevin.pantasdo,ppoupart}@uwaterloo.ca); Dept. of Computing Science, University of Alberta and Alberta Machine Intelligence Institute (AMII) (doublepower.mou@gmail.com); Noah's Ark Lab, Huawei Technologies, Hong Kong (jiang.xin@huawei.com) |
| Pseudocode | Yes | Algorithm 1: Progressive Memory for IDA |
| Open Source Code | Yes | Our IDA code is available at https://github.com/nabihach/IDA. |
| Open Datasets | Yes | The MultiNLI corpus (Williams et al., 2018) is particularly suitable for IDA... We use the Cornell Movie Dialogs Corpus (Danescu-Niculescu-Mizil & Lee, 2011) as the source... from the Ubuntu Dialogue Corpus (Lowe et al., 2015). |
| Dataset Splits | Yes | The corpus also contains held-out (non-training) labeled data in these domains. We split it into two parts for validation and test. |
| Hardware Specification | No | The paper describes the neural network architectures (e.g., Bi-directional LSTM, GRUs) and their dimensions, but does not specify the hardware (e.g., GPU model, CPU type) used for running the experiments. |
| Software Dependencies | No | The paper mentions the 'Adam optimizer' but does not provide specific version numbers for programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or other software libraries. |
| Experiment Setup | Yes | The details of network architecture, training, and hyper-parameter tuning are given in Appendix B. For the base model, we train a bi-directional LSTM (Bi-LSTM)... 300D RNN hidden states, 300D pretrained GloVe embeddings (Pennington et al., 2014) for initialization, batch size of 32, and the Adam optimizer for training. The initial learning rate for Adam is tuned over the set {0.3, 0.03, 0.003, 0.0003, 0.00003}. It is set to 0.0003 based on validation performance. For the memory, we set each slot to be 300-dimensional... We thus choose to add 500 slots for each domain. |
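
The "Pseudocode" and "Experiment Setup" rows above describe Algorithm 1 (a progressive memory bank attached to a Bi-LSTM encoder) and the reported hyper-parameters. The sketch below is a minimal, hedged PyTorch illustration of that setup, not the authors' released implementation (see https://github.com/nabihach/IDA for that); the slot initialization, the attention-based memory read, and the `query_proj` layer are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProgressiveMemoryBiLSTM(nn.Module):
    """Bi-LSTM encoder with an expandable (progressive) memory bank."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=300,
                 mem_dim=300, initial_slots=500):
        super().__init__()
        # 300D word embeddings; the paper initializes these from GloVe.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # 300D bi-directional LSTM encoder.
        self.encoder = nn.LSTM(embed_dim, hidden_dim,
                               bidirectional=True, batch_first=True)
        # Memory bank: one slot matrix per domain, each slot 300-dimensional.
        self.memory = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(initial_slots, mem_dim))]
        )
        # Projects the 600D Bi-LSTM state into the memory query space
        # (an assumption; the paper's exact read mechanism may differ).
        self.query_proj = nn.Linear(2 * hidden_dim, mem_dim)

    def expand_memory(self, new_slots=500):
        """Append fresh slots when a new target domain arrives (500 per domain)."""
        mem_dim = self.memory[0].size(1)
        self.memory.append(nn.Parameter(0.01 * torch.randn(new_slots, mem_dim)))

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        emb = self.embedding(token_ids)              # (batch, seq, 300)
        states, _ = self.encoder(emb)                # (batch, seq, 600)
        bank = torch.cat(list(self.memory), dim=0)   # (total_slots, 300)
        query = self.query_proj(states)              # (batch, seq, 300)
        attn = F.softmax(query @ bank.t(), dim=-1)   # attention over all slots
        read = attn @ bank                           # (batch, seq, 300)
        # A downstream task head (e.g., an NLI classifier) would consume the
        # concatenated RNN states and memory read-out.
        return torch.cat([states, read], dim=-1)


# Training configuration reported above: batch size 32, Adam with the initial
# learning rate tuned over {0.3, 0.03, 0.003, 0.0003, 0.00003} and set to 0.0003.
model = ProgressiveMemoryBiLSTM(vocab_size=30000)  # vocab size is illustrative
optimizer = torch.optim.Adam(model.parameters(), lr=0.0003)
```

Under this sketch, adapting to a new domain would call `expand_memory(500)` and continue training on the new-domain data; since the freshly added slots are new parameters, the optimizer would need to be rebuilt (or have them registered) before fine-tuning.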