A Combinatorial Perspective on Transfer Learning

Authors: Jianan Wang, Eren Sezener, David Budden, Marcus Hutter, Joel Veness

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We now explore the properties of the NCTL algorithm empirically. We present our analysis in three parts: in Section 5.1, we demonstrate that NCTL exhibits combinatorial transfer using a more challenging variant of the standard Split MNIST protocol; in Section 5.2, we compare the performance of NCTL to many previous continual learning algorithms across standard Permuted and Split MNIST variants, using the same test and train splits as previously published; in Section 5.3, we further evaluate NCTL on a widely used real-world dataset Electricity (Elec2-3) which exhibits temporal dependencies and distribution drift.
Researcher Affiliation | Industry | DeepMind, aixi@google.com
Pseudocode | No | The paper describes the algorithm in prose within Section 4 'Algorithm' but does not provide formal pseudocode or a structured algorithm block within the document.
Open Source Code | Yes | Code at: github.com/deepmind/deepmind-research/.
Open Datasets | Yes | Most recent studies on continual learning define a protocol that makes use of an underlying MNIST (or similar) classification dataset. The popular (Disjoint) Split MNIST [ZPG17] involves separating the 10-class classification problem into 5 binary classification tasks... The Electricity (Elec2-3) dataset [HW99] contains 45,312 instances collected from the Australian NSW Electricity Market between May 1997 and December 1999.
Dataset Splits | Yes | We compare the performance of NCTL to many previous continual learning algorithms across standard Permuted and Split MNIST variants, using the same test and train splits as previously published.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running experiments were mentioned.
Software Dependencies | No | NCTL was implemented using JAX [BFH+18] and the DeepMind JAX ecosystem [BHK+20, HCNB20, HBV+20, BHQ+20]. However, specific version numbers for JAX or other libraries are not provided.
Experiment Setup | Yes | Hyper-parameters are optimized by grid search for both EWC and online EWC: the regularization constant λ is set to 10^6 and the learning rate is set to 10^-5 for EWC; and we have λ = 10^7, a learning rate of 10^-5, and the Fisher information matrix leak term γ (based on the formalism of [SLC+18]) set to 0.8 for online EWC. Our NCTL consisted of 50-25-1 neurons, where the base model for each neuron is a GGM with context space C = 2^4, trained with learning rate 0.001. We adopted the same hyperparameters for the GLN baseline.
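
For quick reference, the hyperparameters quoted in the Experiment Setup row are collected below as a minimal Python sketch. The variable names and dictionary layout are illustrative assumptions made here, not the structure of the released code, and the Split MNIST class pairing shown follows the standard convention rather than anything stated in the quoted text.

    # Hyperparameters quoted in the Experiment Setup row above, gathered into a
    # plain Python config. Names and structure are illustrative assumptions and
    # may differ from the released code at github.com/deepmind/deepmind-research/.

    EWC_CONFIG = {
        "lambda": 1e6,          # regularization constant, selected by grid search
        "learning_rate": 1e-5,
    }

    ONLINE_EWC_CONFIG = {
        "lambda": 1e7,
        "learning_rate": 1e-5,
        "gamma": 0.8,           # Fisher information matrix leak term [SLC+18]
    }

    NCTL_CONFIG = {
        "layer_sizes": (50, 25, 1),   # neurons per layer
        "context_space": 2 ** 4,      # context space size per GGM neuron
        "learning_rate": 1e-3,
    }

    # The GLN baseline reuses the NCTL hyperparameters.
    GLN_CONFIG = dict(NCTL_CONFIG)

    # Standard (Disjoint) Split MNIST pairing of the 10 digit classes into
    # 5 binary tasks (assumed conventional ordering, not quoted from the paper).
    SPLIT_MNIST_TASKS = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]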