Uncertainty-guided Continual Learning with Bayesian Neural Networks

Authors: Sayna Ebrahimi, Mohamed Elhoseiny, Trevor Darrell, Marcus Rohrbach

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our UCB approach extensively on diverse object classification datasets with short and long sequences of tasks and report superior or on-par performance compared to existing approaches. Additionally, we show that our model does not necessarily need task information at test time, i.e. it does not presume knowledge of which task a sample belongs to.
Researcher Affiliation | Collaboration | Sayna Ebrahimi (UC Berkeley); Mohamed Elhoseiny (KAUST, Stanford University); Trevor Darrell (UC Berkeley); Marcus Rohrbach (Facebook AI Research)
Pseudocode | Yes | Algorithm 1: Uncertainty-guided Continual Learning with Bayesian Neural Networks (UCB); Algorithm 2: Learning Rate Update in UCB; Algorithm 3: Learning Rate Update in UCB-P
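The learning-rate update the paper describes in Algorithm 2 modulates per-parameter learning rates by posterior uncertainty: weights with a small posterior standard deviation σ are treated as important (importance Ω = 1/σ) and their means are updated more cautiously. A minimal sketch of that idea, assuming per-weight σ values from a BBB-style posterior are already available; the function name and values are illustrative, not from the paper's code:

```python
def ucb_lr_update(base_lr, sigmas):
    """Scale each mean-parameter's learning rate by its uncertainty.

    With importance Omega = 1/sigma, the adapted rate is
    lr_mu = base_lr / Omega = base_lr * sigma, so confidently-learned
    (low-sigma) weights from earlier tasks change little on new tasks.
    """
    return [base_lr * s for s in sigmas]

# A near-certain weight (sigma = 0.01) gets a far smaller step size
# than a highly uncertain one (sigma = 1.0).
lrs = ucb_lr_update(0.01, [0.01, 1.0])
```

The sketch captures only the modulation step; the full algorithm also handles the learning rates of the variance parameters themselves.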
Open Source Code | Yes | Our code is available at https://github.com/SaynaEbrahimi/UCB.
Open Datasets | Yes | We use Split MNIST with 5 tasks (5-Split MNIST) similar to (Nguyen et al., 2018; Chen et al., 2019; Tseran et al., 2018) and permuted MNIST (Srivastava et al., 2013) for class incremental learning with similar experimental settings as used in (Serra et al., 2018; Tseran et al., 2018). Furthermore, to have a better understanding of our method, we evaluate our approach on continually learning a sequence of 8 datasets with different distributions using the identical sequence as in (Serra et al., 2018), which includes FaceScrub (Ng & Winkler, 2014), MNIST, CIFAR100, NotMNIST (Bulatov, 2011), SVHN (Netzer et al., 2011), CIFAR10, Traffic Signs (Stallkamp et al., 2011), and FashionMNIST (Xiao et al., 2017).
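The 5-Split MNIST benchmark the paper uses partitions the ten digit classes into five consecutive two-class tasks ({0,1}, {2,3}, ..., {8,9}). A minimal sketch of that partitioning on an arbitrary label list, using toy labels in place of the real MNIST targets; the function name is illustrative:

```python
def split_tasks(labels, n_tasks=5, n_classes=10):
    """Group example indices into consecutive-class tasks.

    For 5-Split MNIST, task t contains the examples whose label is in
    {2t, 2t+1}. Returns one list of example indices per task.
    """
    per_task = n_classes // n_tasks
    tasks = [[] for _ in range(n_tasks)]
    for idx, y in enumerate(labels):
        tasks[y // per_task].append(idx)
    return tasks

# Toy labels standing in for MNIST targets.
labels = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 9]
tasks = split_tasks(labels)
```

In the continual-learning setting, the model then sees `tasks[0]` through `tasks[4]` strictly in sequence.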
Dataset Splits | Yes | In all our experiments we consider a 0.15 split for the validation set on the first two tasks.
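The 0.15 validation split quoted above can be reproduced with a simple shuffled index hold-out; this is a sketch of one plausible implementation (the seed and function name are illustrative, not from the paper):

```python
import random

def val_split(n_examples, val_frac=0.15, seed=0):
    """Hold out val_frac of the example indices for validation.

    Returns (train_idx, val_idx); together they cover all examples
    exactly once.
    """
    rng = random.Random(seed)
    idx = list(range(n_examples))
    rng.shuffle(idx)
    n_val = int(round(n_examples * val_frac))
    return idx[n_val:], idx[:n_val]

train_idx, val_idx = val_split(1000)  # 850 train / 150 validation
```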
Hardware Specification | No | The paper does not provide specific details on the hardware used, such as GPU models, CPU types, or memory specifications. It only mentions general training settings like batch size and learning rate.
Software Dependencies | No | The paper mentions 'Adam optimizer (Kingma & Ba, 2014)' but does not provide version numbers for any software libraries, frameworks, or programming languages used.
Experiment Setup | Yes | We used stochastic gradient descent with a batch size of 64 and a learning rate of 0.01, decaying it by a factor of 0.3 once the loss plateaued. Dataset splits and batch shuffle are identical in all UCB experiments and all baselines. [...] Table 5 shows the search space for hyperparameters in the BBB algorithm Blundell et al. (2015) which we used for tuning on the validation set of the first two tasks. [...] Following the result in Table 7 we chose the number of samples to be 10 for all experiments.
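The reported schedule (lr 0.01, decayed by 0.3 on plateau) can be sketched as a minimal plateau detector; the class name and the patience value are assumptions, not stated in the paper:

```python
class PlateauDecay:
    """Decay the learning rate by `factor` once the monitored loss has
    failed to improve for `patience` consecutive checks, matching the
    paper's reported 0.01 -> x0.3-on-plateau schedule."""

    def __init__(self, lr=0.01, factor=0.3, patience=3):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        if loss < self.best:
            self.best, self.bad_epochs = loss, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr

sched = PlateauDecay()
for loss in [1.0, 0.8, 0.8, 0.8, 0.8]:  # loss plateaus after epoch 2
    lr = sched.step(loss)
```

In a framework setting the same behaviour is typically delegated to a built-in scheduler (e.g. PyTorch's `ReduceLROnPlateau`) rather than hand-rolled.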