Uncertainty-guided Continual Learning with Bayesian Neural Networks
Authors: Sayna Ebrahimi, Mohamed Elhoseiny, Trevor Darrell, Marcus Rohrbach
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our UCB approach extensively on diverse object classification datasets with short and long sequences of tasks and report superior or on-par performance compared to existing approaches. Additionally, we show that our model does not necessarily need task information at test time, i.e. it does not presume knowledge of which task a sample belongs to. |
| Researcher Affiliation | Collaboration | Sayna Ebrahimi (UC Berkeley); Mohamed Elhoseiny (KAUST, Stanford University); Trevor Darrell (UC Berkeley); Marcus Rohrbach (Facebook AI Research) |
| Pseudocode | Yes | Algorithm 1: Uncertainty-guided Continual Learning with Bayesian Neural Networks (UCB); Algorithm 2: Learning Rate Update in UCB; Algorithm 3: Learning Rate Update in UCB-P. (A hedged sketch of the learning-rate update appears below the table.) |
| Open Source Code | Yes | Our code is available at https://github.com/SaynaEbrahimi/UCB. |
| Open Datasets | Yes | We use Split MNIST with 5 tasks (5-Split MNIST) similar to (Nguyen et al., 2018; Chen et al., 2019; Tseran et al., 2018) and permuted MNIST (Srivastava et al., 2013) for class incremental learning with similar experimental settings as used in (Serra et al., 2018; Tseran et al., 2018). Furthermore, to have a better understanding of our method, we evaluate our approach on continually learning a sequence of 8 datasets with different distributions using the identical sequence as in (Serra et al., 2018), which includes FaceScrub (Ng & Winkler, 2014), MNIST, CIFAR100, NotMNIST (Bulatov, 2011), SVHN (Netzer et al., 2011), CIFAR10, TrafficSigns (Stallkamp et al., 2011), and FashionMNIST (Xiao et al., 2017). |
| Dataset Splits | Yes | In all our experiments we consider a 0.15 split for the validation set on the first two tasks. (See the task-split sketch below the table.) |
| Hardware Specification | No | The paper does not provide specific details on the hardware used, such as GPU models, CPU types, or memory specifications. It only mentions general training settings like batch size and learning rate. |
| Software Dependencies | No | The paper mentions 'Adam optimizer (Kingma & Ba, 2014)' but does not provide version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | We used stochastic gradient descent with a batch size of 64 and a learning rate of 0.01, decaying it by a factor of 0.3 once the loss plateaued. Dataset splits and batch shuffle are identical in all UCB experiments and all baselines. [...] Table 5 shows the search space for hyperparameters in the BBB algorithm Blundell et al. (2015) which we used for tuning on the validation set of the first two tasks. [...] Following the result in Table 7 we chose the number of samples to be 10 for all experiments. (See the training-setup sketch below the table.) |
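
The pseudocode entry above refers to the paper's uncertainty-guided learning-rate update (Algorithms 1–3). The snippet below is a minimal sketch of that idea under stated assumptions, not the authors' implementation: it assumes a Bayes-by-Backprop layer stores a posterior mean `mu` and a pre-softplus scale `rho` per weight, and it scales the learning rate of each mean by the posterior standard deviation sigma = softplus(rho), so low-uncertainty (important) parameters are updated conservatively. The helper names `ucb_lr_update` and `sgd_step` are illustrative.

```python
# Hedged sketch of an uncertainty-guided learning-rate update.
# Assumption: per-weight posterior parameters mu (mean) and rho (pre-softplus
# scale) with gradients already populated by a Bayes-by-Backprop loss.
import torch
import torch.nn.functional as F


def ucb_lr_update(rho, base_lr):
    """Per-parameter learning rates for the posterior means."""
    sigma = F.softplus(rho)        # sigma = log(1 + exp(rho))
    return base_lr * sigma         # certain (low-sigma) weights -> small steps


def sgd_step(mu, rho, base_lr):
    """One manual SGD step with uncertainty-modulated learning rates."""
    with torch.no_grad():
        lr_mu = ucb_lr_update(rho, base_lr)
        mu -= lr_mu * mu.grad      # modulated update for the means
        rho -= base_lr * rho.grad  # unmodulated update for the scales
```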
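
The dataset and split entries above describe 5-Split MNIST with a 0.15 validation split taken only on the first two tasks. The sketch below shows one way such a task sequence could be built with torchvision; the class pairings and loop structure are assumptions for illustration, not the released code.

```python
# Hedged sketch: 5-Split MNIST tasks with a 0.15 validation split on the
# first two tasks only, as reported in the table above.
from torch.utils.data import Subset, random_split
from torchvision import datasets, transforms

mnist = datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor())

tasks = []
for t, classes in enumerate([(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]):
    idx = [i for i, y in enumerate(mnist.targets.tolist()) if y in classes]
    task = Subset(mnist, idx)
    if t < 2:  # validation split only on the first two tasks
        n_val = int(0.15 * len(task))
        train_set, val_set = random_split(task, [len(task) - n_val, n_val])
    else:
        train_set, val_set = task, None
    tasks.append((train_set, val_set))
```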
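
The experiment-setup entry reports SGD with a batch size of 64, a learning rate of 0.01, and a decay factor of 0.3 once the loss plateaus. A minimal way to mirror those settings in PyTorch is sketched below; `model` and `train_set` are placeholders, and the authors' exact plateau criterion is not specified in the paper.

```python
# Hedged sketch of the reported optimization settings (not the authors' script).
import torch
from torch.utils.data import DataLoader


def make_training(model, train_set):
    loader = DataLoader(train_set, batch_size=64, shuffle=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    # Multiply the learning rate by 0.3 once the monitored loss plateaus.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.3)
    return loader, optimizer, scheduler
```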