Task-Free Dynamic Sparse Vision Transformer for Continual Learning

Authors: Fei Ye, Adrian G. Bors

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive empirical studies demonstrate the effectiveness of TFDSViT. The code and supplementary material are available at https://github.com/dtuzi123/TFDSViT." and "Extensive experiments demonstrate that the proposed TFDSViT far outperforms other methods with less computation or memory costs."
Researcher Affiliation | Academia | Fei Ye (1,2) and Adrian G. Bors (1,2); (1) Department of Computer Science, University of York, York YO10 5GH, UK; (2) Machine Learning Department, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE; fy689@york.ac.uk, adrian.bors@york.ac.uk
Pseudocode | Yes | "Algorithm 1: Training algorithm for TFDSViT"
Open Source Code | Yes | "The code and supplementary material are available at https://github.com/dtuzi123/TFDSViT."
Open Datasets | Yes | "Datasets: We split MNIST (LeCun et al. 1998), containing 60k training samples, into five sets and each set has images of two incremental classes (De Lange and Tuytelaars 2021), and call this setting Split MNIST. Similarly, we divide CIFAR10 (Krizhevsky and Hinton 2009) into five sets where each set consists of images from two consecutively ordered classes, named Split CIFAR10. We also split CIFAR100 (Krizhevsky and Hinton 2009) into 20 sets with each set containing images from five incremental classes."
Dataset Splits | No | The paper describes a continual learning setting where data arrives in batches from a stream and is processed with a memory buffer, rather than using explicit train/validation/test splits with percentages or sample counts. No explicit validation split is mentioned.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, or memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies or version numbers (e.g., Python, PyTorch, or TensorFlow versions) used for the experiments.
Experiment Setup | Yes | "Hyperparameters and implementation. We set the image patch size to 7×7 for Split MNIST. The embedding dimension for Split MNIST is 100. A simple fully connected layer with 100 hidden units implements the MLP module for each submodel. We also implement the encoder and decoder of each autoencoder using two fully connected layers, with 200 hidden units on each layer. For Split CIFAR10 and Split CIFAR100, we set the image patch size to 8×8 and the embedding dimension to 100. The MLP for each submodel is implemented by two fully connected layers with 500 and 200 hidden units. For all datasets, each expert has six self-attention blocks (m = 6) and we can build two submodels (u = 2). Additional information is provided in Appendix-C of the SM."
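The class-incremental splitting protocol quoted in the Open Datasets row (five two-class tasks for Split MNIST and Split CIFAR10, twenty five-class tasks for Split CIFAR100) can be sketched as follows. This is a hypothetical helper for illustration, not the paper's own data pipeline:

```python
def class_incremental_tasks(num_classes, classes_per_task):
    """Group consecutively ordered class labels into incremental tasks."""
    assert num_classes % classes_per_task == 0
    return [
        list(range(start, start + classes_per_task))
        for start in range(0, num_classes, classes_per_task)
    ]

# Split MNIST / Split CIFAR10: 10 classes -> 5 tasks of 2 classes each.
print(class_incremental_tasks(10, 2))        # [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
# Split CIFAR100: 100 classes -> 20 tasks of 5 classes each.
print(len(class_incremental_tasks(100, 5)))  # 20
```

A training loop would then filter the full training set by each task's label list and present the tasks one at a time, matching the class-incremental setting described in the paper.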
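The hyperparameters reported in the Experiment Setup row can be gathered into a configuration sketch. `TFDSViTConfig` and its field names are hypothetical (they do not come from the paper's code), and the patch-count arithmetic assumes the standard 28×28 MNIST and 32×32 CIFAR image sizes:

```python
from dataclasses import dataclass

@dataclass
class TFDSViTConfig:
    """Hypothetical config mirroring the paper's reported hyperparameters."""
    image_size: int        # input resolution (assumed square)
    patch_size: int        # 7 for Split MNIST, 8 for Split CIFAR10/100
    embed_dim: int = 100   # embedding dimension for all datasets
    num_blocks: int = 6    # m = 6 self-attention blocks per expert
    num_submodels: int = 2 # u = 2 submodels per expert

    @property
    def num_patches(self) -> int:
        """Number of non-overlapping patches per image."""
        return (self.image_size // self.patch_size) ** 2

split_mnist = TFDSViTConfig(image_size=28, patch_size=7)
split_cifar = TFDSViTConfig(image_size=32, patch_size=8)
print(split_mnist.num_patches)  # 16 patches per 28x28 MNIST image
print(split_cifar.num_patches)  # 16 patches per 32x32 CIFAR image
```

Under these assumed image sizes, both patch choices yield a 4×4 grid of 16 patches, so the token sequence length is the same across datasets while the patch content differs.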