Continual Variational Autoencoder via Continual Generative Knowledge Distillation
Authors: Fei Ye, Adrian G. Bors
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show theoretically and empirically that the proposed framework can train a statistically diversified Teacher module for continual VAE learning which is applicable to learning infinite data streams; Experiments Settings and Baselines; The FID results for MSFIRC are shown in Table 1; The image reconstruction results by CGKD-GAN are sharper than most of the baselines, as we can observe in Fig. 2. |
| Researcher Affiliation | Academia | Fei Ye and Adrian G. Bors Department of Computer Science, University of York, York YO10 5GH, UK fy689@york.ac.uk, adrian.bors@york.ac.uk |
| Pseudocode | Yes | Algorithm. We summarize the training algorithm in five steps: (1) (Updating the memory buffer M_i). We update M_i at T_i by adding a new batch of samples {x_{i,j}}_{k=1}^{b} drawn from the data stream S into its buffer if the memory is not full, \|M_i\| < \|M\|_max; otherwise, we remove the earliest batch of samples included in M_i and add {x_{i,j}}_{k=1}^{b}; (2) (Teacher learning). If the Teacher has only a single expert at the initial training phase, we automatically build a new expert A_2 at the training step T_100 while freezing A_1. We train the newly added expert on M_i using either Eq. (8) or (9); (3) (Checking the expansion). To avoid frequent evaluation, we check the expansion when the memory is full, \|M_i\| = \|M\|_max. When the expansion criterion in Eq. (1) is satisfied, we add a new expert A_{c+1} to the Teacher module while cleaning up the memory M_i; (4) (Expert pruning for KD). We remove the non-essential experts from the Teacher module using the proposed expert pruning approach until the number of experts in A matches n; (5) (Student learning). We distill the data generated by the Teacher to the Student while simultaneously learning the information from the current memory M_i using Eq. (4). Then we return to Step 1 for the next training step T_{i+1}. (A minimal Python sketch of this loop is given after the table.) |
| Open Source Code | Yes | Supplementary materials (SM) and source code are available¹. ¹ https://github.com/dtuzi123/CGKD |
| Open Datasets | Yes | We consider a series of six data domains including MNIST (Le Cun et al. 1998), SVHN (Netzer et al. 2011), Fashion (Xiao, Rasul, and Vollgraf 2017), IFashion, RMNIST and CIFAR10 (Abouelnaga et al. 2016); We consider 5000 samples for testing from each database; CelebA (Liu et al. 2015) and 3D-chair (Aubry et al. 2014) |
| Dataset Splits | No | The paper mentions 'test and training sets' and 'test sets' for evaluation, but does not explicitly describe a 'validation set' or 'validation split' for hyperparameter tuning. |
| Hardware Specification | No | The paper does not specify any details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | The batch size and the number of epochs for each training step are 64 and 1, respectively. The maximum memory size for MSFIRC and CI-MSFIRC is 5000; For the GAN-based Teacher model, we have a discriminator network D_ϵ parameterized by ϵ and a generator G_{θ_j} parameterized by θ_j. We consider the WGAN objective function (Gulrajani et al. 2017); For the VAE-based Teacher model, we introduce two neural networks to model the encoding q_η(z \| x) and decoding distribution p_{θ_j}(x \| z), respectively. (An illustrative sketch of the VAE-based Teacher is given after the table.) |
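
The five-step procedure quoted in the Pseudocode row maps onto a simple training loop. The sketch below is a minimal Python rendering, not the authors' code: `Expert`, `Student`, `expansion_criterion`, and `prune_experts` are hypothetical stand-ins, and the paper's Eqs. (1), (4), (8) and (9) are left as placeholders rather than implemented.

```python
# Minimal sketch of the five-step CGKD training loop (not the authors' code).
# All classes and criteria below are hypothetical placeholders.

import random


class Expert:
    """Stand-in for one Teacher expert (a GAN or VAE component)."""
    def __init__(self):
        self.frozen = False

    def train_on(self, memory):
        pass  # placeholder for optimizing the expert objective, Eq. (8) or (9)

    def sample(self, n):
        return [None] * n  # placeholder for samples generated by this expert


class Student:
    """Stand-in for the compact Student generator."""
    def train_on(self, generated, memory):
        pass  # placeholder for the distillation objective, Eq. (4)


def expansion_criterion(experts, memory):
    return random.random() < 0.05  # placeholder for the criterion in Eq. (1)


def prune_experts(experts, n_keep):
    return experts[:n_keep]  # placeholder for the expert-pruning step


def train(stream, steps=1000, batch_size=64, memory_max=5000, n_keep=2):
    memory, experts, student = [], [Expert()], Student()
    for _ in range(steps):
        # (1) Update the memory buffer; drop the earliest batch when full.
        batch = [next(stream) for _ in range(batch_size)]
        if len(memory) >= memory_max:
            del memory[:batch_size]
        memory.extend(batch)

        # (2) Teacher learning: only the newest (unfrozen) expert is trained.
        experts[-1].train_on(memory)

        # (3) Check expansion only when the memory is full.
        if len(memory) >= memory_max and expansion_criterion(experts, memory):
            experts[-1].frozen = True
            experts.append(Expert())
            memory.clear()

        # (4) Prune non-essential experts before knowledge distillation.
        kd_experts = prune_experts(experts, n_keep)

        # (5) Student learning: distill Teacher samples plus the current memory.
        generated = [x for e in kd_experts for x in e.sample(batch_size)]
        student.train_on(generated, memory)
    return experts, student
```

With a dummy infinite stream, e.g. `train(iter(lambda: 0, 1), steps=10)`, the loop runs end to end; a faithful reproduction would substitute the paper's actual objectives and expansion criterion for the placeholders.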
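
The VAE-based Teacher in the Experiment Setup row pairs an encoder for q_η(z | x) with a decoder for p_{θ_j}(x | z). The PyTorch sketch below illustrates those pieces under assumed choices (the MLP layers, hidden width and latent size are ours, and a standard negative ELBO stands in for the paper's VAE objective); it is not the authors' implementation.

```python
# Illustrative VAE-based Teacher expert (not the authors' implementation).
# Layer sizes and the MLP architecture are assumptions for illustration only.

import torch
import torch.nn as nn
import torch.nn.functional as F


class VAEExpert(nn.Module):
    def __init__(self, x_dim=784, h_dim=256, z_dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)       # mean of q_eta(z|x)
        self.logvar = nn.Linear(h_dim, z_dim)   # log-variance of q_eta(z|x)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))  # p_theta_j(x|z)

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        return self.dec(z), mu, logvar


def neg_elbo(model, x):
    """Standard negative ELBO: reconstruction term plus KL divergence."""
    logits, mu, logvar = model(x)
    recon = F.binary_cross_entropy_with_logits(logits, x, reduction="sum")
    kld = -0.5 * torch.sum(1.0 + logvar - mu.pow(2) - logvar.exp())
    return (recon + kld) / x.size(0)


# Example: one optimization step on a batch of 64 (the batch size quoted above).
if __name__ == "__main__":
    model = VAEExpert()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.rand(64, 784)  # dummy data in [0, 1]
    loss = neg_elbo(model, x)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

A corresponding GAN-based Teacher would instead train a generator/critic pair under the WGAN objective with gradient penalty (Gulrajani et al. 2017), which is not sketched here.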