Discrete Key-Value Bottleneck
Authors: Frederik Träuble, Anirudh Goyal, Nasim Rahaman, Michael Curtis Mozer, Kenji Kawaguchi, Yoshua Bengio, Bernhard Schölkopf
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically verify the proposed method under challenging class-incremental learning scenarios and show that the proposed model without any task boundaries reduces catastrophic forgetting across a wide variety of pre-trained models, outperforming relevant baselines on this task. |
| Researcher Affiliation | Collaboration | 1 MPI for Intelligent Systems, Tübingen; 2 Google DeepMind; 3 Mila; 4 Google Research, Brain Team; 5 National University of Singapore; 6 Université de Montréal; 7 CIFAR Fellow. Correspondence to: Frederik Träuble <frederik.traeuble@tuebingen.mpg.de>. |
| Pseudocode | No | The paper does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | The code for reproducing all experiments can be found in the supplementary material. |
| Open Datasets | Yes | We use a class-incremental CIFAR10 task with pre-training on another dataset... We initialize keys on the unlabelled non-overlapping CIFAR100 dataset except for the ConvMixer where we used the embeddings from the downsampled ImageNet dataset for reasons of comparison. |
| Dataset Splits | Yes | We use a class-incremental CIFAR10 task... Here, five disjoint sets with two classes each are incrementally presented for many epochs each... Using 50 splits, each consisting of 2 classes, we tested our model on the ResNet-50 and CLIP-Backbone architectures... We evaluated our model on 500 splits, each comprising 2 classes, using the CLIP encoder. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments (e.g., specific GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper mentions using PyTorch and various pre-trained models from specific repositories but does not list specific version numbers for PyTorch or other key software libraries needed for reproducibility. |
| Experiment Setup | Yes | Key-Value Bottleneck: The key-value bottleneck consists of 256 key-value codebooks that have each 4096 key-value pairs per codebook. Keys are of the same dimension as the embedding heads (dkey), and we chose the value codes to be of the same size as the classes to predict, i.e. dvalue = 10... After initializing the keys for 10 epochs... we train the model on the class-incremental CIFAR10 task with the PyTorch SGD optimizer without any weight decay or momentum and a learning rate of lr = 0.3 for the bottleneck and lr = 0.001 for the linear probe. We used a label smoothing parameter of 0.1... We used a batch size of 256 during key initialization and continual learning. |
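
The class-incremental protocol quoted in the Dataset Splits row (five disjoint tasks of two CIFAR-10 classes each, batch size 256) can be reproduced with standard torchvision utilities. The sketch below is an illustration only: the exact class-to-task assignment and the per-task shuffling are assumptions, since the paper only specifies disjoint two-class splits presented sequentially.

```python
import torch
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

# Class-incremental CIFAR-10: five disjoint tasks of two classes each.
# The class ordering (0-1, 2-3, ..., 8-9) is an assumption for illustration.
cifar10 = datasets.CIFAR10(root="data", train=True, download=True,
                           transform=transforms.ToTensor())
targets = torch.tensor(cifar10.targets)
tasks = [(2 * t, 2 * t + 1) for t in range(5)]

task_loaders = []
for class_a, class_b in tasks:
    mask = (targets == class_a) | (targets == class_b)
    indices = mask.nonzero(as_tuple=True)[0].tolist()
    task_loaders.append(DataLoader(Subset(cifar10, indices),
                                   batch_size=256, shuffle=True))
```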
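Similarly, the Experiment Setup row pins down the bottleneck geometry (256 codebooks with 4096 key-value pairs each, dvalue = 10) and the optimizer settings (plain SGD, lr = 0.3 for the bottleneck, lr = 0.001 for the linear probe, label smoothing 0.1). The following is a minimal sketch of how these pieces could be wired together, assuming the key dimension, the linear-probe shape, and the pooling over codebooks; the `DiscreteKeyValueBottleneck` module is a simplified stand-in, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class DiscreteKeyValueBottleneck(nn.Module):
    """Simplified stand-in: C codebooks, each holding K (key, value) pairs.
    Every input head retrieves its nearest key and emits the paired value."""
    def __init__(self, num_codebooks=256, num_pairs=4096, dim_key=64, dim_value=10):
        super().__init__()
        # Keys are frozen after the 10-epoch initialization phase (not trained here).
        self.keys = nn.Parameter(torch.randn(num_codebooks, num_pairs, dim_key),
                                 requires_grad=False)
        # Values are the bottleneck parameters updated during class-incremental training.
        self.values = nn.Parameter(torch.zeros(num_codebooks, num_pairs, dim_value))

    def forward(self, z):                                   # z: (batch, C, dim_key)
        dists = torch.cdist(z.transpose(0, 1), self.keys)   # (C, batch, K)
        idx = dists.argmin(dim=-1)                           # nearest key per codebook
        idx = idx.unsqueeze(-1).expand(-1, -1, self.values.shape[-1])
        fetched = torch.gather(self.values, 1, idx)          # (C, batch, dim_value)
        return fetched.transpose(0, 1)                       # (batch, C, dim_value)

bottleneck = DiscreteKeyValueBottleneck()
probe = nn.Linear(10, 10)   # hypothetical probe over pooled 10-dim value codes

# Two learning rates, no momentum or weight decay, as quoted in the table.
optimizer = torch.optim.SGD(
    [{"params": [bottleneck.values], "lr": 0.3},
     {"params": probe.parameters(), "lr": 0.001}],
    momentum=0.0, weight_decay=0.0)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# Example forward pass on a frozen-encoder embedding split into 256 heads:
z = torch.randn(256, 256, 64)                 # (batch, num_codebooks, dim_key)
logits = probe(bottleneck(z).mean(dim=1))     # pool value codes, then classify
loss = criterion(logits, torch.randint(0, 10, (256,)))
```

Pooling the retrieved value codes with a mean over codebooks before the probe is one plausible reading of the setup; the paper's exact decoder head may differ.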