Discrete Key-Value Bottleneck
Authors: Frederik Träuble, Anirudh Goyal, Nasim Rahaman, Michael Curtis Mozer, Kenji Kawaguchi, Yoshua Bengio, Bernhard Schölkopf
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically verify the proposed method under challenging class-incremental learning scenarios and show that the proposed model without any task boundaries reduces catastrophic forgetting across a wide variety of pre-trained models, outperforming relevant baselines on this task. |
| Researcher Affiliation | Collaboration | 1 MPI for Intelligent Systems, Tübingen; 2 Google DeepMind; 3 Mila; 4 Google Research, Brain Team; 5 National University of Singapore; 6 Université de Montréal; 7 CIFAR Fellow. Correspondence to: Frederik Träuble <frederik.traeuble@tuebingen.mpg.de>. |
| Pseudocode | No | The paper does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | The code for reproducing all experiments can be found in the supplementary material. |
| Open Datasets | Yes | We use a class-incremental CIFAR10 task with pre-training on another dataset... We initialize keys on the unlabelled non-overlapping CIFAR100 dataset except for the ConvMixer where we used the embeddings from the downsampled ImageNet dataset for reasons of comparison. |
| Dataset Splits | Yes | We use a class-incremental CIFAR10 task... Here, five disjoint sets with two classes each are incrementally presented for many epochs each... Using 50 splits, each consisting of 2 classes, we tested our model on the ResNet-50 and CLIP-Backbone architectures... We evaluated our model on 500 splits, each comprising 2 classes, using the CLIP encoder. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running its experiments (e.g., specific GPU/CPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper mentions using PyTorch and various pre-trained models from specific repositories but does not list specific version numbers for PyTorch or other key software libraries needed for reproducibility. |
| Experiment Setup | Yes | Key-Value Bottleneck: The key-value bottleneck consists of 256 key-value codebooks that have each 4096 key-value pairs per codebook. Keys are of the same dimension as the embedding heads (dkey), and we chose the value codes to be of the same size as the classes to predict, i.e. dvalue = 10... After initializing the keys for 10 epochs... we train the model on the class-incremental CIFAR10 task with the PyTorch SGD optimizer without any weight decay or momentum and a learning rate of lr = 0.3 for the bottleneck and lr = 0.001 for the linear probe. We used a label smoothing parameter of 0.1... We used a batch size of 256 during key initialization and continual learning. |
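
The class-incremental protocol quoted in the Dataset Splits row (five disjoint tasks of two CIFAR-10 classes each, batch size 256) can be reproduced with standard torchvision utilities. The sketch below is an illustration only: the exact class-to-task assignment and the per-task shuffling are assumptions, since the paper only specifies disjoint two-class splits presented sequentially.

```python
import torch
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

# Class-incremental CIFAR-10: five disjoint tasks of two classes each.
# The class ordering (0-1, 2-3, ..., 8-9) is an assumption for illustration.
cifar10 = datasets.CIFAR10(root="data", train=True, download=True,
                           transform=transforms.ToTensor())
targets = torch.tensor(cifar10.targets)
tasks = [(2 * t, 2 * t + 1) for t in range(5)]

task_loaders = []
for class_a, class_b in tasks:
    mask = (targets == class_a) | (targets == class_b)
    indices = mask.nonzero(as_tuple=True)[0].tolist()
    task_loaders.append(DataLoader(Subset(cifar10, indices),
                                   batch_size=256, shuffle=True))
```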
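Similarly, the Experiment Setup row pins down the bottleneck geometry (256 codebooks with 4096 key-value pairs each, dvalue = 10) and the optimizer settings (plain SGD, lr = 0.3 for the bottleneck, lr = 0.001 for the linear probe, label smoothing 0.1). The following is a minimal sketch of how these pieces could be wired together, assuming the key dimension, the linear-probe shape, and the pooling over codebooks; the `DiscreteKeyValueBottleneck` module is a simplified stand-in, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class DiscreteKeyValueBottleneck(nn.Module):
    """Simplified stand-in: C codebooks, each holding K (key, value) pairs.
    Every input head retrieves its nearest key and emits the paired value."""
    def __init__(self, num_codebooks=256, num_pairs=4096, dim_key=64, dim_value=10):
        super().__init__()
        # Keys are frozen after the 10-epoch initialization phase (not trained here).
        self.keys = nn.Parameter(torch.randn(num_codebooks, num_pairs, dim_key),
                                 requires_grad=False)
        # Values are the bottleneck parameters updated during class-incremental training.
        self.values = nn.Parameter(torch.zeros(num_codebooks, num_pairs, dim_value))

    def forward(self, z):                                   # z: (batch, C, dim_key)
        dists = torch.cdist(z.transpose(0, 1), self.keys)   # (C, batch, K)
        idx = dists.argmin(dim=-1)                           # nearest key per codebook
        idx = idx.unsqueeze(-1).expand(-1, -1, self.values.shape[-1])
        fetched = torch.gather(self.values, 1, idx)          # (C, batch, dim_value)
        return fetched.transpose(0, 1)                       # (batch, C, dim_value)

bottleneck = DiscreteKeyValueBottleneck()
probe = nn.Linear(10, 10)   # hypothetical probe over pooled 10-dim value codes

# Two learning rates, no momentum or weight decay, as quoted in the table.
optimizer = torch.optim.SGD(
    [{"params": [bottleneck.values], "lr": 0.3},
     {"params": probe.parameters(), "lr": 0.001}],
    momentum=0.0, weight_decay=0.0)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# Example forward pass on a frozen-encoder embedding split into 256 heads:
z = torch.randn(256, 256, 64)                 # (batch, num_codebooks, dim_key)
logits = probe(bottleneck(z).mean(dim=1))     # pool value codes, then classify
loss = criterion(logits, torch.randint(0, 10, (256,)))
```

Pooling the retrieved value codes with a mean over codebooks before the probe is one plausible reading of the setup; the paper's exact decoder head may differ.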