Attention-based Neural Cellular Automata

Authors: Mattie Tesfaldet, Derek Nowrouzezahrai, Chris Pal

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present quantitative and qualitative results on denoising autoencoding across six benchmark datasets, comparing ViTCA to a U-Net, a U-Net-based CA baseline (UNetCA), and a Vision Transformer (ViT). When comparing across architectures configured to similar parameter complexity, ViTCA architectures yield superior performance across all benchmarks and for nearly every evaluation metric. We present an ablation study on various architectural configurations of ViTCA, an analysis of its effect on cell states, and an investigation on its inductive biases.
Researcher Affiliation | Academia | Mattie Tesfaldet (McGill University, Mila); Derek Nowrouzezahrai (McGill University, Mila); Christopher Pal (Polytechnique Montréal, Mila)
Pseudocode | Yes | Alg. 1 in Appendix A details this process.
Open Source Code | Yes | Code and instructions to reproduce results are included in the supplemental material.
Open Datasets | Yes | We present test set results across six benchmark datasets: a land cover classification dataset intended for representation learning (LandCoverRep) [25], MNIST [50], CelebA [47], Fashion-MNIST [42], CIFAR10 [53], and Tiny ImageNet (a subset of ImageNet [49]).
Dataset Splits | No | The paper describes a pool sampling-based training process and mentions 'test set results' but does not explicitly provide percentages or counts for training/validation/test splits for the datasets used.
Hardware Specification | Yes | In the case of Tiny ImageNet, b = 8 to accommodate training on a single GPU (48GB Quadro RTX 8000).
Software Dependencies | No | The paper mentions 'PyTorch (BSD-style) and Hydra (MIT)' but does not specify their version numbers.
Experiment Setup | Yes | Unless otherwise stated, we train for I = 100K iterations, use a minibatch size b = 32, the AdamW optimizer [36], a learning rate of 10^-3 with a cosine annealing schedule [40], pool size N_P = 1024, and cell hidden channel size C_h = 32. We initialize weights/parameters using He initialization [46], except for the final layer of CA-based models, which are initialized to zero [30].
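The quoted experiment setup maps onto a small amount of standard PyTorch configuration code. The following is a minimal sketch of that setup only, assuming a generic model object and hypothetical helpers (build_vitca, sample_from_pool, compute_loss, and a final_layer attribute); it is not the authors' released code, and the actual pool-sampling and loss logic follow Alg. 1 in the paper's Appendix A.

    # Sketch of the reported training configuration (hyperparameters from the quote above).
    import torch
    from torch.optim import AdamW
    from torch.optim.lr_scheduler import CosineAnnealingLR

    ITERATIONS = 100_000   # I = 100K training iterations
    BATCH_SIZE = 32        # b = 32 (b = 8 for Tiny ImageNet)
    POOL_SIZE = 1024       # N_P = 1024
    HIDDEN_CHANNELS = 32   # C_h = 32 cell hidden channels

    def init_weights(module):
        # He initialization for weight-bearing layers, as stated in the paper.
        if isinstance(module, (torch.nn.Linear, torch.nn.Conv2d)):
            torch.nn.init.kaiming_normal_(module.weight)
            if module.bias is not None:
                torch.nn.init.zeros_(module.bias)

    model = build_vitca(hidden_channels=HIDDEN_CHANNELS)  # hypothetical constructor
    model.apply(init_weights)
    # Final layer of CA-based models is zero-initialized, per the paper.
    torch.nn.init.zeros_(model.final_layer.weight)  # assumes a `final_layer` attribute

    optimizer = AdamW(model.parameters(), lr=1e-3)
    scheduler = CosineAnnealingLR(optimizer, T_max=ITERATIONS)

    pool = init_pool(POOL_SIZE)  # hypothetical cell-state pool (Alg. 1)
    for step in range(ITERATIONS):
        batch = sample_from_pool(pool, BATCH_SIZE)  # hypothetical pool sampling
        loss = compute_loss(model, batch)           # hypothetical denoising loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()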