Attention-based Neural Cellular Automata
Authors: Mattie Tesfaldet, Derek Nowrouzezahrai, Chris Pal
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present quantitative and qualitative results on denoising autoencoding across six benchmark datasets, comparing ViTCA to a U-Net, a U-Net-based CA baseline (UNetCA), and a Vision Transformer (ViT). When comparing across architectures configured to similar parameter complexity, ViTCA architectures yield superior performance across all benchmarks and for nearly every evaluation metric. We present an ablation study on various architectural configurations of ViTCA, an analysis of its effect on cell states, and an investigation on its inductive biases. |
| Researcher Affiliation | Academia | Mattie Tesfaldet, McGill University, Mila; Derek Nowrouzezahrai, McGill University, Mila; Christopher Pal, Polytechnique Montréal, Mila |
| Pseudocode | Yes | Alg. 1 in Appendix A details this process. |
| Open Source Code | Yes | Code and instructions to reproduce results are included in the supplemental material. |
| Open Datasets | Yes | We present test set results across six benchmark datasets: a land cover classification dataset intended for representation learning (LandCoverRep) [25], MNIST [50], CelebA [47], FashionMNIST [42], CIFAR10 [53], and Tiny ImageNet (a subset of ImageNet [49]). |
| Dataset Splits | No | The paper describes a pool sampling-based training process and mentions 'test set results' but does not explicitly provide percentages or counts for training/validation/test splits for the datasets used. |
| Hardware Specification | Yes | In the case of Tiny ImageNet, b = 8 to accommodate training on a single GPU (48GB Quadro RTX 8000). |
| Software Dependencies | No | The paper mentions 'PyTorch (BSD-style) and Hydra (MIT)' but does not specify their version numbers. |
| Experiment Setup | Yes | Unless otherwise stated, we train for I = 100K iterations, use a minibatch size b = 32, AdamW optimizer [36], learning rate = 10⁻³ with a cosine annealing schedule [40], pool size N_P = 1024, and cell hidden channel size C_h = 32. We initialize weights/parameters using He initialization [46], except for the final layer of CA-based models, which are initialized to zero [30]. (A minimal configuration sketch follows this table.) |
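
For readers who want to mirror the reported experiment setup, the PyTorch sketch below wires up the stated hyperparameters: AdamW, learning rate 10⁻³ with cosine annealing, 100K iterations, minibatch size 32, He initialization, and a zero-initialized final layer for CA-based models. Note that `PlaceholderModel`, the channel count, and the reconstruction loss are illustrative stand-ins, not the authors' ViTCA model, pool-sampling loop, or released code.

```python
import torch
from torch import nn

# Illustrative stand-in for ViTCA; the real model, its cell update rule (Alg. 1),
# and pool sampling are not reproduced here.
class PlaceholderModel(nn.Module):
    def __init__(self, channels=35):  # assumed split: 3 input channels + C_h = 32 hidden channels
        super().__init__()
        self.body = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.final = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        return self.final(torch.relu(self.body(x)))

model = PlaceholderModel()

# He initialization for all weights; the final layer is zero-initialized,
# as the paper reports for CA-based models.
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
        if m.bias is not None:
            nn.init.zeros_(m.bias)
nn.init.zeros_(model.final.weight)

# AdamW, lr = 1e-3 with cosine annealing over I = 100K iterations, minibatch size b = 32.
iterations = 100_000
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=iterations)

for step in range(iterations):
    batch = torch.randn(32, 35, 32, 32)                  # placeholder minibatch (b = 32)
    loss = nn.functional.mse_loss(model(batch), batch)    # placeholder denoising-style objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

This sketch only mirrors the optimizer, schedule, and initialization choices quoted in the table; per the hardware row, the minibatch size would drop to b = 8 for Tiny ImageNet.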