Translation Equivariant Transformer Neural Processes
Authors: Matthew Ashman, Cristiana Diaconu, Junhyuck Kim, Lakee Sivaraya, Stratis Markou, James Requeima, Wessel P. Bruinsma, Richard E. Turner
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through an extensive range of experiments on synthetic and real-world spatio-temporal data, we demonstrate the effectiveness of TE-TNPs relative to their non-translation-equivariant counterparts and other NP baselines. |
| Researcher Affiliation | Collaboration | ¹Department of Engineering, University of Cambridge, Cambridge, UK; ²Vector Institute, University of Toronto, Toronto, Canada; ³Microsoft Research AI for Science, Cambridge, UK. |
| Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | The paper uses well-known public datasets and data sources: 'MNIST (LeCun et al., 1998) and CIFAR10', 'derived from ERA5 (Copernicus Climate Change Service, 2020)', and 'data made available by Rozet & Louppe (2023)' for the Kolmogorov flow PDE. |
| Dataset Splits | Yes | For image completion: 'The training set consists of 60,000 images for T-MNIST and 50,000 images for T-CIFAR-10... The test set consists of 10,000 images for both T-MNIST and T-CIFAR-10.' For Kolmogorov flow: 'The overall dataset consists of 1,024 independent trajectories of 64 states, of which 819 are used for training and 102 for testing.' For environmental data: 'Models are trained on measurements within the latitude / longitude range of [42°, 53°] / [8°, 28°]... and evaluated on three non-overlapping regions: the training region, western Europe... and northern Europe.' The paper describes clear train and test splits for these datasets. |
| Hardware Specification | No | The paper does not specify the hardware used for experiments, such as specific GPU models, CPU types, or cloud computing instance details. It mentions 'limitations of the hardware available' but no specifics. |
| Software Dependencies | No | The paper mentions software components like 'AdamW', 'Adam', and 'GPyTorch', but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | The paper provides detailed experimental setup information in Section F, including hyperparameter values: 'For all models, we use an embedding / token size of Dz = 128, and decoder consisting of an MLP with two hidden layers of dimension Dz.' and 'optimise the model parameters using AdamW (Loshchilov & Hutter, 2017) with a learning rate of 5 × 10⁻⁴ and batch size of 16. Gradient value magnitudes are clipped at 0.5. We train for a maximum of 500 epochs, with each epoch consisting of 16,000 datasets (10,000 iterations per epoch).' |
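
The Experiment Setup row maps onto a small training configuration. The sketch below is a minimal illustration under the assumption of a PyTorch workflow (the paper mentions GPyTorch, which builds on PyTorch); `model`, `sample_batch`, and `np_loss` are hypothetical placeholders rather than the authors' code, and only the numeric hyperparameters (Dz = 128, AdamW with learning rate 5 × 10⁻⁴, batch size 16, gradient value clipping at 0.5, up to 500 epochs of 10,000 iterations) are taken from the paper.

```python
import torch
from torch import nn

# Reported hyperparameters (Section F of the paper).
D_Z = 128           # embedding / token size
LR = 5e-4           # AdamW learning rate
BATCH_SIZE = 16
CLIP_VALUE = 0.5    # gradient value magnitudes are clipped at 0.5
MAX_EPOCHS = 500
ITERS_PER_EPOCH = 10_000


def build_decoder(d_z: int = D_Z, out_dim: int = 2) -> nn.Module:
    """Decoder as described in the paper: an MLP with two hidden layers of width Dz."""
    return nn.Sequential(
        nn.Linear(d_z, d_z), nn.ReLU(),
        nn.Linear(d_z, d_z), nn.ReLU(),
        nn.Linear(d_z, out_dim),
    )


def train(model: nn.Module, sample_batch, np_loss) -> None:
    """Hypothetical training loop using only the hyperparameters quoted above.

    `sample_batch(batch_size)` is assumed to return a batch of (context, target)
    sets; `np_loss` is assumed to compute the neural-process training objective.
    """
    optimiser = torch.optim.AdamW(model.parameters(), lr=LR)
    for _epoch in range(MAX_EPOCHS):
        for _ in range(ITERS_PER_EPOCH):
            context, target = sample_batch(BATCH_SIZE)   # placeholder data sampler
            loss = np_loss(model, context, target)       # placeholder objective
            optimiser.zero_grad()
            loss.backward()
            # Clip gradient *values* (not the norm) at 0.5, as reported.
            torch.nn.utils.clip_grad_value_(model.parameters(), CLIP_VALUE)
            optimiser.step()
```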