Memory Efficient Neural Processes via Constant Memory Attention Block
Authors: Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Yoshua Bengio, Mohamed Osama Ahmed
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show CMANPs achieve state-of-the-art results on popular NP benchmarks while being significantly more memory efficient than prior methods. |
| Researcher Affiliation | Collaboration | 1Mila – Université de Montréal, Canada; 2Borealis AI, Canada. |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Code: https://github.com/BorealisAI/constant-memory-anp. |
| Open Datasets | Yes | EMNIST (Cohen et al., 2017) comprises black and white images of handwritten letters at 32×32 resolution. |
| Dataset Splits | No | For each task, a random subset of pixels is selected as context data points and another as target data points, where N is a fixed number of context data points and M is a fixed number of target data points. The model is adapted using the context dataset; the target dataset is then used to evaluate the effectiveness of the adaptation and adjust the adaptation rule accordingly. While the paper describes how context and target data points are sampled for each task, it does not provide explicit train/validation/test splits for the overall datasets used (see the sampling sketch after this table). |
| Hardware Specification | Yes | All experiments were run on a Nvidia GTX 1080 Ti (12 GB) or Nvidia Tesla P100 (16 GB) GPU. |
| Software Dependencies | No | The paper mentions using implementations from the official repositories of TNPs and LBANPs and a Cholesky decomposition, but does not provide specific version numbers for any software dependencies such as Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | For consistency, we set the number of latents (i.e., bottleneck size) \|L_I\| = \|L_B\| = 128 across all experiments. We also set b_Q = 5. ... We used an ADAM optimizer with a standard learning rate of 5e-4. We performed a grid search over the weight decay term {0.0, 0.00001, 0.0001, 0.001}. ... The block size for CMANP-AND is set as b_Q = 5. During training, CelebA (128×128), (64×64), and (32×32) used mini-batch sizes of 25, 50, and 100 respectively. (A hedged configuration sketch follows the table.) |
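The per-task context/target sampling described under "Dataset Splits" can be illustrated with a short sketch. The snippet below is not taken from the authors' repository; it assumes PyTorch and a single-image completion task with fixed context size N and fixed target size M, as quoted above.

```python
# Minimal sketch (assumption: PyTorch) of sampling N context and M target
# pixels from one image for a neural-process image-completion task.
import torch

def sample_task(image: torch.Tensor, num_context: int, num_target: int):
    """image: (C, H, W). Returns (x, y) pairs for the context and target sets."""
    c, h, w = image.shape
    coords = torch.stack(
        torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij"), dim=-1
    ).reshape(-1, 2)                                   # pixel coordinates, (H*W, 2)
    values = image.permute(1, 2, 0).reshape(-1, c)     # pixel intensities, (H*W, C)
    perm = torch.randperm(h * w)
    ctx_idx = perm[:num_context]                            # N context pixels
    tgt_idx = perm[num_context:num_context + num_target]    # M target pixels
    x_ctx, y_ctx = coords[ctx_idx].float(), values[ctx_idx]
    x_tgt, y_tgt = coords[tgt_idx].float(), values[tgt_idx]
    return (x_ctx, y_ctx), (x_tgt, y_tgt)
```

The model would be conditioned on the (x_ctx, y_ctx) pairs and evaluated on its predictions at x_tgt against y_tgt; the overall train/validation/test partitioning of the image datasets is the part the paper leaves unspecified.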
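The optimizer settings under "Experiment Setup" (ADAM, learning rate 5e-4, weight-decay grid {0.0, 1e-5, 1e-4, 1e-3}) can likewise be sketched. Only the learning rate and weight-decay grid below come from the paper; `build_model` and `validate` are hypothetical stand-ins, not the authors' CMANP implementation.

```python
# Hedged sketch of the reported optimizer configuration and weight-decay
# grid search, assuming PyTorch. The model and validation routine are
# placeholders, NOT the CMANP architecture from the authors' repository.
import torch
import torch.nn as nn

WEIGHT_DECAYS = [0.0, 1e-5, 1e-4, 1e-3]   # grid reported in the paper
LEARNING_RATE = 5e-4                       # learning rate reported in the paper

def build_model() -> nn.Module:
    return nn.Linear(2, 1)                 # placeholder for the CMANP model

def validate(model: nn.Module) -> float:
    # Placeholder validation: mean-squared error on random data.
    x, y = torch.randn(32, 2), torch.randn(32, 1)
    with torch.no_grad():
        return nn.functional.mse_loss(model(x), y).item()

best_loss, best_wd = float("inf"), None
for wd in WEIGHT_DECAYS:
    model = build_model()
    optimizer = torch.optim.Adam(model.parameters(),
                                 lr=LEARNING_RATE, weight_decay=wd)
    # ... the per-task training loop would go here ...
    val_loss = validate(model)
    if val_loss < best_loss:
        best_loss, best_wd = val_loss, wd
print(f"selected weight decay: {best_wd}")
```

The latent sizes (\|L_I\| = \|L_B\| = 128), block size (b_Q = 5), and per-resolution mini-batch sizes quoted above would be passed to the real model constructor and data loaders in the authors' code.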