Adaptive Posterior Learning: few-shot learning with a surprise-based memory module
Authors: Tiago Ramalho, Marta Garnelo
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show this algorithm can perform as well as state of the art baselines on few-shot classification benchmarks with a smaller memory footprint. |
| Researcher Affiliation | Industry | Tiago Ramalho, Cogent Labs (tramalho@cogent.co.jp); Marta Garnelo, DeepMind (garnelo@google.com) |
| Pseudocode | No | The paper includes architectural diagrams and descriptions of modules, but it does not contain structured pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Source code for the model is available at https://github.com/cogentlabs/apl. |
| Open Datasets | Yes | The Omniglot dataset contains 1623 characters with 20 examples each. |
| Dataset Splits | No | 1200 of the character classes are assigned to the train set while the remaining 423 are part of the test set. This specifies a train/test split, but no distinct validation split for hyperparameter tuning is explicitly mentioned. (A class-split sketch follows the table.) |
| Hardware Specification | No | The paper mentions using a 'pretrained Inception-ResNet-v2... due to computational constraints', but it does not specify any particular hardware details such as GPU models, CPU types, or cloud computing resources used for the experiments. |
| Software Dependencies | No | The paper refers to using the 'Adam optimizer' and mentions convolutional networks with 'Batch Normalization' and 'ReLU activation', but it does not provide specific version numbers for any programming languages, libraries, or frameworks used (e.g., Python version, TensorFlow/PyTorch version). |
| Experiment Setup | Yes | For all experiments below we use the same training setup. For each training episode we sample elements from N classes and randomly shuffle them to create training batches. For every batch shown to the model, we do one step of gradient descent with the Adam optimizer. We anneal the learning rate from 10^-4 to 10^-5 with exponential decay over 1000 steps (decay rate 0.9). |
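
To make the reported dataset split concrete, below is a minimal sketch of a class-level Omniglot split into 1200 training classes and 423 test classes. The random shuffle and seed are assumptions for illustration; the paper does not state whether it uses the standard background/evaluation split or a random class assignment.

```python
# Minimal sketch of a class-level Omniglot split (1200 train / 423 test).
# The shuffle and seed are illustrative assumptions, not the authors' procedure.
import random

NUM_CLASSES = 1623   # total Omniglot character classes
NUM_TRAIN = 1200     # classes assigned to the train set

classes = list(range(NUM_CLASSES))
random.seed(0)       # hypothetical seed; not reported in the paper
random.shuffle(classes)

train_classes = classes[:NUM_TRAIN]
test_classes = classes[NUM_TRAIN:]   # the remaining 423 classes
assert len(test_classes) == 423
```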
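
The experiment-setup row above can also be read as the following schedule sketch. It assumes a TensorFlow-style continuous exponential decay (rate 0.9 per 1000 steps) clipped at the reported final value of 10^-5; the episode-sampling helpers in the commented loop are hypothetical placeholders, not the authors' API.

```python
# Illustrative sketch of the reported learning-rate annealing:
# exponential decay from 1e-4 toward 1e-5 with decay rate 0.9 per 1000 steps.
# This is one plausible reading, not the authors' exact implementation.

def learning_rate(step, initial_lr=1e-4, final_lr=1e-5,
                  decay_rate=0.9, decay_steps=1000):
    """Exponentially decayed learning rate, clipped at final_lr."""
    lr = initial_lr * decay_rate ** (step / decay_steps)
    return max(lr, final_lr)

# Training loop skeleton matching the described setup: sample an episode
# from N classes, shuffle it into batches, and take one Adam step per batch.
# `sample_episode_batches`, `model`, and `adam_step` are hypothetical names.
#
# for step, batch in enumerate(sample_episode_batches(num_classes=N)):
#     loss = model.loss(batch)
#     adam_step(model, loss, lr=learning_rate(step))
```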