Wandering within a world: Online contextualized few-shot learning
Authors: Mengye Ren, Michael Louis Iuzzolino, Michael Curtis Mozer, Richard Zemel
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we show experimental results for our online contextualized few-shot learning paradigm, using RoamingOmniglot and RoamingRooms (see Sec. 3) to evaluate our model, CPM, and other state-of-the-art methods. |
| Researcher Affiliation | Collaboration | Mengye Ren (1,3), Michael L. Iuzzolino (2), Michael C. Mozer (2,4), Richard S. Zemel (1,3,5); 1: University of Toronto; 2: Google Research; 3: Vector Institute; 4: University of Colorado, Boulder; 5: CIFAR |
| Pseudocode | No | The paper describes the model (Contextual Prototypical Memory Networks) using text and mathematical equations, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and dataset are released at: https://github.com/renmengye/oc-fewshot-public |
| Open Datasets | Yes | Third, we build three datasets: 1) RoamingOmniglot is based on handwritten characters from Omniglot (Lake et al., 2015); 2) RoamingImageNet is based on images from ImageNet (Russakovsky et al., 2015); and 3) RoamingRooms is our new few-shot learning dataset based on indoor imagery (Chang et al., 2017), which resembles the visual experience of a wandering agent. |
| Dataset Splits | Yes | We split the alphabets into 31 for training, 5 for validation, and 13 for testing. |
| Hardware Specification | No | The paper mentions training across multiple GPUs ('across 2 GPUs', 'across 4 GPUs') but does not specify the model or type of these GPUs, nor does it detail any other specific hardware components like CPU models, memory, or accelerator types. |
| Software Dependencies | No | The paper mentions using 'the Adam optimizer (Kingma & Ba, 2015)' but does not specify version numbers for any software dependencies, such as programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or specific library versions. |
| Experiment Setup | Yes | Implementation details: For RoamingOmniglot, we use the common 4-layer CNN for few-shot learning with 64 channels in each layer. For RoamingImageNet, we also use ResNet-12 with input resolution 84x84 (Oreshkin et al., 2018). For RoamingRooms, we resize the input to 120x160 and use ResNet-12. ... For the contextual RNN, in both experiments we used an LSTM (Hochreiter & Schmidhuber, 1997) with a 256d hidden state. The best CPM model is equipped with GAU and cosine similarity for querying prototypes. Logits based on cosine similarity are multiplied with a learned scalar initialized at 10.0 (Oreshkin et al., 2018). We use the Adam optimizer (Kingma & Ba, 2015) for all of our experiments, with a gradient cap of 5.0. For RoamingOmniglot we train the network for 40k steps with a batch size 32 and maximum sequence length 150 across 2 GPUs and an initial learning rate 2e-3 decayed by 0.1 at 20k and 30k steps. For RoamingRooms we train for 20k steps with a batch size 8 and maximum sequence length 100 across 4 GPUs and an initial learning rate 1e-3 decayed by 0.1 at 8k and 16k steps. We use the BCE coefficient λ = 1 for all experiments. |
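
The cosine-similarity prototype readout quoted in the Experiment Setup row, with logits multiplied by a learned scalar initialized at 10.0, is compact enough to sketch. Below is a minimal PyTorch sketch; the class name `CosinePrototypeReadout` and the tensor shapes are illustrative assumptions, not the authors' released implementation, but the scaled-cosine logits follow Oreshkin et al. (2018) as the quote describes.

```python
# Minimal sketch of a scaled cosine-similarity readout over prototypes.
# Class name and shapes are assumptions; only the scaled-cosine logits
# with a learned scalar (init 10.0) come from the paper's description.
import torch
import torch.nn.functional as F


class CosinePrototypeReadout(torch.nn.Module):
    def __init__(self, init_scale: float = 10.0):
        super().__init__()
        # Learned scalar multiplier on the cosine logits, initialized
        # at 10.0 following Oreshkin et al. (2018).
        self.scale = torch.nn.Parameter(torch.tensor(init_scale))

    def forward(self, query: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
        # query: [batch, dim]; prototypes: [num_classes, dim]
        q = F.normalize(query, dim=-1)
        p = F.normalize(prototypes, dim=-1)
        # Cosine similarity of each query to each prototype, scaled.
        return self.scale * (q @ p.t())  # [batch, num_classes] logits
```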
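
Likewise, the RoamingOmniglot optimization recipe quoted above (Adam, gradient cap 5.0, learning rate 2e-3 decayed by 0.1 at 20k and 30k steps, 40k steps total) can be summarized as a short training-loop sketch. The backbone and loss below are placeholders standing in for the paper's 4-layer CNN and training objective, and reading "gradient cap of 5.0" as gradient-norm clipping is an assumption on our part.

```python
# Hedged sketch of the RoamingOmniglot optimization recipe: Adam,
# gradient cap 5.0 (assumed to be norm clipping), LR 2e-3 decayed by
# 0.1 at 20k and 30k steps, 40k steps total. Model and loss are
# placeholders, not the paper's architecture or objective.
import torch

model = torch.nn.Linear(64, 64)  # placeholder for the 4-layer CNN backbone
optimizer = torch.optim.Adam(model.parameters(), lr=2e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[20_000, 30_000], gamma=0.1)

for step in range(40_000):
    loss = model(torch.randn(32, 64)).pow(2).mean()  # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    # "Gradient cap of 5.0" interpreted as clipping the gradient norm.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
    optimizer.step()
    scheduler.step()
```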