Learning to Propagate for Graph Meta-Learning

Authors: Lu Liu, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In experiments, under different training-test discrepancy and test task generation settings, GPN outperforms recent meta-learning methods on two benchmark datasets.
Researcher Affiliation | Academia | Center for Artificial Intelligence, University of Technology Sydney; Paul G. Allen School of Computer Science & Engineering, University of Washington
Pseudocode | Yes | Algorithm 1: GPN Training
Open Source Code | Yes | The code of GPN and dataset generation is available at https://github.com/liulu112601/Gated-Propagation-Net.
Open Datasets | Yes | We built two datasets with different distance/dissimilarity between test classes and training classes, i.e., tieredImageNet-Close and tieredImageNet-Far. ... We extract two datasets from tieredImageNet [22]. ... The code of GPN and dataset generation is available at https://github.com/liulu112601/Gated-Propagation-Net.
Dataset Splits | Yes | The two datasets share the same training tasks, and we make sure that there is no overlap between training and test classes. They differ in their test classes. In tieredImageNet-Close, the minimal distance between each test class and a training class is 1~4, while the minimal distance goes up to 5~10 in tieredImageNet-Far. The statistics for tieredImageNet-Close and tieredImageNet-Far are reported in Table 2.
Hardware Specification | Yes | Our model took approximately 27 hours on one TITAN XP for the 5-way-1-shot learning.
Software Dependencies | No | The paper mentions using the Adam optimizer and a ResNet-08 backbone, but does not provide specific version numbers for the software libraries or frameworks (e.g., PyTorch, TensorFlow, Python) that would be needed for reproduction.
Experiment Setup | Yes | The training took τ_total = 350k episodes using Adam [12] with an initial learning rate of 10^-3 and weight decay of 10^-5. We reduced the learning rate by a factor of 0.9 every 10k episodes starting from the 20k-th episode. The batch size for the auxiliary task was 128. For simplicity, the number of propagation steps is T = 2; more steps may yield higher performance at the price of more computation. The interval for memory update is m = 3 and the number of heads is 5 in GPN.
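The optimizer and learning-rate schedule quoted in the Experiment Setup row are concrete enough to reconstruct. Below is a minimal sketch, assuming PyTorch (the paper does not name its framework); the GPN model is replaced by a placeholder module, and episode sampling and loss computation are elided.

```python
# Hypothetical reconstruction of the reported training schedule:
# Adam, lr 1e-3, weight decay 1e-5, lr multiplied by 0.9 every 10k
# episodes starting from the 20k-th episode, for 350k episodes total.
import torch

model = torch.nn.Linear(10, 10)  # placeholder for the actual GPN model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

def lr_factor(episode: int) -> float:
    """Multiplicative LR factor: decay by 0.9 every 10k episodes after 20k."""
    if episode < 20_000:
        return 1.0
    return 0.9 ** ((episode - 20_000) // 10_000 + 1)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)

for episode in range(350_000):  # tau_total = 350k training episodes
    # ... sample a few-shot episode, compute the meta-learning loss,
    #     and call loss.backward() here (omitted in this sketch) ...
    optimizer.step()
    scheduler.step()
```

With this schedule the learning rate stays at 1e-3 for the first 20k episodes and is thereafter multiplied by 0.9 every 10k episodes, matching the description quoted above.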