ProtoDiff: Learning to Learn Prototypical Networks by Task-Guided Diffusion
Authors: Yingjun Du, Zehao Xiao, Shengcai Liao, Cees Snoek
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct thorough ablation studies to demonstrate its ability to accurately capture the underlying prototype distribution and enhance generalization. The new state-of-the-art performance on within-domain, cross-domain, and few-task few-shot classification further substantiates the benefit of ProtoDiff. |
| Researcher Affiliation | Collaboration | AIM Lab, University of Amsterdam; Inception Institute of Artificial Intelligence |
| Pseudocode | Yes | A Algorithms We describe the detailed algorithms for meta-training and meta-test of ProtoDiff in the following Algorithms 1 and 2, respectively: |
| Open Source Code | Yes | Code available at: https://github.com/YDU-uva/ProtoDiff. |
| Open Datasets | Yes | For the within-domain few-shot learning experiments, we apply our method to three specific datasets: miniImageNet [50], tieredImageNet [35], and ImageNet-800 [5]. ... Regarding cross-domain few-shot learning, we utilize miniImageNet [50] as the training domain, while testing is conducted on four distinct domains: CropDisease [30], EuroSAT [15], ISIC2018 [47], and ChestX [52]. |
| Dataset Splits | No | The paper describes 'meta-training data' and 'meta-test set' classes, and the episodic training setup involves support and query sets. However, it does not provide explicit training/validation/test splits in terms of percentages, fixed sample counts, or specific predefined dataset partitions for a traditional validation set. |
| Hardware Specification | Yes | All experiments are performed on a single A100 GPU, each taking approximately 20 hours. |
| Software Dependencies | No | The paper mentions using optimizers (SGD) and models (GPT-2 architecture) but does not provide specific version numbers for software dependencies such as Python, PyTorch, or other libraries. It states 'The configuration files can be accessed in our code repository for more detailed parameter setup,' but these details are not provided within the paper itself. |
| Experiment Setup | Yes | In our within-domain experiments, we utilize a Conv-4 and ResNet-12 backbone for miniImageNet and tieredImageNet. A ResNet-50 is used for ImageNet-800. ... We use the SGD optimizer with a momentum of 0.9, a learning rate starting from 0.1, and a decay factor of 0.1. For miniImageNet, we train for 100 epochs with a batch size of 128, where the learning rate decays at epoch 90. For tieredImageNet, we train for 120 epochs with a batch size of 512, where the learning rate decays at epochs 40 and 80. Lastly, for ImageNet-800, we train for 90 epochs with a batch size of 256, where the learning rate decays at epochs 30 and 60. The weight decay is 0.0005 for ResNet-12 and 0.0001 for ResNet-50. Standard data augmentation techniques, including random resized crop and horizontal flip, are applied. For episodic training, we use the SGD optimizer with a momentum of 0.9, a fixed learning rate of 0.001, and a batch size of 4, meaning each training batch consists of 4 few-shot tasks to calculate the average loss. |
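The learning-rate schedule quoted in the Experiment Setup row is a standard step decay: start at 0.1 and multiply by 0.1 at the listed milestone epochs. A minimal sketch of that schedule is below; the function name `lr_at_epoch` and its defaults are illustrative, not from the paper, which reports using SGD (momentum 0.9) with these milestones per dataset.

```python
def lr_at_epoch(epoch, base_lr=0.1, decay=0.1, milestones=(90,)):
    """Step-decay schedule: multiply base_lr by decay once per milestone
    already reached. Defaults mirror the quoted miniImageNet setup
    (lr 0.1, decay factor 0.1, decay at epoch 90 of 100)."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= decay
    return lr

# miniImageNet: single milestone at epoch 90
print(lr_at_epoch(50))                        # before decay: 0.1
print(lr_at_epoch(95))                        # after decay: ~0.01
# tieredImageNet: milestones at epochs 40 and 80
print(lr_at_epoch(85, milestones=(40, 80)))   # two decays: ~0.001
```

In PyTorch this corresponds to `torch.optim.lr_scheduler.MultiStepLR` with `gamma=0.1` and the per-dataset milestone lists, though the paper does not name the scheduler class it used.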