Attentional Constellation Nets for Few-Shot Learning
Authors: Weijian Xu, Yifan Xu, Huaijin Wang, Zhuowen Tu
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach attains a significant improvement over the existing methods in few-shot learning on the CIFAR-FS, FC100, and mini-ImageNet benchmarks. [...] We demonstrate the effectiveness of our approach on standard few-shot benchmarks, including FC100 (Oreshkin et al., 2018), CIFAR-FS (Bertinetto et al., 2018) and mini-ImageNet (Vinyals et al., 2016) by showing a significant improvement over the existing methods. An ablation study also demonstrates that the effectiveness of ConstellationNet is not achieved by simply increasing the model complexity [...] 5 EXPERIMENT 5.1 DATASETS [...] 5.3 RESULTS ON STANDARD BENCHMARKS Tables 1 and 2 summarize the results of the few-shot classification tasks on CIFAR-FS, FC100, and mini-ImageNet, respectively. Our method shows a notable improvement over several strong baselines in various settings. |
| Researcher Affiliation | Collaboration | Weijian Xu¹, Yifan Xu¹, Huaijin Wang¹ & Zhuowen Tu¹,² — ¹University of California San Diego, ²Amazon Web Services; {wex041,yix081,huw011,ztu}@ucsd.edu |
| Pseudocode | Yes | Inspired by Sculley (2010), we design a mini-batch soft k-means algorithm to cluster the cell features approximately: Initialization. Randomly initialize global cluster centers $V = \{v_1, v_2, \ldots, v_K\}$ and a counter $s = (s_1, s_2, \ldots, s_K) = 0$. Cluster Assignment. In the forward step, given input cell features $U = \{u_1, u_2, \ldots, u_n\}$, we compute the distance vector $d_i = (d_{i1}, d_{i2}, \ldots, d_{iK})$ between input cell feature $u_i$ and all cluster centers $V$. We then compute the soft assignment $m_{ik} \in \mathbb{R}$ and generate the current mini-batch centers $\hat{v}_k$: $d_{ik} = \|u_i - v_k\|_2^2$, $m_{ik} = \frac{e^{-\beta d_{ik}}}{\sum_j e^{-\beta d_{ij}}}$, $\hat{v}_k = \frac{\sum_i m_{ik} u_i}{\sum_i m_{ik}}$ (4). Centroid Movement. We formulate a count update $\Delta s = \sum_i m_i$ by summing all assignment maps $m_i = (m_{i1}, m_{i2}, \ldots, m_{iK})$. The current mini-batch centers $\hat{v}_k$ are then updated to the global centers $v_k$ with a momentum coefficient $\eta$: $v_k \leftarrow (1-\eta)\,v_k + \eta\,\hat{v}_k$, $\eta = \frac{\lambda\,\Delta s_k}{s_k + \Delta s_k}$ (5). Counter Update. Counter $s$ is updated and distance vectors $\{d_i\}$ are reshaped and returned: $s \leftarrow s + \Delta s$ (6). (A runnable sketch of this update follows the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We adopt three standard benchmark datasets that are widely used in few-shot learning, CIFAR-FS dataset (Bertinetto et al., 2018), FC100 dataset (Oreshkin et al., 2018), and mini-ImageNet dataset (Vinyals et al., 2016). Details about dataset settings in few-shot learning are in Appendix A.2. [...] The CIFAR-FS dataset (Bertinetto et al., 2018) is a few-shot classification benchmark containing 100 classes from CIFAR-100 (Krizhevsky et al., 2009). [...] The FC100 dataset (Oreshkin et al., 2018) is another benchmark based on CIFAR-100 [...] The mini-ImageNet dataset (Vinyals et al., 2016) is a common benchmark for few-shot classification containing 100 classes from ILSVRC2012 (Deng et al., 2009). |
| Dataset Splits | Yes | The CIFAR-FS dataset (Bertinetto et al., 2018) is a few-shot classification benchmark containing 100 classes from CIFAR-100 (Krizhevsky et al., 2009). The classes are randomly split into 64, 16 and 20 classes as meta-training, meta-validation and meta-testing set respectively. [...] The mini-ImageNet dataset (Vinyals et al., 2016) is a common benchmark for few-shot classification containing 100 classes from ILSVRC2012 (Deng et al., 2009). The classes are randomly split into 64, 16 and 20 classes as meta-training, meta-validation and meta-testing set respectively. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using an 'SGD optimizer' and following 'implementation in Lee et al. (2019)' but does not specify any software libraries or frameworks with version numbers (e.g., PyTorch, TensorFlow, specific Python versions). |
| Experiment Setup | Yes | Optimization Settings. We follow the implementation in Lee et al. (2019), and use an SGD optimizer with an initial learning rate of 1, and set momentum to 0.9 and the weight decay rate to $5 \times 10^{-4}$. The learning rate reduces to 0.06, 0.012, and 0.0024 at epochs 20, 40 and 50. The inverse temperature $\beta$ is set to 100.0 in the cluster assignment step, and $\lambda$ is set to 1.0 in the centroid movement step. (A minimal sketch of this optimizer and schedule follows the table.) |
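
The mini-batch soft k-means update quoted in the Pseudocode row can be written compactly. The sketch below is our own illustration, assuming a PyTorch implementation (the paper does not name a framework); the function name, tensor shapes, and the small epsilon terms added for numerical stability are assumptions, not details from the paper.

```python
import torch

def minibatch_soft_kmeans_step(u, v, s, beta=100.0, lam=1.0):
    """One forward step of the quoted mini-batch soft k-means (sketch).

    u: (n, c) cell features in the current mini-batch
    v: (K, c) global cluster centers
    s: (K,)  running counter of soft assignments
    beta, lam: inverse temperature and momentum scale from the quoted settings
    """
    # Cluster assignment: squared distances d_ik = ||u_i - v_k||_2^2 and
    # soft assignments m_ik = exp(-beta d_ik) / sum_j exp(-beta d_ij)
    d = torch.cdist(u, v, p=2) ** 2            # (n, K)
    m = torch.softmax(-beta * d, dim=1)        # (n, K)

    # Current mini-batch centers: weighted mean of features per cluster
    v_hat = (m.t() @ u) / (m.sum(dim=0, keepdim=True).t() + 1e-8)  # (K, c)

    # Centroid movement with per-cluster momentum eta = lam * ds_k / (s_k + ds_k)
    ds = m.sum(dim=0)                          # (K,) count update
    eta = lam * ds / (s + ds + 1e-8)
    v_new = (1 - eta).unsqueeze(1) * v + eta.unsqueeze(1) * v_hat

    # Counter update; distances are returned for the downstream attention module
    s_new = s + ds
    return d, m, v_new, s_new
```

With $\beta = 100.0$ and $\lambda = 1.0$ as quoted, each forward pass both assigns cell features to clusters and nudges the global centers toward the current mini-batch centers.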
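
Similarly, the optimization settings in the Experiment Setup row translate to a standard SGD configuration with a piecewise-constant learning-rate schedule. The snippet below is a minimal sketch under the same PyTorch assumption; the dummy model, the total epoch count, and the loop structure are placeholders, with only the learning rates, momentum, weight decay, and milestone epochs taken from the quoted text.

```python
import torch

# Placeholder model standing in for the backbone plus Constellation modules.
model = torch.nn.Linear(8, 8)

# Quoted settings: SGD, initial learning rate 1, momentum 0.9, weight decay 5e-4.
optimizer = torch.optim.SGD(model.parameters(), lr=1.0,
                            momentum=0.9, weight_decay=5e-4)

def lr_at_epoch(epoch):
    """Piecewise-constant schedule from the quoted milestones (epochs 20, 40, 50)."""
    if epoch < 20:
        return 1.0
    if epoch < 40:
        return 0.06
    if epoch < 50:
        return 0.012
    return 0.0024

num_epochs = 60  # assumed total; the quoted text only gives the milestone epochs
for epoch in range(num_epochs):
    for group in optimizer.param_groups:
        group['lr'] = lr_at_epoch(epoch)
    # ... run one epoch of episodic training here ...
```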