Multilinear Operator Networks

Authors: Yixin Cheng, Grigorios Chrysos, Markos Georgopoulos, Volkan Cevher

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct a thorough experimental validation of MONet. We conduct experiments on large-scale image classification in Section 4.1, fine-grained and small-scale image classification in Section 4.2. In addition, we exhibit a unique advantage of models without activation functions to learn dynamic systems in scientific computing in Section 4.3. Lastly, we validate the robustness of our model to diverse perturbations in Section 4.4. We summarize four configurations of the proposed MONet in Table 1 with different versions of MONet. We present a schematic for our configuration in Appendix B. Due to the limited space, we conduct additional experiments in the Appendix. Concretely, we experiment on semantic segmentation in Appendix H, while we add additional ablations and experimental details on Appendices K to M.
Researcher Affiliation | Academia | Yixin Cheng (1), Grigorios G. Chrysos (2), Markos Georgopoulos, Volkan Cevher (1); (1) LIONS, École Polytechnique Fédérale de Lausanne; (2) University of Wisconsin-Madison
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | The source code is available at MONet. We hope the code can enable further improvement of models relying on linear projections. Our plan is to make the source code of our model open source once our work gets accepted.
Open Datasets | Yes | ImageNet1K, which is the standard benchmark for image classification, contains 1.2M images with 1,000 categories annotated. Beyond ImageNet1K, we experiment with a number of additional benchmarks to further assess MONet. We use the standard datasets of CIFAR10 (Krizhevsky et al., 2009), SVHN (Netzer et al., 2011) and Tiny ImageNet (Le & Yang, 2015) for image recognition. A fine-grained classification experiment on Oxford Flower102 (Nilsback & Zisserman, 2008) is conducted. We further conduct experiments on ImageNet-C (Hendrycks & Dietterich, 2019) to analyze the robustness of our model. We conduct an ablation study using ImageNet100, a subset of ImageNet1K with 100 classes selected from the original ones. To assess the efficacy of our model beyond natural images, we conduct an experiment on the MedMNIST challenge across ten datasets (Yang et al., 2021).
Dataset Splits | Yes | ImageNet1K, which is the standard benchmark for image classification, contains 1.2M images with 1,000 categories annotated. We consider a host of baseline models to compare the performance of MONet. Concretely, we include strong-performing polynomial models (Chrysos et al., 2020; 2023), MLP-like models (Tolstikhin et al., 2021; Touvron et al., 2022; Yu et al., 2022b), models based on vanilla Transformer (Vaswani et al., 2017; Touvron et al., 2021) and several classic convolutional networks (Bello et al., 2019; Chen et al., 2018b; Simonyan & Zisserman, 2015; He et al., 2016). We train our model using AdamW optimizer (Loshchilov & Hutter, 2019). We use a batch size of 448 per GPU to fully leverage the memory capacity of the GPU. We use a linear warmup and cosine decay schedule learning rate, while the initial learning rate is 1e-4, linear increase to 1e-3 in 10 epochs and then gradually drops to 1e-5 in 300 epochs. We use label smoothing (Szegedy et al., 2016), standard data augmentation strategies, such as Cut-Mix (Yun et al., 2019), Mix-up (Zhang et al., 2018) and auto-augment (Cubuk et al., 2019), which are used in similar methods (Tolstikhin et al., 2021; Trockman & Kolter, 2023; Touvron et al., 2022). Our data augmentation recipe follows the one used in MLP-Mixer (Tolstikhin et al., 2021). We do not use any external data for training. We train our model using native PyTorch training on 4 NVIDIA A100 GPUs.
Hardware Specification | Yes | We train our model using native PyTorch training on 4 NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions "native PyTorch training" and uses the "timm library" but does not specify exact version numbers for these software dependencies or any other specific libraries.
Experiment Setup | Yes | We train our model using AdamW optimizer (Loshchilov & Hutter, 2019). We use a batch size of 448 per GPU to fully leverage the memory capacity of the GPU. We use a linear warmup and cosine decay schedule learning rate, while the initial learning rate is 1e-4, linear increase to 1e-3 in 10 epochs and then gradually drops to 1e-5 in 300 epochs. We use label smoothing (Szegedy et al., 2016), standard data augmentation strategies, such as Cut-Mix (Yun et al., 2019), Mix-up (Zhang et al., 2018) and auto-augment (Cubuk et al., 2019), which are used in similar methods (Tolstikhin et al., 2021; Trockman & Kolter, 2023; Touvron et al., 2022). Our data augmentation recipe follows the one used in MLP-Mixer (Tolstikhin et al., 2021).
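
As a companion to the Open Datasets row above, the small-scale benchmarks quoted there (CIFAR10 and SVHN) ship with torchvision. The snippet below is a minimal loading sketch, not code from the MONet repository; the `data/` root and the normalization statistics are assumptions made for illustration.

```python
from torchvision import datasets, transforms

# Basic preprocessing; the mean/std values are the usual CIFAR10 statistics,
# assumed here for illustration rather than taken from the MONet recipe.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

# CIFAR10 and SVHN come with fixed train/test splits, exposed directly by torchvision.
cifar_train = datasets.CIFAR10(root="data", train=True, download=True, transform=transform)
cifar_test = datasets.CIFAR10(root="data", train=False, download=True, transform=transform)
svhn_train = datasets.SVHN(root="data", split="train", download=True, transform=transform)
svhn_test = datasets.SVHN(root="data", split="test", download=True, transform=transform)
```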
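
The Experiment Setup row quotes an AdamW recipe with a batch size of 448 per GPU, a linear warmup from 1e-4 to 1e-3 over 10 epochs, and cosine decay to 1e-5 by epoch 300. The PyTorch sketch below reproduces that schedule under two assumptions the excerpt does not confirm: the scheduler steps once per epoch, and the label-smoothing factor is 0.1. The placeholder model and loop are illustrative, not the authors' implementation.

```python
from torch import nn, optim
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

model = nn.Linear(3 * 224 * 224, 1000)  # placeholder for the MONet backbone

# Peak learning rate 1e-3, as quoted in the training recipe.
optimizer = optim.AdamW(model.parameters(), lr=1e-3)

epochs, warmup_epochs = 300, 10

# Linear warmup 1e-4 -> 1e-3 over the first 10 epochs (start_factor = 1e-4 / 1e-3).
warmup = LinearLR(optimizer, start_factor=0.1, end_factor=1.0, total_iters=warmup_epochs)

# Cosine decay 1e-3 -> 1e-5 over the remaining epochs.
cosine = CosineAnnealingLR(optimizer, T_max=epochs - warmup_epochs, eta_min=1e-5)
scheduler = SequentialLR(optimizer, schedulers=[warmup, cosine], milestones=[warmup_epochs])

# Label smoothing is mentioned in the recipe; the 0.1 factor is an assumed default.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

for epoch in range(epochs):
    # ... run one epoch of training with `criterion` and `optimizer` here ...
    scheduler.step()  # advance the warmup/cosine schedule once per epoch
```

The Cut-Mix, Mix-up and auto-augment transforms quoted in the same row can be layered on top of this loop, for instance through the timm data pipeline that the paper also mentions.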
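
Finally, the Hardware Specification row only states native PyTorch training on 4 NVIDIA A100 GPUs. One standard way to realize that setup is DistributedDataParallel launched with torchrun, sketched below; the launch method is an assumption, since the excerpt does not describe how the four GPUs are used.

```python
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumed launch command (one process per GPU on a 4-GPU node):
#   torchrun --nproc_per_node=4 train.py

dist.init_process_group(backend="nccl")      # NCCL backend for multi-GPU training
local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun for each process
torch.cuda.set_device(local_rank)

model = nn.Linear(3 * 224 * 224, 1000).cuda()  # placeholder for the MONet backbone
model = DDP(model, device_ids=[local_rank])
```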