Multiscale Deep Equilibrium Models

Authors: Shaojie Bai, Vladlen Koltun, J. Zico Kolter

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate the effectiveness of this approach on two large-scale vision tasks: ImageNet classification and semantic segmentation on high-resolution images from the Cityscapes dataset. In both settings, MDEQs are able to match or exceed the performance of recent competitive computer vision models: the first time such performance and scale have been achieved by an implicit deep learning approach. We demonstrate the effectiveness of MDEQ via extensive experiments on large-scale image classification and semantic segmentation datasets.
Researcher Affiliation | Collaboration | Shaojie Bai (Carnegie Mellon University); Vladlen Koltun (Intel Labs); J. Zico Kolter (Carnegie Mellon University, Bosch Center for AI)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code and pre-trained models are at https://github.com/locuslab/mdeq.
Open Datasets | Yes | We illustrate the effectiveness of this approach on two large-scale vision tasks: ImageNet classification and semantic segmentation on high-resolution images from the Cityscapes dataset. Following the setting of Dupont et al. [18], we run the experiments on CIFAR-10 classification (without data augmentation) for 50 epochs and compare models with approximately the same number of parameters.
Dataset Splits | Yes | Our largest MDEQ surpasses 80% mIoU on the Cityscapes validation set, outperforming strong convolutional networks and coming tantalizingly close to the state of the art. We train on the Cityscapes train set and evaluate on the val set. Following the evaluation protocol of Zhao et al. [65] and Wang et al. [57], we test on a single scale with no flipping.
Hardware Specification | Yes | All computation speeds are benchmarked relative to the ResNet-101 model (about 150ms per batch) on a single RTX 2080 Ti GPU.
Software Dependencies | No | The paper implicitly mentions deep learning frameworks such as TensorFlow but does not provide specific version numbers for any software dependencies or libraries used.
Experiment Setup | Yes | We use the cosine learning rate schedule for all tasks [41]. We run the experiments on CIFAR-10 classification (without data augmentation) for 50 epochs. MDEQs were trained for 100 epochs (for ImageNet). All experiments with MDEQs use the limited-memory version of Broyden's method in both forward and backward passes, and the root solvers are stopped whenever 1) the objective value reaches some predetermined threshold ε or 2) the solver's iteration count reaches a limit T. We initialize the internal states by setting z_i^[0] = 0 for all scales i. We therefore adopt variational dropout [22] and apply the exact same mask at all invocations of fθ in a given training iteration. We replace the last ReLU in both the residual block and the multiscale fusion by a softplus [23] in the initial phase of training. On large-scale vision benchmarks (ImageNet and Cityscapes), we downsample the input twice with 2-strided convolutions before feeding it into MDEQs. All computation speeds are benchmarked... with input batch size 32.
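The two solver stopping criteria and the zero initialization quoted above can be sketched as follows. This is a minimal illustration only: the paper uses limited-memory Broyden's method, whereas the sketch below substitutes plain fixed-point iteration, and the function `f`, the tensor shapes, and the tolerance values are hypothetical stand-ins, not the authors' settings.

```python
import numpy as np

def solve_fixed_point(f, z0, eps=1e-3, max_iter=30):
    """Iterate z <- f(z) until the relative residual drops below eps
    (criterion 1) or the iteration count reaches max_iter, i.e. the
    limit T (criterion 2). Stand-in for the paper's Broyden solver."""
    z = z0
    for _ in range(max_iter):  # criterion 2: iteration cap T
        z_next = f(z)
        res = np.linalg.norm(z_next - z) / (np.linalg.norm(z) + 1e-8)
        z = z_next
        if res < eps:          # criterion 1: residual threshold eps
            break
    return z

# Internal state initialized to zero, mirroring z_i^[0] = 0 for each scale i.
z0 = np.zeros((64, 8, 8))            # hypothetical channel/spatial sizes for one scale
f = lambda z: 0.5 * z + 0.1          # toy contraction standing in for f_theta
z_star = solve_fixed_point(f, z0)    # converges near the fixed point 0.2
```

In the actual model the same solve is run in both the forward pass (to find the equilibrium) and the backward pass (to solve the adjoint linear system), which is why the stopping criteria apply to both.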