Multiscale Deep Equilibrium Models

Authors: Shaojie Bai, Vladlen Koltun, J. Zico Kolter

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate the effectiveness of this approach on two large-scale vision tasks: ImageNet classification and semantic segmentation on high-resolution images from the Cityscapes dataset. In both settings, MDEQs are able to match or exceed the performance of recent competitive computer vision models: the first time such performance and scale have been achieved by an implicit deep learning approach. We demonstrate the effectiveness of MDEQ via extensive experiments on large-scale image classification and semantic segmentation datasets.
Researcher Affiliation | Collaboration | Shaojie Bai (Carnegie Mellon University); Vladlen Koltun (Intel Labs); J. Zico Kolter (Carnegie Mellon University, Bosch Center for AI)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code and pre-trained models are at https://github.com/locuslab/mdeq.
Open Datasets | Yes | We illustrate the effectiveness of this approach on two large-scale vision tasks: ImageNet classification and semantic segmentation on high-resolution images from the Cityscapes dataset. Following the setting of Dupont et al. [18], we run the experiments on CIFAR-10 classification (without data augmentation) for 50 epochs and compare models with approximately the same number of parameters.
Dataset Splits | Yes | Our largest MDEQ surpasses 80% mIoU on the Cityscapes validation set, outperforming strong convolutional networks and coming tantalizingly close to the state of the art. We train on the Cityscapes train set and evaluate on the val set. Following the evaluation protocol of Zhao et al. [65] and Wang et al. [57], we test on a single scale with no flipping.
Hardware Specification | Yes | All computation speeds are benchmarked relative to the ResNet-101 model (about 150ms per batch) on a single RTX 2080 Ti GPU.
Software Dependencies | No | The paper implicitly mentions deep learning frameworks such as TensorFlow but does not provide specific version numbers for any software dependencies or libraries used.
Experiment Setup | Yes | We use the cosine learning rate schedule for all tasks [41]. We run the experiments on CIFAR-10 classification (without data augmentation) for 50 epochs. MDEQs were trained for 100 epochs (for ImageNet). All experiments with MDEQs use the limited-memory version of Broyden's method in both forward and backward passes, and the root solvers are stopped whenever 1) the objective value reaches some predetermined threshold ε or 2) the solver's iteration count reaches a limit T. We initialize the internal states by setting z_i^[0] = 0 for all scales i. We therefore adopt variational dropout [22] and apply the exact same mask at all invocations of fθ in a given training iteration. We replace the last ReLU in both the residual block and the multiscale fusion by a softplus [23] in the initial phase of training. On large-scale vision benchmarks (ImageNet and Cityscapes), we downsample the input twice with 2-strided convolutions before feeding it into MDEQs. All computation speeds are benchmarked... with input batch size 32.
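The two solver stopping criteria and the zero initialization quoted above can be sketched as follows. This is a minimal illustration only: the paper uses limited-memory Broyden's method, whereas the sketch below substitutes plain fixed-point iteration, and the function `f`, the tensor shapes, and the tolerance values are hypothetical stand-ins, not the authors' settings.

```python
import numpy as np

def solve_fixed_point(f, z0, eps=1e-3, max_iter=30):
    """Iterate z <- f(z) until the relative residual drops below eps
    (criterion 1) or the iteration count reaches max_iter, i.e. the
    limit T (criterion 2). Stand-in for the paper's Broyden solver."""
    z = z0
    for _ in range(max_iter):  # criterion 2: iteration cap T
        z_next = f(z)
        res = np.linalg.norm(z_next - z) / (np.linalg.norm(z) + 1e-8)
        z = z_next
        if res < eps:          # criterion 1: residual threshold eps
            break
    return z

# Internal state initialized to zero, mirroring z_i^[0] = 0 for each scale i.
z0 = np.zeros((64, 8, 8))            # hypothetical channel/spatial sizes for one scale
f = lambda z: 0.5 * z + 0.1          # toy contraction standing in for f_theta
z_star = solve_fixed_point(f, z0)    # converges near the fixed point 0.2
```

In the actual model the same solve is run in both the forward pass (to find the equilibrium) and the backward pass (to solve the adjoint linear system), which is why the stopping criteria apply to both.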