A Unified Framework for U-Net Design and Analysis

Authors: Christopher Williams, Fabian Falck, George Deligiannidis, Chris C Holmes, Arnaud Doucet, Saifuddin Syed

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we demonstrate how Multi-Res Nets achieve competitive and often superior performance compared to classical U-Nets in image segmentation, PDE surrogate modelling, and generative modelling with diffusion models. We conduct three main experimental analyses: (A) Multi-Res Nets, which feature an encoder with no learnable parameters, as an alternative to classical Residual U-Nets, (B) multi-resolution training and sampling, and (C) U-Nets encoding the topological structure of triangular data.
Researcher Affiliation | Academia | 1 University of Oxford, 2 The Alan Turing Institute; {williams,fabian.falck,deligian,cholmes,doucet,saifuddin.syed}@stats.ox.ac.uk
Pseudocode | Yes | Algorithm 1: Multi-resolution training and sampling via preconditioning. Require: Boolean FREEZE. (An illustrative training-loop sketch is given below the table.)
Open Source Code | Yes | We provide our PyTorch code base at https://github.com/FabianFalck/unet-design. We refer to Appendices B and D for details on experiments, further experimental results, the datasets, and computational resources used.
Open Datasets | Yes | As datasets, we use MNIST [47], a custom triangular version of MNIST (MNIST-Triangular), and CIFAR10 [48] for (1), Navier-Stokes and shallow water equations [49] for (2), and the MICCAI 2017 White Matter Hyperintensity (WMH) segmentation challenge dataset [50, 51] for (3).
Dataset Splits | Yes | We compute the FID score on a holdout dataset not used during training, using an evaluation model whose weights track the training weights via an exponential moving average (as is common practice). The r-MSE is computed as an MSE over pieces of the PDE trajectory against its ground truth. Each piece is predicted in an autoregressive fashion, where the model receives the previously predicted piece and historic observations as input [6] (a rollout sketch is given below the table). All evaluation metrics are in general computed on the test set and averaged over three random seeds after the same number of iterations in each table, if not stated otherwise. ...evaluating on the test set, and on Navier-Stokes for approximately 400K iterations evaluating on the validation set.
Hardware Specification | Yes | First, we used two local machines with latest CPU hardware and one with an onboard GPU for development and debugging purposes. Second, we had access to a large compute cluster with A100 GPU nodes and appropriate CPU and RAM hardware.
Software Dependencies | Yes | Our code base uses the following main existing assets: Weights&Biases [82] (MIT License), PyTorch [83] (custom license), in particular the torchvision package, pytorch_wavelets [84] (MIT License), PyWavelets [85] (MIT License), pytorch_lightning [86] (Apache License 2.0), matplotlib [87] (TODO), numpy [88] (BSD 3-Clause License), tensorboard [89] (Apache License 2.0), PyYAML [90] (MIT License), tqdm [91] (MPLv2.0, MIT License), scikit-learn and sklearn [92] (BSD 3-Clause License), and pickle [93] (License N/A).
Experiment Setup | Yes | We performed little to no hyperparameter tuning in our experiments. In particular, we did not perform a search (e.g. grid search) over hyperparameters. In general, we used the hyperparameters of the original repositories as stated in Appendix D, and changed them only when necessary, for instance to adjust the number of parameters so as to enable a fair comparison. For each dataset, we train each resolution for the following number of iterations/epochs: MNIST [5K, 5K, 5K, 5K] iterations, CIFAR10 [50K, 50K, 50K, 450K] iterations, Navier-Stokes [5, 5, 5, 35] epochs, Shallow water [2, 2, 2, 14] epochs. (The schedule is also transcribed as a configuration sketch below the table.)
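
The Pseudocode row above only names Algorithm 1, it does not reproduce it. As a rough illustration of what coarse-to-fine training with a FREEZE flag could look like, the following PyTorch-style sketch trains a model resolution by resolution, initialising each stage from the previous one and optionally freezing the carried-over parameters. The helpers build_model and model.loss, and the (resolution, budget) schedule format, are hypothetical placeholders, not the authors' implementation.

# Hypothetical sketch of multi-resolution training with a FREEZE flag,
# loosely in the spirit of the Algorithm 1 referenced above. build_model,
# model.loss(...) and the (resolution, budget) schedule are illustrative
# placeholders, not the released implementation.
import torch
import torch.nn.functional as F

def train_multires(build_model, dataloader, schedule, freeze=True, device="cpu"):
    model = None
    for resolution, num_iters in schedule:               # e.g. [(4, 5000), (8, 5000), ...]
        prev, model = model, build_model(resolution).to(device)
        if prev is not None:
            # Precondition on the coarser-resolution fit.
            model.load_state_dict(prev.state_dict(), strict=False)
            if freeze:
                carried = set(dict(prev.named_parameters()))
                for name, p in model.named_parameters():
                    if name in carried:                   # FREEZE: keep coarse parameters fixed
                        p.requires_grad_(False)
        opt = torch.optim.Adam(
            (p for p in model.parameters() if p.requires_grad), lr=1e-4
        )
        it = 0
        while it < num_iters:
            for x, _ in dataloader:
                x = F.interpolate(x.to(device), size=(resolution, resolution))
                loss = model.loss(x)                      # task-specific objective
                opt.zero_grad()
                loss.backward()
                opt.step()
                it += 1
                if it >= num_iters:
                    break
    return model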
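
The Dataset Splits row describes an autoregressive rollout error for the PDE experiments. The sketch below shows one way such a rollout MSE could be computed: the trajectory is predicted piece by piece and each prediction is fed back as input. The tensor shapes, history length, and channel stacking are assumptions, not the paper's exact protocol.

# Minimal sketch of a rollout-MSE ("r-MSE"-style) evaluation for a PDE
# surrogate. Shapes, the history length, and the way history is stacked
# into channels are assumptions.
import torch

@torch.no_grad()
def rollout_mse(model, trajectory, history_len=2):
    """trajectory: (T, C, H, W) ground-truth states; returns the mean MSE
    over the autoregressively predicted pieces."""
    preds = [trajectory[t] for t in range(history_len)]   # seed with observed history
    errors = []
    for t in range(history_len, trajectory.shape[0]):
        inp = torch.cat(preds[-history_len:], dim=0).unsqueeze(0)  # (1, history_len*C, H, W)
        pred = model(inp).squeeze(0)                               # predicted next piece (C, H, W)
        errors.append(torch.mean((pred - trajectory[t]) ** 2))
        preds.append(pred)                                         # autoregressive feedback
    return torch.stack(errors).mean()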
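
The per-resolution training budgets quoted in the Experiment Setup row can be written down as a simple configuration mapping. The values below are transcribed from that row; the dictionary layout itself is an assumption and does not mirror the repository's actual configuration files.

# Illustrative per-resolution training budgets, transcribed from the
# "Experiment Setup" row above (coarse -> fine).
RESOLUTION_SCHEDULE = {
    "mnist":         {"unit": "iterations", "budget": [5_000, 5_000, 5_000, 5_000]},
    "cifar10":       {"unit": "iterations", "budget": [50_000, 50_000, 50_000, 450_000]},
    "navier_stokes": {"unit": "epochs",     "budget": [5, 5, 5, 35]},
    "shallow_water": {"unit": "epochs",     "budget": [2, 2, 2, 14]},
}

Paired with a list of resolutions, such a schedule could be zipped into the (resolution, budget) pairs consumed by the training-loop sketch above.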