A Unified Framework for U-Net Design and Analysis

Authors: Christopher Williams, Fabian Falck, George Deligiannidis, Chris C Holmes, Arnaud Doucet, Saifuddin Syed

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we demonstrate how Multi-Res Nets achieve competitive and often superior performance compared to classical U-Nets in image segmentation, PDE surrogate modelling, and generative modelling with diffusion models. We conduct three main experimental analyses: (A) Multi-Res Nets, which feature an encoder with no learnable parameters, as an alternative to classical Residual U-Nets, (B) multi-resolution training and sampling, and (C) U-Nets encoding the topological structure of triangular data.
Researcher Affiliation | Academia | 1 University of Oxford, 2 The Alan Turing Institute; {williams,fabian.falck,deligian,cholmes,doucet,saifuddin.syed}@stats.ox.ac.uk
Pseudocode | Yes | Algorithm 1: Multi-resolution training and sampling via preconditioning. Require: Boolean FREEZE. (An illustrative training-loop sketch is given below the table.)
Open Source Code | Yes | We provide our PyTorch code base at https://github.com/FabianFalck/unet-design. We refer to Appendices B and D for details on experiments, further experimental results, the datasets, and computational resources used.
Open Datasets | Yes | As datasets, we use MNIST [47], a custom triangular version of MNIST (MNIST-Triangular), and CIFAR10 [48] for (1), Navier-Stokes and shallow water equations [49] for (2), and the MICCAI 2017 White Matter Hyperintensity (WMH) segmentation challenge dataset [50, 51] for (3).
Dataset Splits | Yes | We compute the FID score on a holdout dataset not used during training, using an evaluation model whose weights track the training weights via an exponential moving average (as is common practice). The r-MSE is computed as an MSE over pieces of the PDE trajectory against its ground truth. Each piece is predicted in an autoregressive fashion, where the model receives the previously predicted piece and historic observations as input [6] (a rollout sketch is given below the table). All evaluation metrics are in general computed on the test set and averaged over three random seeds after the same number of iterations in each table, if not stated otherwise. ...evaluating on the test set, and on Navier-Stokes for approximately 400K iterations evaluating on the validation set.
Hardware Specification | Yes | First, we used two local machines with latest CPU hardware and one with an onboard GPU for development and debugging purposes. Second, we had access to a large compute cluster with A100 GPU nodes and appropriate CPU and RAM hardware.
Software Dependencies | Yes | Our code base uses the following main existing assets: Weights&Biases [82] (MIT License), PyTorch [83] (custom license), in particular the torchvision package, pytorch_wavelets [84] (MIT License), PyWavelets [85] (MIT License), pytorch_lightning [86] (Apache License 2.0), matplotlib [87] (TODO), numpy [88] (BSD 3-Clause License), tensorboard [89] (Apache License 2.0), PyYAML [90] (MIT License), tqdm [91] (MPLv2.0, MIT License), scikit-learn and sklearn [92] (BSD 3-Clause License), and pickle [93] (License N/A).
Experiment Setup | Yes | We performed little to no hyperparameter tuning in our experiments. In particular, we did not perform a search (e.g. grid search) over hyperparameters. In general, we used the hyperparameters of the original repositories as stated in Appendix D, and changed them only when necessary, for instance to adjust the number of parameters so as to enable a fair comparison. For each dataset, we train each resolution for the following number of iterations/epochs: MNIST [5K, 5K, 5K, 5K] iterations, CIFAR10 [50K, 50K, 50K, 450K] iterations, Navier-Stokes [5, 5, 5, 35] epochs, Shallow water [2, 2, 2, 14] epochs. (The schedule is also transcribed as a configuration sketch below the table.)
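
The Pseudocode row above only names Algorithm 1, it does not reproduce it. As a rough illustration of what coarse-to-fine training with a FREEZE flag could look like, the following PyTorch-style sketch trains a model resolution by resolution, initialising each stage from the previous one and optionally freezing the carried-over parameters. The helpers build_model and model.loss, and the (resolution, budget) schedule format, are hypothetical placeholders, not the authors' implementation.

# Hypothetical sketch of multi-resolution training with a FREEZE flag,
# loosely in the spirit of the Algorithm 1 referenced above. build_model,
# model.loss(...) and the (resolution, budget) schedule are illustrative
# placeholders, not the released implementation.
import torch
import torch.nn.functional as F

def train_multires(build_model, dataloader, schedule, freeze=True, device="cpu"):
    model = None
    for resolution, num_iters in schedule:               # e.g. [(4, 5000), (8, 5000), ...]
        prev, model = model, build_model(resolution).to(device)
        if prev is not None:
            # Precondition on the coarser-resolution fit.
            model.load_state_dict(prev.state_dict(), strict=False)
            if freeze:
                carried = set(dict(prev.named_parameters()))
                for name, p in model.named_parameters():
                    if name in carried:                   # FREEZE: keep coarse parameters fixed
                        p.requires_grad_(False)
        opt = torch.optim.Adam(
            (p for p in model.parameters() if p.requires_grad), lr=1e-4
        )
        it = 0
        while it < num_iters:
            for x, _ in dataloader:
                x = F.interpolate(x.to(device), size=(resolution, resolution))
                loss = model.loss(x)                      # task-specific objective
                opt.zero_grad()
                loss.backward()
                opt.step()
                it += 1
                if it >= num_iters:
                    break
    return model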
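
The Dataset Splits row describes an autoregressive rollout error for the PDE experiments. The sketch below shows one way such a rollout MSE could be computed: the trajectory is predicted piece by piece and each prediction is fed back as input. The tensor shapes, history length, and channel stacking are assumptions, not the paper's exact protocol.

# Minimal sketch of a rollout-MSE ("r-MSE"-style) evaluation for a PDE
# surrogate. Shapes, the history length, and the way history is stacked
# into channels are assumptions.
import torch

@torch.no_grad()
def rollout_mse(model, trajectory, history_len=2):
    """trajectory: (T, C, H, W) ground-truth states; returns the mean MSE
    over the autoregressively predicted pieces."""
    preds = [trajectory[t] for t in range(history_len)]   # seed with observed history
    errors = []
    for t in range(history_len, trajectory.shape[0]):
        inp = torch.cat(preds[-history_len:], dim=0).unsqueeze(0)  # (1, history_len*C, H, W)
        pred = model(inp).squeeze(0)                               # predicted next piece (C, H, W)
        errors.append(torch.mean((pred - trajectory[t]) ** 2))
        preds.append(pred)                                         # autoregressive feedback
    return torch.stack(errors).mean()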
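
The per-resolution training budgets quoted in the Experiment Setup row can be written down as a simple configuration mapping. The values below are transcribed from that row; the dictionary layout itself is an assumption and does not mirror the repository's actual configuration files.

# Illustrative per-resolution training budgets, transcribed from the
# "Experiment Setup" row above (coarse -> fine).
RESOLUTION_SCHEDULE = {
    "mnist":         {"unit": "iterations", "budget": [5_000, 5_000, 5_000, 5_000]},
    "cifar10":       {"unit": "iterations", "budget": [50_000, 50_000, 50_000, 450_000]},
    "navier_stokes": {"unit": "epochs",     "budget": [5, 5, 5, 35]},
    "shallow_water": {"unit": "epochs",     "budget": [2, 2, 2, 14]},
}

Paired with a list of resolutions, such a schedule could be zipped into the (resolution, budget) pairs consumed by the training-loop sketch above.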