Monotonic Differentiable Sorting Networks

Authors: Felix Petersen, Christian Borgelt, Hilde Kuehne, Oliver Deussen

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the properties of the proposed functions as well as their practical impact in the context of sorting supervision, we evaluate them with respect to two standard benchmark datasets. The MNIST sorting dataset [1] [4] consists of images of numbers from 0000 to 9999 composed of four MNIST digits [7]. Here, the task is training a network to produce a scalar output value for each image such that the ordering of the outputs follows the respective ordering of the images. Specifically, the metrics here are the proportion of full rankings correctly identified, and the proportion of individual element ranks correctly identified [1]. (A sketch of these two metrics is given below the table.)
Researcher Affiliation | Collaboration | Felix Petersen (University of Konstanz), Christian Borgelt (University of Salzburg), Hilde Kuehne (University of Frankfurt; MIT-IBM Watson AI Lab), Oliver Deussen (University of Konstanz); contact: felix.petersen@uni-konstanz.de
Pseudocode | No | The paper does not contain pseudocode or algorithm blocks.
Open Source Code | Yes | The source code of this work is publicly available at github.com/Felix-Petersen/diffsort. (A usage sketch is given below the table.)
Open Datasets | Yes | The MNIST sorting dataset [1] [4] consists of images of numbers from 0000 to 9999 composed of four MNIST digits [7]. [...] The same task can also be extended to the more realistic SVHN [19] dataset with the difference that the images are already multi-digit numbers as shown in [4]. (A dataset-construction sketch is given below the table.)
Dataset Splits | No | The paper mentions training steps and evaluation but does not explicitly provide training/validation/test dataset splits (e.g., percentages or sample counts) needed for reproduction.
Hardware Specification | No | The paper states that 'Each experiment can be reproduced on a single GPU' but does not specify any particular GPU model, CPU, or other hardware details.
Software Dependencies | No | The paper mentions using the Adam optimizer [20] but does not provide specific version numbers for any software dependencies like programming languages, frameworks, or libraries.
Experiment Setup | Yes | For training, we use the same network architecture as in previous works [1], [2], [4] and also use the Adam optimizer [20] at a learning rate of 3 · 10⁻⁴. For Figure 5, we train for 100 000 steps. For Table 3, we train for 200 000 steps on MNIST and 1 000 000 steps on SVHN. [...] For the inverse temperature β, we use the following values, which correspond to the optima in Figure 5 and were found via grid search: [table of β values] (A training-step sketch is given below the table.)
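
The two metrics quoted under Research Type (the proportion of full rankings correctly identified and the proportion of individual element ranks correctly identified) can be computed roughly as in the following PyTorch sketch; the function name `ranking_metrics` and the tensor layout are illustrative assumptions, not taken from the paper's code.

```python
import torch

def ranking_metrics(scores, true_values):
    """Exact-match and element-wise rank accuracy for sorting supervision.

    scores:      (batch, n) scalar network outputs, one per image
    true_values: (batch, n) ground-truth numbers shown in the images
    """
    # argsort applied twice converts values into ranks within each sequence
    pred_ranks = scores.argsort(dim=-1).argsort(dim=-1)
    true_ranks = true_values.argsort(dim=-1).argsort(dim=-1)

    correct = pred_ranks.eq(true_ranks)            # (batch, n) booleans
    elementwise_acc = correct.float().mean()       # individual element ranks
    full_acc = correct.all(dim=-1).float().mean()  # complete rankings
    return full_acc.item(), elementwise_acc.item()
```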
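
For the Open Source Code entry, the snippet below sketches how the released diffsort package can be called, assuming the `DiffSortNet` interface documented in the repository README; the keyword `steepness` (the inverse temperature of the relaxation) and the returned tuple follow that README and may differ across versions.

```python
import torch
from diffsort import DiffSortNet  # pip install diffsort

# A batch of 4 sequences of 8 scalars to be sorted differentiably.
vectors = torch.randn(4, 8, requires_grad=True)

# 'bitonic' selects a bitonic sorting network for sequences of length 8.
sorter = DiffSortNet('bitonic', 8, steepness=10)

sorted_vectors, permutation_matrices = sorter(vectors)
print(sorted_vectors.shape, permutation_matrices.shape)  # (4, 8) and (4, 8, 8)
```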
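
The four-digit MNIST sorting data described under Open Datasets can be assembled roughly as follows; `make_four_digit_sample` is a hypothetical helper for illustration, and details such as digit spacing or sampling may differ from the original setup.

```python
import torch
from torchvision import datasets, transforms

def make_four_digit_sample(mnist):
    """Hypothetical helper: concatenate four random MNIST digits side by side.

    Returns the concatenated image and the 4-digit number it depicts; only the
    ordering of these numbers is used as supervision.
    """
    idx = torch.randint(len(mnist), (4,)).tolist()
    digits = [mnist[i] for i in idx]                       # list of (image, label)
    image = torch.cat([img for img, _ in digits], dim=-1)  # shape (1, 28, 112)
    value = sum(label * 10 ** (3 - pos) for pos, (_, label) in enumerate(digits))
    return image, float(value)

mnist = datasets.MNIST('data', train=True, download=True,
                       transform=transforms.ToTensor())
image, value = make_four_digit_sample(mnist)
```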
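
Finally, for the Experiment Setup entry, the sketch below shows one possible training step under sorting supervision with the reported Adam learning rate of 3 · 10⁻⁴. The binary cross-entropy between predicted and ground-truth permutation matrices and the row/column convention of the target permutation are assumptions based on prior sorting-supervision work, not details confirmed by the quoted excerpt.

```python
import torch
import torch.nn.functional as F

def training_step(cnn, sorter, optimizer, images, values):
    """One sorting-supervision step (sketch; loss and conventions are assumed).

    cnn:    maps an image to a single scalar score
    sorter: a differentiable sorting network (e.g. diffsort's DiffSortNet)
            returning (sorted values, relaxed permutation matrices)
    images: (batch, n, 1, H, W) stacks of n images per training example
    values: (batch, n) ground-truth numbers shown in the images
    """
    b, n = values.shape
    scores = cnn(images.flatten(0, 1)).view(b, n)  # one scalar per image
    _, perm_pred = sorter(scores)                  # (b, n, n), doubly stochastic

    # Ground-truth permutation matrix derived from the ordering of the values.
    perm_true = F.one_hot(values.argsort(dim=-1), n).float()

    loss = F.binary_cross_entropy(perm_pred.clamp(1e-6, 1 - 1e-6), perm_true)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# optimizer = torch.optim.Adam(cnn.parameters(), lr=3e-4)  # learning rate from the paper
```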