Stochastic Optimization of Sorting Networks via Continuous Relaxations

Authors: Aditya Grover, Eric Wang, Aaron Zweig, Stefano Ermon

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "On all tasks, we observe significant empirical improvements due to NeuralSort over the relevant baselines and competing relaxations to permutation matrices." (Section 1) Table 1: Average sorting accuracy on the test set. (Section 6.1) Table 2: Test mean squared error (×10⁻⁴) and R² values (in parentheses) for quantile regression. (Section 6.2) Table 3: Average test k-NN classification accuracies for n neighbors for the best value of k. (Section 6.3)
Researcher Affiliation | Academia | Aditya Grover, Eric Wang, Aaron Zweig & Stefano Ermon, Computer Science Department, Stanford University. {adityag,ejwang,azweig,ermon}@cs.stanford.edu
Pseudocode | Yes | A.1 TensorFlow: def deterministic_neural_sort(s, tau), def stochastic_neural_sort(s, n_samples, tau); A.2 PyTorch: def deterministic_neural_sort(s, tau), def stochastic_neural_sort(s, n_samples, tau)
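The relaxation behind the appendix pseudocode is simple array math: row i of the relaxed permutation matrix is softmax(((n+1−2i)·s − A_s·1)/τ), where A_s is the matrix of pairwise absolute score differences. Below is a minimal NumPy sketch of the deterministic operator, not the authors' TensorFlow/PyTorch implementations; the function name is reused only for illustration:

```python
import numpy as np

def deterministic_neural_sort(s, tau):
    """NumPy sketch of the deterministic NeuralSort relaxation.

    s:   (n,) array of real-valued scores
    tau: temperature > 0; smaller tau gives a harder permutation
    Returns an (n, n) row-stochastic matrix whose i-th row is a soft
    indicator of the i-th largest score.
    """
    n = s.shape[0]
    A_s = np.abs(s[:, None] - s[None, :])          # pairwise |s_i - s_j|
    B = A_s.sum(axis=1)                            # A_s @ 1, shape (n,)
    scaling = n + 1 - 2 * np.arange(1, n + 1)      # (n + 1 - 2i), i = 1..n
    P_max = scaling[:, None] * s[None, :] - B[None, :]
    # numerically stable row-wise softmax at temperature tau
    e = np.exp((P_max - P_max.max(axis=1, keepdims=True)) / tau)
    return e / e.sum(axis=1, keepdims=True)
```

At a small temperature the output approaches the hard permutation matrix: for s = [0.1, 0.9, 0.5] and tau = 0.01, the row-wise argmax recovers the descending order [1, 2, 0].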
Open Source Code | Yes | The full codebase for this work is open-sourced at https://github.com/ermongroup/neuralsort. (Section 7)
Open Datasets | Yes | We first create the large-MNIST dataset, which extends the MNIST dataset of handwritten digits. (Section 6.1) We consider three benchmark datasets: the MNIST dataset of handwritten digits, the Fashion-MNIST dataset of fashion apparel, and the CIFAR-10 dataset of natural images (no data augmentation), with the canonical splits for training and testing. (Section 6.3)
Dataset Splits | Yes | For the sorting and quantile regression experiments, we used standard training/validation/test splits of 50,000/10,000/10,000 images of MNIST for constructing the large-MNIST dataset. (Appendix D) For CIFAR-10, we used a split of 45,000/5,000/10,000 examples for training/validation/test. (Appendix D)
Hardware Specification | No | The paper mentions that the method "can be implemented efficiently on GPU hardware" (Section 3), but does not specify any particular GPU model, CPU, or other hardware component used for the experiments.
Software Dependencies | No | We used TensorFlow (Abadi et al., 2016) and PyTorch (Paszke et al., 2017) for our experiments. (Appendix D) The paper mentions these libraries but does not provide specific version numbers, which is required for a reproducible description.
Experiment Setup | Yes | For this experiment, we used an Adam optimizer with an initial learning rate of 10⁻⁴ and a batch size of 20. Continuous relaxations to sorting also introduce another hyperparameter: the temperature τ for the Sinkhorn-based and NeuralSort-based approaches. We tuned this hyperparameter on the set {1, 2, 4, 8, 16} by picking the model with the best validation accuracy on predicting entire permutations (as opposed to predicting individual maps between elements and ranks). (Appendix D.1 Hyperparameters)
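The temperature sweep described above is a plain grid search: train once per τ ∈ {1, 2, 4, 8, 16} and keep the model whose validation accuracy on predicting entire permutations is highest. A hypothetical sketch (the `train_and_eval` callback and `tune_temperature` name are illustrative, not from the paper):

```python
def tune_temperature(train_and_eval, taus=(1, 2, 4, 8, 16)):
    """Grid search over the temperature hyperparameter.

    train_and_eval(tau) is assumed to train a model at temperature tau
    and return its validation accuracy on exact-permutation prediction.
    Returns the tau with the best validation accuracy.
    """
    scores = {tau: train_and_eval(tau) for tau in taus}
    return max(scores, key=scores.get)
```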