Differentiable Top-k with Optimal Transport

Authors: Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply the proposed operator to the k-nearest neighbors and beam search algorithms, and demonstrate improved performance. ... We evaluate the performance of the proposed neural network-based kNN classifier on two benchmark datasets: the MNIST dataset of handwritten digits (LeCun et al., 1998) and the CIFAR-10 dataset of natural images (Krizhevsky et al., 2009)... We report the classification accuracies on the standard test sets in Table 1.
Researcher Affiliation | Collaboration | Yujia Xie, College of Computing, Georgia Tech (Xie.Yujia000@gmail.com); Hanjun Dai, Google Brain (hadai@google.com); Minshuo Chen, College of Engineering, Georgia Tech (mchen393@gatech.edu); Bo Dai, Google Brain (bodai@google.com); Tuo Zhao, College of Engineering, Georgia Tech (tourzhao@gatech.edu); Hongyuan Zha, School of Data Science, Shenzhen Research Institute of Big Data, CUHK, Shenzhen (zhahy@cuhk.edu.cn); Wei Wei, Google Cloud AI (wewei@google.com); Tomas Pfister, Google Cloud AI (tpfister@google.com)
Pseudocode | Yes | Algorithm 1: SOFT Top-k ... Algorithm 2: Beam search training with SOFT Top-k (a hedged sketch of the operator appears after the table)
Open Source Code | Yes | We also include a PyTorch (Paszke et al., 2017) implementation of the forward and backward pass in Appendix B by extending the autograd automatic differentiation package. (A sketch of such an autograd extension appears after the table.)
Open Datasets | Yes | We evaluate the performance of the proposed neural network-based kNN classifier on two benchmark datasets: the MNIST dataset of handwritten digits (LeCun et al., 1998) and the CIFAR-10 dataset of natural images (Krizhevsky et al., 2009)... We evaluate our proposed beam search + sorted SOFT top-k training procedure using the WMT2014 English-French dataset.
Dataset Splits | No | The paper mentions 'canonical splits for training and testing without data augmentation' for MNIST and CIFAR-10, and uses WMT2014. Although these datasets have well-known standard splits, the paper does not explicitly state the training/validation/test percentages or counts in the main text, nor does it specify how a validation set was created or used.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2017) as the implementation framework, but it does not specify exact version numbers for PyTorch or any other software dependencies needed to replicate the experiments.
Experiment Setup | Yes | We adopt the coefficient of entropy regularizer ε = 10^-3 for the MNIST dataset and ε = 10^-5 for the CIFAR-10 dataset. Further implementation details can be found in Appendix C. ... We adopt beam size 5, teacher forcing ratio = 0.8, and ε = 10^-1. For detailed settings of the training procedure, please refer to Appendix C. (A usage sketch with these values appears after the table.)
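
The Algorithm 1 referenced in the Pseudocode row casts top-k selection as an entropy-regularized optimal transport problem solved with Sinkhorn iterations. The following is a minimal sketch of that idea in PyTorch, not the authors' implementation: the squared-distance cost, the {0, 1} anchor points, the score normalization, and the iteration count are assumptions, and gradients here come from plain autograd rather than the paper's derived backward pass.

```python
import torch

def soft_top_k(scores, k, epsilon=1e-3, n_iters=200):
    """Smoothed top-k indicator for a 1-D score vector via entropy-regularized OT.

    Returns a vector in [0, 1]^n whose entries approach 1 for the k largest
    scores and 0 otherwise as epsilon -> 0. Gradients flow through the
    Sinkhorn iterations by ordinary autograd.
    """
    n = scores.shape[0]
    # Scale scores to [0, 1] so the regularization strength is comparable
    # across inputs (a common normalization; an assumption here).
    s = (scores - scores.min()) / (scores.max() - scores.min() + 1e-8)

    # Transport the n scores onto two anchors: y = 0 ("not selected") and
    # y = 1 ("selected"), with marginals (n - k)/n and k/n respectively.
    y = torch.tensor([0.0, 1.0], dtype=s.dtype, device=s.device)
    C = (s.unsqueeze(1) - y.unsqueeze(0)) ** 2          # (n, 2) cost matrix
    mu = torch.full((n,), 1.0 / n, dtype=s.dtype, device=s.device)
    nu = torch.tensor([(n - k) / n, k / n], dtype=s.dtype, device=s.device)

    # Sinkhorn iterations in the log domain for numerical stability.
    log_mu, log_nu = mu.log(), nu.log()
    f = torch.zeros_like(mu)
    g = torch.zeros_like(nu)
    for _ in range(n_iters):
        f = epsilon * (log_mu - torch.logsumexp((g.unsqueeze(0) - C) / epsilon, dim=1))
        g = epsilon * (log_nu - torch.logsumexp((f.unsqueeze(1) - C) / epsilon, dim=0))

    # Transport plan and smoothed top-k indicator (mass sent to y = 1).
    plan = torch.exp((f.unsqueeze(1) + g.unsqueeze(0) - C) / epsilon)
    return n * plan[:, 1]
```

As epsilon shrinks, the returned weights approach a hard top-k indicator; a larger epsilon gives a smoother relaxation that is easier to optimize but less faithful to exact top-k.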
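
The Open Source Code row notes that Appendix B extends PyTorch's autograd package with an explicit forward and backward pass. The skeleton below only illustrates the torch.autograd.Function mechanism such an extension would use; the paper derives a closed-form backward pass from the optimal transport plan, whereas this sketch simply re-runs the differentiable solver in backward as a checkpointing-style stand-in.

```python
import torch

class SoftTopK(torch.autograd.Function):
    """Custom autograd wrapper around the Sinkhorn-based soft_top_k above.

    Illustrative only: the backward pass recomputes the forward with grad
    enabled and lets autograd differentiate it, rather than applying the
    paper's closed-form gradient.
    """

    @staticmethod
    def forward(ctx, scores, k, epsilon):
        ctx.save_for_backward(scores)
        ctx.k, ctx.epsilon = k, epsilon
        with torch.no_grad():                      # keep the forward graph small
            return soft_top_k(scores, k, epsilon)

    @staticmethod
    def backward(ctx, grad_output):
        (scores,) = ctx.saved_tensors
        with torch.enable_grad():
            scores = scores.detach().requires_grad_(True)
            out = soft_top_k(scores, ctx.k, ctx.epsilon)
            grad_scores, = torch.autograd.grad(out, scores, grad_output)
        return grad_scores, None, None             # no gradients for k, epsilon
```

Calling SoftTopK.apply(scores, k, epsilon) then behaves like soft_top_k while avoiding storing the unrolled Sinkhorn graph during the forward pass.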
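
The Experiment Setup row quotes the entropy-regularization coefficient used for the kNN experiments (ε = 10^-3 on MNIST). The sketch below shows, under stated assumptions, how such a setting could be plugged into a differentiable kNN prediction: the value of k, the 10-class one-hot encoding, and the Euclidean distance are illustrative choices, not details quoted from the paper.

```python
import torch

def soft_knn_predict(query_feat, neighbor_feats, neighbor_labels, k=9, epsilon=1e-3):
    """Differentiable kNN class scores using soft top-k weights.

    query_feat: (d,) feature of the query example.
    neighbor_feats: (n, d) features of candidate neighbors.
    neighbor_labels: (n,) LongTensor of class labels in {0, ..., 9}.
    """
    # Negative distances as scores: nearest neighbors get the largest scores.
    dists = torch.cdist(query_feat.unsqueeze(0), neighbor_feats).squeeze(0)   # (n,)
    weights = soft_top_k(-dists, k, epsilon)                                  # ~1 on k nearest
    one_hot = torch.nn.functional.one_hot(neighbor_labels, num_classes=10).float()
    probs = (weights.unsqueeze(1) * one_hot).sum(dim=0) / k                   # (10,)
    return probs
```

A cross-entropy loss on these class scores would then backpropagate through the soft top-k weights into whatever network produced query_feat and neighbor_feats.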