Differentiable Top-k with Optimal Transport
Authors: Yujia Xie, Hanjun Dai, Minshuo Chen, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei, Tomas Pfister
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply the proposed operator to the k-nearest neighbors and beam search algorithms, and demonstrate improved performance. ... We evaluate the performance of the proposed neural network-based kNN classifier on two benchmark datasets: MNIST dataset of handwritten digits (LeCun et al., 1998) and the CIFAR-10 dataset of natural images (Krizhevsky et al., 2009)... We report the classification accuracies on the standard test sets in Table 1. |
| Researcher Affiliation | Collaboration | Yujia Xie (College of Computing, Georgia Tech; Xie.Yujia000@gmail.com), Hanjun Dai (Google Brain; hadai@google.com), Minshuo Chen (College of Engineering, Georgia Tech; mchen393@gatech.edu), Bo Dai (Google Brain; bodai@google.com), Tuo Zhao (College of Engineering, Georgia Tech; tourzhao@gatech.edu), Hongyuan Zha (School of Data Science, Shenzhen Research Institute of Big Data, CUHK, Shenzhen; zhahy@cuhk.edu.cn), Wei Wei (Google Cloud AI; wewei@google.com), Tomas Pfister (Google Cloud AI; tpfister@google.com) |
| Pseudocode | Yes | Algorithm 1 SOFT Top-k ... Algorithm 2 Beam search training with SOFT Top-k |
| Open Source Code | Yes | We also include a PyTorch (Paszke et al., 2017) implementation of the forward and backward pass in Appendix B by extending the autograd automatic differentiation package. |
| Open Datasets | Yes | We evaluate the performance of the proposed neural network-based kNN classifier on two benchmark datasets: MNIST dataset of handwritten digits (LeCun et al., 1998) and the CIFAR-10 dataset of natural images (Krizhevsky et al., 2009)... We evaluate our proposed beam search + sorted SOFT top-k training procedure using the WMT2014 English-French dataset. |
| Dataset Splits | No | The paper mentions 'canonical splits for training and testing without data augmentation' for MNIST and CIFAR-10, and uses WMT2014. While these are standard datasets with known splits, the paper does not explicitly provide the percentages or counts for training, validation, and test splits within the main text, nor does it specify how a validation set was used or created. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2017)' as the framework used for implementation, but it does not specify exact version numbers for PyTorch or any other software dependencies needed to replicate the experiments. |
| Experiment Setup | Yes | We adopt the coefficient of entropy regularizer ε = 10^-3 for the MNIST dataset and ε = 10^-5 for the CIFAR-10 dataset. Further implementation details can be found in Appendix C. ... We adopt beam size 5, teacher forcing ratio 0.8, and ε = 10^-1. For detailed settings of the training procedure, please refer to Appendix C. |
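The "Pseudocode" and "Experiment Setup" rows above refer to the paper's SOFT top-k operator, which relaxes top-k selection into an entropy-regularized optimal transport problem solved with Sinkhorn iterations. The sketch below illustrates that idea in PyTorch under stated assumptions: it is not the authors' released implementation. The function name `soft_topk`, the choice of anchors {0, 1} with min-max normalization, the fixed iteration count, and the default hyperparameters are all illustrative, and gradients here come from differentiating through the unrolled Sinkhorn loop, whereas the paper derives a custom backward pass by extending `torch.autograd`.

```python
import torch

def soft_topk(scores, k, eps=1e-3, n_iters=200):
    """Relaxed top-k indicator via entropy-regularized optimal transport.

    Transports n scores onto two anchors {0, 1}; the plan column tied to
    anchor 1 receives total mass k/n, so rescaling it by n yields a soft
    0/1 indicator of membership in the top k.
    """
    n = scores.shape[0]
    # Min-max normalize so scores and anchors share a scale
    # (this normalization is an assumption of the sketch).
    s = (scores - scores.min()) / (scores.max() - scores.min() + 1e-12)
    anchors = torch.tensor([0.0, 1.0], dtype=s.dtype, device=s.device)
    # Squared-distance cost between each score and each anchor, shape (n, 2).
    C = (s.unsqueeze(1) - anchors.unsqueeze(0)) ** 2
    # Marginals: uniform mass over scores; mass k/n must reach the
    # "selected" anchor and (n - k)/n the "rejected" one.
    mu = torch.full((n,), 1.0 / n, dtype=s.dtype, device=s.device)
    nu = torch.tensor([(n - k) / n, k / n], dtype=s.dtype, device=s.device)

    # Log-domain Sinkhorn iterations (numerically stable for small eps).
    f = torch.zeros_like(mu)
    g = torch.zeros_like(nu)
    for _ in range(n_iters):
        f = eps * torch.log(mu) - eps * torch.logsumexp((g[None, :] - C) / eps, dim=1)
        g = eps * torch.log(nu) - eps * torch.logsumexp((f[:, None] - C) / eps, dim=0)
    plan = torch.exp((f[:, None] + g[None, :] - C) / eps)
    # Mass sent to the "selected" anchor, rescaled to [0, 1].
    return plan[:, 1] * n
```

A minimal usage example, with `eps` matching the MNIST setting quoted above; the loss below is a hypothetical soft sum of the top-3 scores, chosen only to show that gradients reach every entry of `scores`:

```python
scores = torch.randn(10, requires_grad=True)
gamma = soft_topk(scores, k=3, eps=1e-3)  # soft top-k indicator in [0, 1]
loss = -(gamma * scores).sum()            # soft sum of the top-3 scores
loss.backward()                           # gradients flow through the Sinkhorn unroll
```

Differentiating through the unrolled loop is the simplest way to make this sketch trainable, but it stores every iteration for the backward pass; the paper's implementation (per the "Open Source Code" row) instead implements the backward pass directly, which avoids that memory cost.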