Sparsity-Constrained Optimal Transport

Authors: Tianlin Liu, Joan Puigcerver, Mathieu Blondel

ICLR 2023

Reproducibility assessment (variable — result — supporting LLM response):
Research Type — Experimental. "We validate our framework in Section 6 and in Appendix A through a variety of experiments. ... We applied sparsity-constrained OT to vision sparse mixtures-of-experts (V-MoE) models for large-scale image recognition (Riquelme et al., 2021). ... Table 2 summarizes the validation accuracy on JFT-300M and 10-shot accuracy on ImageNet."
Researcher Affiliation — Collaboration. Tianlin Liu (University of Basel); Joan Puigcerver (Google Research, Brain Team); Mathieu Blondel (Google Research, Brain Team).
Pseudocode — No. The paper references Algorithm 1 of Riquelme et al. (2021) in Appendix A.5, but it contains no pseudocode or algorithm block for its own proposed method.
Open Source Code — No. The paper neither states that source code is released nor links to a code repository for its method.
Open Datasets — Yes. "We train on the JFT-300M dataset (Sun et al., 2017), which is a large-scale dataset that contains more than 305 million images. We then perform 10-shot transfer learning on the ImageNet dataset (Deng et al., 2009)."
Dataset Splits — Yes. "JFT-300M has around 305M training and 50,000 validation images. ... For downstream evaluations, we perform 10-shot linear transfer on ImageNet (Deng et al., 2009). ... This newly initialized layer is trained on 10 examples per ImageNet class (10-shot learning)."
Hardware Specification — Yes. "Table 4: Total Training TPUv2-core-days"
Software Dependencies — No. The paper names software components such as LBFGS and the ADAM optimizer (and refers to its deep-learning framework only implicitly), but it does not specify version numbers; for example, it states only "ADAM optimizer with a learning rate of 10^-2".
Experiment Setup — Yes. "We do so by using an ADAM optimizer with a learning rate of 10^-2 for 50 steps. ... The buffer capacity is set to n/κ = 32/2 = 16, that is, each expert can take at most 16 tokens. To match this setting, we use k = 16 in (18) for our sparsity-constrained router. ... Algorithms employing an OT-based approach perform 500 iterations to find T, using either the Sinkhorn algorithm (with the Negentropy method) or LBFGS (used by the rest of the OT-based methods). We use a sparsity constraint of k = 1.15·m/n."
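The experiment-setup quote pairs an OT solver (Sinkhorn, 500 iterations) with a per-expert capacity k. The following is a minimal sketch, assuming uniform marginals over tokens and experts: plain entropic Sinkhorn iterations followed by a hard top-k projection on each column. The top-k thresholding here is purely illustrative of what a k-sparse plan looks like; the paper instead enforces the cardinality constraint inside the solver (via a regularized dual optimized with LBFGS), not by post-hoc truncation. All function names are our own.

```python
import numpy as np

def sinkhorn(C, a, b, reg=0.1, n_iters=500):
    """Entropic OT: return a transport plan T whose marginals approach a, b.

    C: (n, m) cost matrix; a: (n,) source marginal; b: (m,) target marginal.
    """
    K = np.exp(-C / reg)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):          # alternating marginal projections
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def topk_sparsify(T, k):
    """Keep only the k largest entries in each column of T (hard projection,
    used here only to illustrate a k-sparse token-to-expert assignment)."""
    S = np.zeros_like(T)
    idx = np.argsort(T, axis=0)[-k:, :]   # row indices of top-k per column
    np.put_along_axis(S, idx, np.take_along_axis(T, idx, axis=0), axis=0)
    return S
```

For a router, C would hold negated token-expert affinities, a and b uniform marginals over n tokens and m experts, and k the buffer capacity (16 in the quoted setting).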
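The 10-shot linear transfer quoted in the dataset-splits row (a newly initialized linear layer trained on 10 examples per class, on top of frozen backbone features) could be sketched as follows. This is a minimal illustration using a closed-form ridge-regression head; the function name, the bias handling, and the regularizer value are assumptions, not details from the paper.

```python
import numpy as np

def linear_probe(train_x, train_y, test_x, num_classes, reg=1e-3):
    """10-shot linear transfer sketch: fit a ridge-regression linear head on
    frozen features (train_x) with integer labels (train_y), then predict
    classes for test features. reg is an illustrative ridge penalty."""
    Y = np.eye(num_classes)[train_y]                      # one-hot targets
    X = np.hstack([train_x, np.ones((len(train_x), 1))])  # append bias column
    # Closed-form ridge solution: (X^T X + reg I)^{-1} X^T Y
    W = np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T @ Y)
    Xt = np.hstack([test_x, np.ones((len(test_x), 1))])
    return (Xt @ W).argmax(axis=1)                        # predicted classes
```

In the paper's setting, train_x would contain backbone features for 10 ImageNet examples per class; only this linear head is trained.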