Sparsity-Constrained Optimal Transport
Authors: Tianlin Liu, Joan Puigcerver, Mathieu Blondel
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our framework in Section 6 and in Appendix A through a variety of experiments. ... We applied sparsity-constrained OT to vision sparse mixtures of experts (V-MoE) models for large-scale image recognition (Riquelme et al., 2021). ... Table 2 summarizes the validation accuracy on JFT-300M and 10-shot accuracy on ImageNet. |
| Researcher Affiliation | Collaboration | Tianlin Liu (University of Basel); Joan Puigcerver (Google Research, Brain team); Mathieu Blondel (Google Research, Brain team) |
| Pseudocode | No | The paper references Algorithm 1 from Riquelme et al. (2021) in Appendix A.5, but it does not contain pseudocode or an algorithm block for its own proposed method. |
| Open Source Code | No | The paper does not include any statement about releasing source code or provide a link to a code repository for its methodology. |
| Open Datasets | Yes | We train on the JFT-300M dataset (Sun et al., 2017), which is a large scale dataset that contains more than 305 million images. We then perform 10-shot transfer learning on the ImageNet dataset (Deng et al., 2009). |
| Dataset Splits | Yes | JFT-300M has around 305M training and 50,000 validation images. ... For downstream evaluations, we perform 10-shot linear transfer on ImageNet (Deng et al., 2009). ... This newly initialized layer is trained on 10 examples per ImageNet class (10-shot learning). |
| Hardware Specification | Yes | Table 4: Total Training TPUv2-core-days |
| Software Dependencies | No | The paper mentions algorithms such as LBFGS and ADAM, but it does not name specific software libraries or their version numbers; settings are reported only as hyperparameters (e.g., "ADAM optimizer with a learning rate of 10^-2"). |
| Experiment Setup | Yes | We do so by using an ADAM optimizer with a learning rate of 10^-2 for 50 steps. ... The buffer capacity is set to be n/κ = 32/2 = 16, that is, each expert can take 16 tokens at most. To match this setting, we use k = 16 in (18) for our sparsity-constrained router. ... Algorithms employing an OT-based approach perform 500 iterations to find T, using either the Sinkhorn algorithm (with the Negentropy method) or LBFGS (used by the rest of OT-based methods). We use a sparsity constraint of k = 1.15 m/n. (A hedged sketch of this setup appears below the table.) |
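
The experiment-setup row is the only place the solver is pinned down, so a short sketch may help readers see how the quoted pieces fit together: dual ascent with Adam (learning rate 10^-2, 50 steps) and a column-wise top-k truncation that enforces the sparsity constraint (e.g., k = 16 tokens per expert). This is a minimal, hypothetical PyTorch rendering; the function name `sparse_ot_dual`, the regularization strength `gamma`, and the exact form of the dual objective are assumptions for illustration, not code released with the paper.

```python
import torch

def sparse_ot_dual(C, a, b, k, gamma=1.0, steps=50, lr=1e-2):
    """Hedged sketch of sparsity-constrained OT via its smoothed dual.

    C: (n, m) cost matrix; a: (n,) source marginal; b: (m,) target marginal.
    Each column of the returned plan has at most k nonzeros. The Adam
    settings (lr=1e-2, 50 steps) mirror the paper's quoted setup; the
    objective below is an assumed quadratically regularized dual.
    """
    n, m = C.shape
    alpha = torch.zeros(n, requires_grad=True)
    beta = torch.zeros(m, requires_grad=True)
    opt = torch.optim.Adam([alpha, beta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        scores = alpha[:, None] + beta[None, :] - C          # dual slacks, (n, m)
        topk = torch.topk(torch.relu(scores), k, dim=0).values
        # assumed concave dual: <alpha, a> + <beta, b> - (1/2γ) Σ_j ||top-k_j||²
        dual = alpha @ a + beta @ b - topk.pow(2).sum() / (2 * gamma)
        (-dual).backward()                                   # Adam minimizes, so negate
        opt.step()
    # primal recovery: keep only the k largest positive slacks per column
    scores = (alpha[:, None] + beta[None, :] - C).detach()
    plan = torch.relu(scores) / gamma
    mask = torch.zeros_like(plan)
    mask.scatter_(0, torch.topk(plan, k, dim=0).indices, 1.0)
    return plan * mask
```

A usage example under the same assumptions, with made-up shapes (8 tokens routed to 4 experts, at most 2 tokens per expert); note that with only 50 Adam steps the marginals are satisfied approximately, not exactly:

```python
torch.manual_seed(0)
C = torch.rand(8, 4)              # cost between 8 tokens and 4 experts
a = torch.full((8,), 1 / 8)       # uniform source marginal
b = torch.full((4,), 1 / 4)       # uniform target marginal
T = sparse_ot_dual(C, a, b, k=2)
print((T > 0).sum(dim=0))         # per-column support sizes, all <= 2
```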