Neural Bregman Divergences for Distance Learning

Authors: Fred Lu, Edward Raff, Francis Ferraro

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We also demonstrate that our method more faithfully learns divergences over a set of both new and previously studied tasks, including asymmetric regression, ranking, and clustering. Our tests further extend to known asymmetric, but non-Bregman tasks, where our method still performs competitively despite misspecification, showing the general utility of our approach for asymmetric learning."
Researcher Affiliation | Collaboration | University of Maryland, Baltimore County; Booz Allen Hamilton
Pseudocode | Yes | Algorithm 1: Neural Bregman Divergence (NBD). (An illustrative sketch of a learned Bregman divergence follows the table.)
Open Source Code | No | The paper mentions adapting PyTorch code from another work ("We adapt their PyTorch code from https://github.com/spitis/deepnorms.") but does not provide a link or an explicit code statement for its proposed method (NBD).
Open Datasets | Yes | "The dataset consists of paired MNIST images... We also make a harder version by substituting MNIST with CIFAR10... We use the INRIA Holidays dataset (see Appendix G)."
Dataset Splits | Yes | "A 50K/10K train-test split was used. The training set consists of 10,000 pairs sampled with random crops each epoch from the first 200 of the images, while the test set is a fixed set of 10,000 pairs with crops drawn from the last 100."
Hardware Specification | Yes | "We used Quadro RTX 6000 GPUs to train our models."
Software Dependencies | No | The paper mentions the "PyTorch API (Paszke et al., 2017)" but does not specify a version number for PyTorch itself or for any other software libraries used.
Experiment Setup | Yes | Reported settings vary by experiment: "We used batch size 128, 200 epochs, 1e-3 learning rate for all models." "A typical example of the parameters is batch size 256, 250 epochs, learning rate 1e-3." "We used 100 epochs of training with learning rate 1e-3, batch size 1000." "We use default hyperparameter settings to keep methods comparable, such as Adam optimizer, learning rate 1e-3, batch size 128, embedding dimension 128, and 200 epochs." (A configuration sketch using these defaults follows the table.)
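
To give a concrete sense of what Algorithm 1 learns, the sketch below is a minimal, hypothetical PyTorch illustration of a Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>, with the potential phi parameterized by a small input-convex network so the divergence stays non-negative and asymmetric. The names (ConvexPotential, bregman_divergence) and the architecture are assumptions for illustration only, not the authors' NBD implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvexPotential(nn.Module):
    """Toy input-convex MLP phi(x): hidden-to-hidden weights are clamped to be
    non-negative and the activation is convex and non-decreasing, so the output
    is convex in x. An illustrative stand-in, not the paper's network."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.in0 = nn.Linear(dim, hidden)
        self.in1 = nn.Linear(dim, hidden)   # skip connections from the input
        self.in2 = nn.Linear(dim, 1)
        self.z1 = nn.Parameter(0.1 * torch.rand(hidden, hidden))  # kept >= 0 below
        self.z2 = nn.Parameter(0.1 * torch.rand(1, hidden))
        self.act = nn.Softplus()

    def forward(self, x):
        h = self.act(self.in0(x))
        h = self.act(F.linear(h, self.z1.clamp(min=0)) + self.in1(x))
        return F.linear(h, self.z2.clamp(min=0)) + self.in2(x)    # shape (B, 1)

def bregman_divergence(phi, x, y):
    """D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>, with the gradient
    taken by autograd so the divergence stays differentiable in phi's weights."""
    y = y.detach().requires_grad_(True)
    phi_y = phi(y).squeeze(-1)                                    # shape (B,)
    (grad_y,) = torch.autograd.grad(phi_y.sum(), y, create_graph=True)
    return phi(x).squeeze(-1) - phi_y - ((x - y) * grad_y).sum(-1)

# Usage: Bregman divergences are asymmetric in general, i.e. D(x, y) != D(y, x).
phi = ConvexPotential(dim=8)
x, y = torch.randn(4, 8), torch.randn(4, 8)
print(bregman_divergence(phi, x, y))    # four non-negative values
```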
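
Similarly, the following sketch wires the quoted default settings (Adam optimizer, learning rate 1e-3, batch size 128, 200 epochs) into a plain regression-style training loop, reusing ConvexPotential and bregman_divergence from the sketch above. The synthetic pairs, squared-distance targets, input dimension, and MSE loss are placeholders for illustration, not the paper's tasks or training objective.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

# Defaults quoted in the report: Adam, lr 1e-3, batch size 128, 200 epochs.
BATCH_SIZE, EPOCHS, LR, DIM = 128, 200, 1e-3, 16   # DIM is a placeholder input size

# Synthetic pairs with a stand-in "divergence" label (squared Euclidean distance).
xs, ys = torch.randn(10_000, DIM), torch.randn(10_000, DIM)
targets = ((xs - ys) ** 2).sum(-1)
loader = DataLoader(TensorDataset(xs, ys, targets), batch_size=BATCH_SIZE, shuffle=True)

phi = ConvexPotential(DIM)                         # from the sketch above
optimizer = torch.optim.Adam(phi.parameters(), lr=LR)

for epoch in range(EPOCHS):
    for x, y, t in loader:
        optimizer.zero_grad()
        loss = F.mse_loss(bregman_divergence(phi, x, y), t)
        loss.backward()
        optimizer.step()
```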