Deep Divergence Learning

Authors: Hatice Kubra Cilingir, Rachel Manzelli, Brian Kulis

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In all three of the above settings, we show empirical results that highlight the benefits of our framework. In particular, we show that learning asymmetric divergences offers performance gains over existing symmetric models on benchmark data, and achieve state-of-the-art classification performance in some settings."
Researcher Affiliation | Academia | "Kubra Cilingir 1, Rachel Manzelli 1, Brian Kulis 1 — 1 Department of Electrical and Computer Engineering, Boston University, Boston, Massachusetts, USA."
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Our code is available at https://github.com/kubrac/Deep_Bregman."
Open Datasets | Yes | "We generated n = 500 training points... We use WHARF (Bruno et al., 2014), MHEALTH (Banos et al., 2014; 2015), and WISDM (Weiss et al., 2019) datasets in our initial experiments. ... on the four benchmark datasets used in the original triplet loss paper (Hoffer & Ailon, 2015): MNIST, Cifar10, SVHN, and STL10, as well as Fashion MNIST. ... We apply our approach on 28x28 MNIST and CELEBA datasets, as is standard for GAN applications."
Dataset Splits | Yes | "We treat several architecture choices as hyperparameters and validate over these hyperparameters using Bayesian optimization (tuned separately for each dataset); Table 5 lists the hyperparameters that we search over, along with the ranges of values considered."
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not provide specific software dependency details with version numbers.
Experiment Setup | Yes | "The number of units in each layer were set to 1000, 500, and 2, and standard ReLU activation was used. ... We treat several architecture choices as hyperparameters and validate over these hyperparameters using Bayesian optimization... Model hyperparams: layers 2-5, conv filters 16-128, conv kernels 3-9... Training hyperparams: margin 0.1-2.0, epochs 10-40, learning rate 10^-5-10^-1, batch size 32-128, optimizer adam / sgd / rms..."
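The fully connected architecture quoted in the Experiment Setup row (layers of 1000, 500, and 2 units with ReLU activations) could be sketched as below. This is a minimal NumPy forward pass, not the authors' code: the input dimension (784, i.e. flattened 28x28 MNIST), the weight initialization, and the linear output layer are assumptions, since the excerpt does not specify them.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def init_mlp(input_dim, rng):
    # Layer widths quoted in the paper excerpt: 1000, 500, and 2 units.
    sizes = [input_dim, 1000, 500, 2]
    # He-style initialization is an assumption for illustration.
    return [
        (rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]

def forward(params, x):
    # ReLU on hidden layers; the excerpt does not state an output
    # activation, so the final layer is left linear here.
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = relu(x)
    return x

rng = np.random.default_rng(0)
params = init_mlp(784, rng)  # 784 = flattened 28x28 input (assumption)
out = forward(params, rng.standard_normal((4, 784)))
print(out.shape)  # (4, 2): a 2-dimensional embedding per input
```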
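The hyperparameter ranges quoted in the Experiment Setup row can be written down as an explicit search space. The dictionary layout and key names below are hypothetical (only the ranges come from the excerpt), and random sampling stands in for the Bayesian optimization the paper actually uses.

```python
import random

# Illustrative encoding of the quoted hyperparameter ranges.
# Key names are hypothetical; only the ranges come from the excerpt.
search_space = {
    # model hyperparameters
    "layers": (2, 5),             # integer range
    "conv_filters": (16, 128),    # integer range
    "conv_kernels": (3, 9),       # integer range
    # training hyperparameters
    "margin": (0.1, 2.0),
    "epochs": (10, 40),
    "learning_rate": (1e-5, 1e-1),
    "batch_size": (32, 128),
    "optimizer": ["adam", "sgd", "rms"],  # categorical choice
}

def sample_config(space, rng):
    """Draw one random configuration from the space (random-search
    stand-in for the paper's Bayesian optimization)."""
    cfg = {}
    for name, spec in space.items():
        if isinstance(spec, list):
            cfg[name] = rng.choice(spec)            # categorical
        elif all(isinstance(v, int) for v in spec):
            cfg[name] = rng.randint(*spec)          # integer range
        else:
            cfg[name] = rng.uniform(*spec)          # continuous range
    return cfg

cfg = sample_config(search_space, random.Random(0))
print(cfg["optimizer"] in ["adam", "sgd", "rms"])  # True
```

In a real run, learning rate would typically be searched on a log scale given its 10^-5 to 10^-1 range, and each sampled configuration would be scored by validation performance.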