Deep Divergence Learning
Authors: Hatice Kubra Cilingir, Rachel Manzelli, Brian Kulis
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In all three of the above settings, we show empirical results that highlight the benefits of our framework. In particular, we show that learning asymmetric divergences offers performance gains over existing symmetric models on benchmark data, and achieve state-of-the-art classification performance in some settings. |
| Researcher Affiliation | Academia | Hatice Kubra Cilingir, Rachel Manzelli, Brian Kulis; Department of Electrical and Computer Engineering, Boston University, Boston, Massachusetts, USA. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/kubrac/Deep_Bregman. |
| Open Datasets | Yes | We generated n = 500 training points... We use WHARF (Bruno et al., 2014), MHEALTH (Banos et al., 2014; 2015), and WISDM (Weiss et al., 2019) datasets in our initial experiments. ... on the four benchmark datasets used in the original triplet loss paper (Hoffer & Ailon, 2015) MNIST, Cifar10, SVHN, and STL10 as well as Fashion MNIST. ... We apply our approach on 28x28 MNIST and CELEBA datasets, as is standard for GAN applications. |
| Dataset Splits | Yes | We treat several architecture choices as hyperparameters and validate over these hyperparameters using Bayesian optimization (tuned separately for each dataset); Table 5 lists the hyperparameters that we search over, along with the ranges of values considered. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers. |
| Experiment Setup | Yes | The number of units in each layer were set to 1000, 500, and 2, and standard ReLU activation was used. ... We treat several architecture choices as hyperparameters and validate over these hyperparameters using Bayesian optimization... Model hyperparams: layers 2-5, conv filters 16-128, conv kernels 3-9... Training hyperparams: margin 0.1-2.0, epochs 10-40, learning rate 10^-5 to 10^-1, batch size 32-128, optimizer adam / sgd / rms... |
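
The Experiment Setup row quotes concrete layer sizes (1000, 500, and 2 units with ReLU). Below is a minimal sketch of such an embedding network; the flattened 28x28 input dimension, the margin value of 1.0, and the use of PyTorch's built-in triplet margin loss are illustrative assumptions, not details confirmed by the quote or the authors' released code.

```python
# Sketch of the fully connected embedding network described in the Experiment Setup
# row (layers of 1000, 500, and 2 units with ReLU). Input dimension, margin, and the
# triplet margin loss are assumptions for illustration only.
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    def __init__(self, input_dim: int = 784):  # 784 assumes flattened 28x28 images
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 1000), nn.ReLU(),
            nn.Linear(1000, 500), nn.ReLU(),
            nn.Linear(500, 2),  # 2-dimensional embedding, as in the quoted setup
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

if __name__ == "__main__":
    model = EmbeddingNet()
    # Margin of 1.0 is a placeholder inside the 0.1-2.0 range quoted above.
    loss_fn = nn.TripletMarginLoss(margin=1.0)
    anchor, positive, negative = (torch.randn(32, 784) for _ in range(3))
    loss = loss_fn(model(anchor), model(positive), model(negative))
    print(loss.item())
```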
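
The Dataset Splits and Experiment Setup rows both quote hyperparameter ranges validated with Bayesian optimization. The sketch below only samples one configuration from those quoted ranges with plain random sampling; the random search itself, the odd kernel sizes, the power-of-two batch sizes, and the helper name `sample_config` are assumptions, not the paper's tuning procedure.

```python
# Sketch of sampling one hyperparameter configuration from the ranges quoted in the
# Experiment Setup row. The paper uses Bayesian optimization tuned per dataset;
# random sampling here is a simplification for illustration.
import random

def sample_config() -> dict:
    return {
        "layers": random.randint(2, 5),              # model: number of layers
        "conv_filters": random.randint(16, 128),     # model: filters per conv layer
        "conv_kernel": random.choice([3, 5, 7, 9]),  # model: kernel size (odd sizes assumed)
        "margin": random.uniform(0.1, 2.0),          # training: margin
        "epochs": random.randint(10, 40),
        "learning_rate": 10 ** random.uniform(-5, -1),   # log-uniform over 10^-5 to 10^-1
        "batch_size": random.choice([32, 64, 128]),      # powers of two assumed
        "optimizer": random.choice(["adam", "sgd", "rms"]),
    }

if __name__ == "__main__":
    print(sample_config())
```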