Learning Deep Disentangled Embeddings With the F-Statistic Loss
Authors: Karl Ridgeway, Michael C. Mozer
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose and evaluate a novel loss function based on the F statistic, which describes the separation of two or more distributions. By ensuring that distinct classes are well separated on a subset of embedding dimensions, we obtain embeddings that are useful for few-shot learning. Our embedding method matches or beats state-of-the-art, as evaluated by performance on recall@k and few-shot learning tasks. Our method also obtains performance superior to a variety of alternatives on disentangling, as evaluated by two key properties of a disentangled representation: modularity and explicitness. (A sketch of the per-dimension F statistic appears after the table.) |
| Researcher Affiliation | Collaboration | Karl Ridgeway, Department of Computer Science, University of Colorado and Sensory, Inc., Boulder, Colorado (karl.ridgeway@colorado.edu); Michael C. Mozer, Department of Computer Science, University of Colorado, Boulder, Colorado (mozer@colorado.edu) |
| Pseudocode | No | The paper describes the algorithm's behavior and provides a diagram (Figure 1), but it does not include a formal pseudocode block or an algorithm section. |
| Open Source Code | Yes | Code for all models is available at https://github.com/kridgeway/f-statistic-loss-nips-2018 |
| Open Datasets | Yes | For this task, we evaluate using two datasets, CUHK03 (Li et al., 2014) and Market-1501 (Zheng et al., 2015), following the methodology of Ustinova & Lempitsky (2016). The second task involves matching a bird from a wide-angle photograph; we evaluate performance on the CUB-200-2011 birds dataset (Wah et al., 2011). We explore two datasets in which each instance is tagged with values for several statistically independent factors. Some of the factors are treated as class-related, and some as noise. First, we train on a data set of video game sprites (Reed et al., 2015). We also explore the small NORB dataset (LeCun et al., 2004). |
| Dataset Splits | Yes | Five-fold cross validation is performed in every case. The first split is used to tune model hyper-parameters, and we report accuracy on the final four splits. For each split, a validation set was withheld from the training set and used for early stopping. For both datasets, we evaluated with five-fold cross validation, using the conjunction of factors to split: the 7 factors for sprites and 3 (toy type, azimuth, and elevation) for NORB. For each split, the validation set was used to determine when to stop training, based on mean factor explicitness. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software components like 'inception v3 network' and 'ADAM optimizer' but does not specify their version numbers or other specific software dependencies with versions. |
| Experiment Setup | Yes | For CUHK03 and Market-1501, we use the Deep Metric Learning (Yi et al., 2014b) architecture. For CUB-200-2011, we use an Inception v3 (Szegedy et al., 2016) network pretrained on ImageNet, and extract the 2048-dimensional features from the final pooling layer. We treat these features as constants, and optimize a fully connected net with 1024 hidden ReLU units. For every dataset, we use a 500-dimensional embedding. All nets were trained using the ADAM (Kingma & Ba, 2014) optimizer, with a learning rate of 10^-4 for all losses, except the F-statistic loss, which we found benefited from a slightly higher learning rate (2 × 10^-4). To construct a mini-batch for training, we randomly select 12 identities, with up to 10 samples of each identity. For the F-statistic loss, we determined the best value of d, the number of dimensions to separate, using the validation set of the first split. For CUHK03 we chose d = 70, for Market-1501 d = 63, and for CUB-200 d = 3. For the triplet loss we found that a margin of 0.1 worked well for all datasets. For the sprites dataset, we used the encoder architecture of Reed et al. (2015) as well as their embedding dimensionality of 22. For small NORB, we use a convolutional network with 3 convolutional layers and a final fully connected layer with an embedding dimensionality of 20. For the convolutional layers, the filter sizes are (7×7, 3×3, 3×3), the filter counts are (48, 64, 72), and all use a stride of 2 and ReLU activation. For the F-statistic loss, we set the number of training dimensions d = 2. Each minibatch is composed of up to 12 factor-values. We train with up to 10 instances per factor-value for triplet and histogram. For the F-statistic loss, we found that training with up to 5 instances per factor-value helps avoid underfitting. (A sketch of this small NORB encoder appears after the table.) |
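
The loss quoted in the Research Type row rests on the classical F statistic: the ratio of between-class to within-class variance, computed independently for each embedding dimension. The sketch below is a minimal NumPy/SciPy illustration of that idea, not the authors' released implementation (see the GitHub link above); the two-class restriction, the top-d selection, and the function names are our simplifications.

```python
import numpy as np
from scipy.stats import f as f_dist  # F distribution for the null hypothesis


def two_class_f_statistic(x1, x2):
    """Per-dimension F statistic for two classes of embeddings.

    x1: (n1, D) embeddings of class 1; x2: (n2, D) embeddings of class 2.
    Returns a length-D array of between/within variance ratios.
    """
    n1, n2 = len(x1), len(x2)
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
    grand = (n1 * m1 + n2 * m2) / (n1 + n2)
    between = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2       # df = 1
    within = (((x1 - m1) ** 2).sum(0) + ((x2 - m2) ** 2).sum(0)) / (n1 + n2 - 2)
    return between / (within + 1e-8)


def f_statistic_loss(x1, x2, d=2):
    """Penalize the chance probability of the observed separation on the
    d best-separated dimensions: large F values there make the
    null-hypothesis survival probability tiny, so the loss is strongly
    negative when the two classes are well separated on d dimensions.
    """
    F = two_class_f_statistic(x1, x2)
    top_d = np.sort(F)[-d:]                      # d most separated dimensions
    dfd = len(x1) + len(x2) - 2                  # within-group degrees of freedom
    p_null = f_dist.sf(top_d, dfn=1, dfd=dfd)    # P(F >= observed | no separation)
    return np.log(p_null + 1e-12).sum()
```

In the paper the statistic is computed over minibatches containing many classes and must be differentiable end to end; this standalone version only conveys the structure of the objective.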
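
The small NORB encoder in the Experiment Setup row is specified almost completely: three stride-2 convolutions (7×7 with 48 filters, then 3×3 with 64, then 3×3 with 72), ReLU activations, and a final fully connected layer producing a 20-dimensional embedding, trained with ADAM. The PyTorch sketch below fills in what the row leaves unstated: the 96×96 single-channel input resolution and the absence of padding are our assumptions, and they determine the 10×10 feature-map size before the linear layer.

```python
import torch
import torch.nn as nn


class SmallNORBEncoder(nn.Module):
    """Three stride-2 convs (7x7/48, 3x3/64, 3x3/72) + FC embedding, per the
    Experiment Setup row. Input size and padding are our assumptions."""

    def __init__(self, embedding_dim=20):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 48, kernel_size=7, stride=2), nn.ReLU(),   # 96 -> 45
            nn.Conv2d(48, 64, kernel_size=3, stride=2), nn.ReLU(),  # 45 -> 22
            nn.Conv2d(64, 72, kernel_size=3, stride=2), nn.ReLU(),  # 22 -> 10
            nn.Flatten(),
        )
        self.embed = nn.Linear(72 * 10 * 10, embedding_dim)

    def forward(self, x):
        return self.embed(self.features(x))


model = SmallNORBEncoder()
# Learning rates from the row: 1e-4 for most losses, 2e-4 for the F-statistic loss.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
```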