Dimensionality Reduction for Representing the Knowledge of Probabilistic Models

Authors: Marc T. Law, Jake Snell, Amir-massoud Farahmand, Raquel Urtasun, Richard S. Zemel

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally show that our framework improves generalization performance to unseen categories in zero-shot learning. We evaluate the relevance of our method in two types of experiments. The first learns low-dimensional representations for visualization to better interpret pre-trained deep models. The second experiment exploits the probability scores generated by a pre-trained classifier in the zero-shot learning context; these probability scores are used as supervision to improve performance on novel categories.
Researcher Affiliation | Collaboration | Marc T. Law & Jake Snell: University of Toronto, Canada; Vector Institute, Canada. Amir-massoud Farahmand: Vector Institute, Canada. Raquel Urtasun: University of Toronto, Canada; Vector Institute, Canada; Uber ATG, Canada. Richard S. Zemel: University of Toronto, Canada; Vector Institute, Canada; CIFAR Senior Fellow.
Pseudocode | Yes | Algorithm 1 Dimensionality Reduction of Probabilistic Representations (DRPR)
input: set of training examples (e.g., images) in X and their target probability scores (e.g., classification scores w.r.t. k training categories), nonlinear mapping gθ parameterized by parameters θ, number of iterations t
1: for iteration 1 to t do
2:   Randomly sample n training examples x1, ..., xn ∈ X and create target assignment matrix Y ∈ Y^(n×k) containing the target probability scores y1, ..., yn (i.e., Y = [y1, ..., yn]⊤ ∈ Y^(n×k))
3:   Create matrix F = [f1, ..., fn]⊤ ∈ V^n such that ∀i, fi = gθ(xi)
4:   Create matrix of centers M = diag(Y⊤1n)^(-1) Y⊤F and prior vector π = (1/n) Y⊤1n
5:   Update the parameters θ by performing a gradient descent iteration on ℓn(ψ(F, M, π), Y) (i.e., Eq. (4))
6: end for
output: nonlinear mapping gθ
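The quoted Algorithm 1 can be sketched in PyTorch as a single training iteration. This is an illustrative sketch, not the authors' released code: it assumes the Bregman divergence is the squared Euclidean distance and that the loss ℓn compares soft assignments ψ(F, M, π) to the targets Y via a KL divergence; the function and variable names are our own.

```python
import torch
import torch.nn.functional as nnF

def drpr_step(g_theta, x, y, optimizer):
    # One gradient iteration in the spirit of Algorithm 1 (sketch).
    # Assumptions not fixed by the quoted pseudocode: squared Euclidean
    # distance as the Bregman divergence, KL divergence as the loss.
    n = y.shape[0]
    f = g_theta(x)                              # F = [f1, ..., fn]^T, shape (n, d)
    counts = y.t() @ torch.ones(n)              # Y^T 1_n, shape (k,); assumed > 0
    m = torch.diag(1.0 / counts) @ y.t() @ f    # centers M = diag(Y^T 1_n)^-1 Y^T F
    pi = counts / n                             # prior vector, shape (k,)
    d2 = torch.cdist(f, m) ** 2                 # squared distances to centers, (n, k)
    # Soft assignments: psi_ic proportional to pi_c * exp(-d(f_i, mu_c))
    log_psi = torch.log_softmax(torch.log(pi) - d2, dim=1)
    loss = nnF.kl_div(log_psi, y, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the centers M and prior π are recomputed from each mini-batch, matching steps 2-4 of the algorithm, and the gradient step of line 5 is taken on the resulting loss.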
Open Source Code | No | The paper does not provide a direct link or explicit statement that the source code for the proposed DRPR method is publicly available. It only mentions pre-trained models used from another source.
Open Datasets | Yes | We evaluate our approach on the test sets of the MNIST (LeCun et al., 1998), STL (Coates et al., 2011), CIFAR-10 and CIFAR-100 (Krizhevsky & Hinton, 2009) datasets with pre-trained models that are publicly available and optimized for cross entropy. We use the medium-scaled Caltech-UCSD Birds (CUB) dataset (Welinder et al., 2010) and Oxford Flowers-102 (Flowers) dataset (Nilsback & Zisserman, 2008).
Dataset Splits | Yes | CUB contains 11,788 bird images from 200 different species categories split into disjoint sets: 100 categories for training, 50 for validation and 50 for test. Flowers contains 8,189 flower images from 102 different species categories: 62 categories are used for training, 20 for validation and 20 for test.
Hardware Specification | Yes | We coded our method in PyTorch and ran all our experiments on a single NVIDIA GeForce GTX 1060, which has 6GB of RAM.
Software Dependencies | No | The paper states 'We coded our method in PyTorch' but does not specify the version number for PyTorch or any other software dependencies with their respective versions.
Experiment Setup | Yes | Mini-batch size: The training datasets of CUB and Flowers contain 5,894 and 5,878 images, respectively. In order to fit into memory, we set our mini-batch sizes as 421 (= 5894/14) and 735 (≈ 5878/8) for CUB and Flowers, respectively. Optimizer: We use the Adam optimizer with a learning rate of 10^(-5) to train both models ϕθ1 and gθ2. Initial temperature of our model: To make our optimization framework stable, we start with a temperature of 50. We then formulate our Bregman divergence as d(fi, µc) = (1/temp) ‖fi − µc‖²₂, where fi and µc are the representations learned by our model. We decrease our temperature by 10% (i.e., temp_(t+1) = 0.9 temp_t) every 3000 epochs until the algorithm stops training.
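The tempered divergence and the decay schedule quoted above can be written out directly. This is a minimal sketch; the function name, the epoch count, and the placement of the update inside a training loop are illustrative, while the constants (initial temperature 50, 10% decay every 3000 epochs) come from the quoted setup.

```python
import torch

def tempered_distance(f_i, mu_c, temp):
    # d(f_i, mu_c) = (1 / temp) * ||f_i - mu_c||_2^2
    return (f_i - mu_c).pow(2).sum(dim=-1) / temp

# Temperature schedule from the quoted setup: start at 50 and multiply
# by 0.9 every 3000 epochs. The total epoch count here is illustrative.
temp = 50.0
for epoch in range(1, 9001):
    # ... one training epoch using tempered_distance(...) would go here ...
    if epoch % 3000 == 0:
        temp *= 0.9  # decrease the temperature by 10%
```

Dividing the squared distance by a large initial temperature flattens the soft assignments early in training; the gradual decay then sharpens them as optimization stabilizes.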