Linear-Time Learning on Distributions with Approximate Kernel Embeddings
Authors: Danica Sutherland, Junier Oliva, Barnabás Póczos, Jeff Schneider
AAAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide an analysis of the approximation error in using our proposed random features, and show empirically the quality of our approximation both in estimating a Gram matrix and in solving learning tasks in real-world and synthetic data. |
| Researcher Affiliation | Academia | Dougal J. Sutherland and Junier B. Oliva and Barnabás Póczos and Jeff Schneider, Carnegie Mellon University, {dsutherl,joliva,bapoczos,schneide}@cs.cmu.edu |
| Pseudocode | Yes | The algorithm for computing features {z(Â(p̂_i))}_{i=1}^N for a set of distributions {p_i}_{i=1}^N, given sample sets {χ_i}_{i=1}^N where χ_i = {X_j^(i) ∈ [0, 1]^ℓ}_{j=1}^{n_i} iid ∼ p_i, is thus: 1. Draw M scalars λ_j iid ∼ μ_Z and D/2 vectors ω_r iid ∼ N(0, σ^{-2} I_{2M|V|}), in O(M|V|D) time. 2. For each of the N input distributions i: (a) Compute a kernel density estimate from χ_i, p̂_i(u_j) for each u_j in (10), in O(n_i n_e) time. (b) Compute Â(p̂_i) using a numerical integration estimate as in (10), in O(M|V| n_e) time. (c) Get the RKS features, z(Â(p̂_i)), in O(M|V|D) time. |
| Open Source Code | No | The paper mentions a GitHub link in footnote 3: 'github.com/dougalsutherland/skl-groups/', but explicitly states it's for the KL kernel ('as did the KL kernel3'), not the authors' main contribution (HDD embeddings). The paper also states: 'while the HDD embeddings used a simple Matlab implementation.', indicating their own code is not openly provided. |
| Open Datasets | Yes | We took the cat and dog classes from the CIFAR-10 dataset (Krizhevsky and Hinton 2009). We consider the Scene-15 dataset (Lazebnik, Schmid, and Ponce 2006). |
| Dataset Splits | Yes | Throughout these experiments we use M = 5, |V | = 10ℓ (selected as rules of thumb; larger values did not improve performance), and use a validation set (10% of the training set) to choose bandwidths for KDE and the RBF kernel as well as model regularization parameters. |
| Hardware Specification | No | No specific hardware details (like GPU or CPU models, or memory specifications) were provided for the experiments. |
| Software Dependencies | No | The paper mentions 'a simple Matlab implementation' for HDD embeddings and that SVM classifiers were used 'from LIBLINEAR (Fan et al. 2008, for the embeddings) or LIBSVM (Chang and Lin 2011, for the KL kernel)', but no specific version numbers for Matlab, LIBLINEAR, or LIBSVM are provided. |
| Experiment Setup | Yes | Throughout these experiments we use M = 5, |V| = 10ℓ (selected as rules of thumb; larger values did not improve performance), and use a validation set (10% of the training set) to choose bandwidths for KDE and the RBF kernel as well as model regularization parameters. Except in the scene classification experiments, the histogram methods used 10 bins per dimension; performance with other values was not better. The KL estimator used the fourth nearest neighbor. ...we use D = 5000. ...with D = 7000. |
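The final step of the quoted pseudocode, z(Â(p̂_i)), is the standard random kitchen sinks (random Fourier features) construction of Rahimi and Rechtdraw: D/2 Gaussian frequency vectors, then paired cosine/sine projections whose inner products approximate an RBF kernel. The following is a minimal numpy sketch of that generic step only, not the authors' Matlab implementation; the function name `rks_features` and the toy inputs are illustrative assumptions.

```python
import numpy as np

def rks_features(X, D, sigma, rng):
    """Random Fourier (RKS) features for the RBF kernel
    k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), so z(x).z(y) ~= k(x, y)."""
    d = X.shape[1]
    # Draw D/2 frequency vectors omega_r ~ N(0, sigma^{-2} I_d).
    omega = rng.normal(scale=1.0 / sigma, size=(d, D // 2))
    proj = X @ omega  # shape (n, D/2): projections omega_r . x
    # Paired cos/sin features, scaled so the dot product is an unbiased
    # estimate of the RBF kernel.
    return np.sqrt(2.0 / D) * np.hstack([np.cos(proj), np.sin(proj)])

# Toy check: with a large D, the feature inner products should be close
# to the exact RBF Gram matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # stand-in for the embeddings A-hat(p-hat_i)
Z = rks_features(X, D=20000, sigma=1.0, rng=rng)
approx_gram = Z @ Z.T
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
exact_gram = np.exp(-sq_dists / 2.0)
max_err = np.abs(approx_gram - exact_gram).max()
```

In the paper's pipeline the rows of `X` would be the 2M|V|-dimensional estimated embeddings Â(p̂_i) from step 2(b), and the resulting `Z` feeds a linear model (e.g. LIBLINEAR), which is what makes the overall method linear-time in the number of distributions.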