Importance Sampling Tree for Large-scale Empirical Expectation

Authors: Olivier Canévet, Cijo Jose, François Fleuret

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4. Experiments and results
Researcher Affiliation | Academia | Idiap Research Institute, Martigny, Switzerland; École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Pseudocode | No | The paper describes steps for its methods but does not include a clearly labeled 'Pseudocode' or 'Algorithm' block, nor does it format procedures like code.
Open Source Code | No | The paper mentions an external repository for a network design ('https://github.com/nagadomi/kaggle-cifar10-torch7') but does not state that its own source code for the proposed Importance Sampling Tree (IST) methodology is available.
Open Datasets | Yes | "Our experiments replicate the training of a network designed for a Kaggle competition on the CIFAR10 dataset (Krizhevsky & Hinton, 2009)" and "We applied this IST method to the Gaussian kernel SVM trained on the Covertype data-set (Bache & Lichman)."
Dataset Splits | Yes | For all variants, we also sample 1,000 samples uniformly initially as a validation set.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions 'Torch7' in a footnote related to a CNN implementation but does not provide specific version numbers for it or any other software dependencies crucial for replication.
Experiment Setup | Yes | We train a neural network with two units as input standing for the coordinate in the [0, 1]^2 domain, two fully connected hidden layers with 40 units each, and one output unit. The transfer function is the hyperbolic tangent, and the weights are initialized layer after layer so that the response of every unit before non-linearity is centered, of standard deviation 0.5. We use the quadratic loss for training, and a pure stochastic gradient descent, one sample at a time. Every 1,000 gradient steps, we compute a validation loss and adapt the step size.
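The experiment-setup quote in the last row is concrete enough to restate as code. The sketch below is not the authors' implementation (the paper's CNN experiment uses Torch7); it is a minimal PyTorch rendering under stated assumptions: the target function, the initial step size, the number of gradient steps, the random seed, and the exact per-layer rescaling used to obtain centered pre-activations of standard deviation 0.5 are illustrative choices, not taken from the paper.

import torch
import torch.nn as nn

torch.manual_seed(0)

# 2 -> 40 -> 40 -> 1 network with hyperbolic tangent transfer functions.
model = nn.Sequential(
    nn.Linear(2, 40), nn.Tanh(),
    nn.Linear(40, 40), nn.Tanh(),
    nn.Linear(40, 1),
)

# Layer-after-layer initialization: rescale and center each linear layer so
# that every unit's response before the non-linearity has mean 0 and
# standard deviation 0.5 on uniform inputs from [0, 1]^2 (the exact
# procedure is an assumption; the paper only states the target statistics).
with torch.no_grad():
    h = torch.rand(10000, 2)
    for layer in model:
        if isinstance(layer, nn.Linear):
            layer.bias.zero_()
            pre = layer(h)
            layer.weight.mul_((0.5 / pre.std(dim=0)).unsqueeze(1))
            layer.bias.copy_(-layer(h).mean(dim=0))
        h = layer(h)

criterion = nn.MSELoss()  # quadratic loss

def target(x):
    # Hypothetical 2D target; the actual function is defined in the paper.
    return torch.sin(8 * x[:, :1]) * torch.cos(8 * x[:, 1:])

# 1,000 samples drawn uniformly at the start as a validation set.
x_val = torch.rand(1000, 2)
y_val = target(x_val)

lr = 1e-2  # hypothetical initial step size
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

for step in range(1, 10001):
    # Pure stochastic gradient descent, one sample at a time.
    xi = torch.rand(1, 2)
    loss = criterion(model(xi), target(xi))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 1000 == 0:
        # Every 1,000 gradient steps, compute a validation loss
        # (the paper's setup also adapts the step size at this point).
        with torch.no_grad():
            val_loss = criterion(model(x_val), y_val).item()
        print(step, val_loss)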