Importance Sampling Tree for Large-scale Empirical Expectation
Authors: Olivier Canévet, Cijo Jose, François Fleuret
ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4. Experiments and results |
| Researcher Affiliation | Academia | Idiap Research Institute, Martigny, Switzerland; École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland |
| Pseudocode | No | The paper describes steps for its methods but does not include a clearly labeled 'Pseudocode' or 'Algorithm' block, nor does it format procedures like code. |
| Open Source Code | No | The paper references an external repository for a network design ('https://github.com/nagadomi/kaggle-cifar10-torch7') but does not state that its own source code for the proposed Importance Sampling Tree (IST) methodology is publicly available. |
| Open Datasets | Yes | Our experiments replicate the training of a network designed for a Kaggle competition on the CIFAR10 dataset (Krizhevsky & Hinton, 2009) and We applied this IST method to the Gaussian kernel SVM trained on the Covertype data-set (Bache & Lichman). |
| Dataset Splits | Yes | For all variants, we also sample 1,000 samples uniformly initially as a validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'Torch7' in a footnote related to a CNN implementation but does not provide specific version numbers for it or any other software dependencies crucial for replication. |
| Experiment Setup | Yes | We train a neural network with two units as input standing for the coordinate in the [0, 1]² domain, two fully connected hidden layers with 40 units each, and one output unit. The transfer function is the hyperbolic tangent, and the weights are initialized layer after layer so that the response of every unit before non-linearity is centered, of standard deviation 0.5. We use the quadratic loss for training, and a pure stochastic gradient descent, one sample at a time. Every 1,000 gradient steps, we compute a validation loss and adapt the step size. (A minimal code sketch of this setup follows the table.) |
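
For readers who want to approximate the quoted experiment setup, here is a minimal sketch in PyTorch (the authors used Torch7). The architecture (2 inputs, two 40-unit tanh hidden layers, one output), the quadratic loss, per-sample SGD, and the validation check every 1,000 steps follow the quoted description; the toy data, the simplified initialization, the step-size adaptation rule (halving on no improvement), and all names are illustrative assumptions, not the authors' implementation.

```python
# Sketch of the "Experiment Setup" row, with assumptions flagged in comments.
import torch
import torch.nn as nn


def make_net() -> nn.Sequential:
    # 2 inputs (a point in [0, 1]^2), two 40-unit tanh hidden layers, 1 output.
    # The paper's layer-wise initialization (centering each unit's response
    # before the non-linearity, with standard deviation 0.5) is data-dependent
    # and omitted here; PyTorch's default initialization is used (assumption).
    return nn.Sequential(
        nn.Linear(2, 40), nn.Tanh(),
        nn.Linear(40, 40), nn.Tanh(),
        nn.Linear(40, 1),
    )


def train(net, train_x, train_y, val_x, val_y, lr=0.1, steps=20_000):
    loss_fn = nn.MSELoss()                      # quadratic loss
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    best_val = float("inf")
    for step in range(1, steps + 1):
        i = torch.randint(len(train_x), (1,))   # pure SGD, one sample at a time
        opt.zero_grad()
        loss_fn(net(train_x[i]), train_y[i]).backward()
        opt.step()
        if step % 1_000 == 0:
            # Every 1,000 gradient steps: validation loss + step-size adaptation.
            with torch.no_grad():
                val_loss = loss_fn(net(val_x), val_y).item()
            if val_loss >= best_val:            # adaptation rule assumed: halve lr
                for group in opt.param_groups:
                    group["lr"] *= 0.5
            best_val = min(best_val, val_loss)
    return net


if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy data on the [0, 1]^2 domain; the paper's actual sampling scheme and
    # targets are not reproduced here.
    x = torch.rand(6_000, 2)
    y = (x.sum(dim=1, keepdim=True) > 1.0).float() * 2 - 1
    val_x, val_y = x[:1_000], y[:1_000]         # 1,000 samples held out for validation
    train(make_net(), x[1_000:], y[1_000:], val_x, val_y)
```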