Towards a Neural Statistician
Authors: Harrison Edwards, Amos Storkey
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EXPERIMENTAL RESULTS Given an input set x_1, ..., x_k we can use the statistic network to calculate an approximate posterior over contexts q(c|x_1, ..., x_k; φ). Under the generative model, each context c specifies a conditional model p(x|c; θ). To get samples from the model corresponding to the most likely posterior value of c, we set c to the mean of the approximate posterior and then sample directly from the conditional distributions. This is described in Algorithm 2. We use this process in our experiments to show samples. In all experiments, we use the Adam optimization algorithm (Kingma & Ba, 2014) to optimize the parameters of the generative models and variational approximations. Batch normalization (Ioffe & Szegedy, 2015) is implemented for convolutional layers and we always use a batch size of 16. We primarily use the Theano (Theano Development Team, 2016) framework with the Lasagne (Dieleman et al., 2015) library, but the final experiments with face data were done using Tensorflow (Abadi et al., 2015). In all cases experiments were terminated after a given number of epochs when training appeared to have sufficiently converged (300 epochs for omniglot, youtube and spatial MNIST examples, and 50 epochs for the synthetic experiment). (An illustrative sketch of this conditional sampling procedure is given after the table.) |
| Researcher Affiliation | Academia | Harrison Edwards, School of Informatics, University of Edinburgh, Edinburgh, UK (H.L.Edwards@sms.ed.ac.uk); Amos Storkey, School of Informatics, University of Edinburgh, Edinburgh, UK (A.Storkey@ed.ac.uk) |
| Pseudocode | Yes | APPENDIX A: PSEUDOCODE. Algorithm 1 (Sampling a dataset of size k): sample c ∼ p(c); for i = 1 to k do: sample z_{i,L} ∼ p(z_L|c; θ); for j = L-1 to 1 do: sample z_{i,j} ∼ p(z_j|z_{i,j+1}, c; θ); end for; sample x_i ∼ p(x|z_{i,1}, ..., z_{i,L}, c; θ); end for. (A runnable NumPy sketch of this algorithm follows the table.) |
| Open Source Code | No | The paper mentions using open-source frameworks like Theano and TensorFlow, but does not explicitly state that the authors' own code for the described methodology is open-source or provide a link to it. |
| Open Datasets | Yes | We created a dataset called spatial MNIST. In spatial MNIST each image from MNIST (LeCun et al., 1998) is turned into a dataset... Next we work with the OMNIGLOT data (Lake et al., 2015)... We use the Youtube Faces Database from Wolf et al. (2011). |
| Dataset Splits | Yes | We train on datasets drawn from 1200 classes and reserve the remaining classes to test few-shot sampling and classification. We created new classes by rotating and reflecting characters. We resized the images to 28 × 28. We sampled a binarization of each image for each epoch. We also randomly applied the dilation operator from computer vision as further data augmentation... We tried this with either 1 or 5 labelled examples per class and either 5 or 20 classes. For each trial we randomly select K classes, randomly select training examples for each class, and test on the remaining examples. This process is repeated 100 times and the results averaged. The validation and test sets contain 100 unique people each, and there is no overlap of persons between data splits. (A sketch of this evaluation protocol appears after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running the experiments. It only mentions using TensorFlow, Theano, and Lasagne. |
| Software Dependencies | No | The paper states: "We primarily use the Theano (Theano Development Team, 2016) framework with the Lasagne (Dieleman et al., 2015) library, but the final experiments with face data were done using Tensorflow (Abadi et al., 2015)." While it mentions software and their publication years, it does not provide specific version numbers (e.g., Theano 0.9, TensorFlow 1.0) for these frameworks or other libraries. |
| Experiment Setup | Yes | In all experiments, we use the Adam optimization algorithm (Kingma & Ba, 2014) to optimize the parameters of the generative models and variational approximations. Batch normalization (Ioffe & Szegedy, 2015) is implemented for convolutional layers and we always use a batch size of 16. In all cases experiments were terminated after a given number of epochs when training appeared to have sufficiently converged (300 epochs for omniglot, youtube and spatial MNIST examples, and 50 epochs for the synthetic experiment). The architecture for this experiment contains a single stochastic layer with 32 units for z and 3 units for c. The model p(x|z, c; θ) and variational approximation q(z|x, c; φ) are each a diagonal Gaussian distribution with all mean and log variance parameters given by a network composed of three dense layers with ReLU activations and 128 units. The statistic network determining the mean and log variance parameters of the posterior over context variables is composed of three dense layers before and after pooling, each with 128 units with Rectified Linear Unit (ReLU) activations. (A sketch of this statistic network appears after the table.) |
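
The Pseudocode row above quotes Algorithm 1, which draws a dataset of size k by ancestral sampling: first the context c from its prior, then a stack of latents per example, then the observation. Below is a minimal NumPy sketch of that procedure, assuming a standard Gaussian prior on c and diagonal Gaussian conditionals; `gaussian_sample`, `sample_x_given_c`, `latent_params`, and `obs_params` are hypothetical stand-ins for the paper's learned networks, not names taken from the paper.

```python
import numpy as np

def gaussian_sample(mean, logvar, rng):
    """Draw one sample from a diagonal Gaussian given its mean and log variance."""
    return mean + np.exp(0.5 * logvar) * rng.standard_normal(np.shape(mean))

def sample_x_given_c(c, L, latent_params, obs_params, rng=np.random):
    """Ancestral sampling of a single x through the L-layer latent stack, given context c."""
    mean, logvar = latent_params(L, None, c)           # parameters of p(z_L|c; θ)
    zs = [gaussian_sample(mean, logvar, rng)]
    for j in range(L - 1, 0, -1):                      # parameters of p(z_j|z_{j+1}, c; θ)
        mean, logvar = latent_params(j, zs[-1], c)
        zs.append(gaussian_sample(mean, logvar, rng))
    mean, logvar = obs_params(zs, c)                   # parameters of p(x|z_1, ..., z_L, c; θ)
    return gaussian_sample(mean, logvar, rng)

def sample_dataset(k, L, c_dim, latent_params, obs_params, rng=np.random):
    """Algorithm 1: sample a dataset of size k from the generative model."""
    c = rng.standard_normal(c_dim)                     # prior p(c), assumed here to be N(0, I)
    return c, [sample_x_given_c(c, L, latent_params, obs_params, rng) for _ in range(k)]
```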
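
The conditional sampling procedure quoted under Research Type (Algorithm 2 in the paper) fixes c at the mean of the approximate posterior q(c|x_1, ..., x_k; φ) and then samples from the conditional model. A short sketch under the same assumptions, reusing `sample_x_given_c` from the previous snippet; `statistic_net` is a hypothetical callable returning the posterior mean and log variance of c.

```python
import numpy as np

def conditional_samples(xs, n_samples, L, statistic_net, latent_params, obs_params,
                        rng=np.random):
    """Show samples from the model with c fixed at the mean of q(c|x_1, ..., x_k)."""
    c_mean, _c_logvar = statistic_net(np.stack(xs))    # approximate posterior over contexts
    return [sample_x_given_c(c_mean, L, latent_params, obs_params, rng)
            for _ in range(n_samples)]
```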
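
The Experiment Setup row describes the statistic network for the synthetic experiment: three 128-unit dense ReLU layers applied to each set element, a pooling step over the dataset dimension, three more 128-unit dense ReLU layers, and a linear output for the mean and log variance of the 3-dimensional context. A forward-pass sketch in NumPy; the mean pool and the `params` dictionary layout are assumptions made for illustration, not details quoted from the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def statistic_network(x_set, params):
    """Posterior parameters of q(c|x_1, ..., x_k) for one input set of shape (k, x_dim).

    `params` holds weight matrices and biases, e.g. params["pre_w0"] of shape (x_dim, 128).
    The pooling operation over the set is assumed to be a mean.
    """
    h = x_set
    for i in range(3):                                  # per-element layers before pooling
        h = relu(h @ params[f"pre_w{i}"] + params[f"pre_b{i}"])
    h = h.mean(axis=0)                                  # pool over the dataset dimension
    for i in range(3):                                  # layers after pooling
        h = relu(h @ params[f"post_w{i}"] + params[f"post_b{i}"])
    c_mean = h @ params["mu_w"] + params["mu_b"]        # 3-dimensional context in the paper
    c_logvar = h @ params["logvar_w"] + params["logvar_b"]
    return c_mean, c_logvar
```

A function of this shape could serve as the `statistic_net` argument in the conditional sampling sketch above.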
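
The Dataset Splits row quotes the few-shot classification protocol: for each of 100 trials, K classes are drawn at random, 1 or 5 labelled examples per class form the support sets, the remaining examples are classified, and accuracies are averaged. A minimal sketch of that episode loop; `classify` is a hypothetical placeholder for the classifier being evaluated and is not reproduced here.

```python
import numpy as np

def few_shot_accuracy(data_by_class, K, shots, classify, n_trials=100, rng=np.random):
    """Average K-way, `shots`-shot accuracy over repeated random episodes.

    data_by_class: dict mapping class id -> numpy array of that class's examples.
    classify(support_sets, query) -> predicted index into the K support sets.
    """
    accuracies = []
    for _ in range(n_trials):
        classes = rng.choice(list(data_by_class), size=K, replace=False)
        support_sets, queries = [], []
        for label, cls in enumerate(classes):
            examples = data_by_class[cls]
            order = rng.permutation(len(examples))
            support_sets.append(examples[order[:shots]])               # labelled examples
            queries += [(x, label) for x in examples[order[shots:]]]   # test on the rest
        correct = sum(classify(support_sets, x) == label for x, label in queries)
        accuracies.append(correct / len(queries))
    return float(np.mean(accuracies))
```

With K=20 and shots=1, for example, this loop matches the shape of the 20-way, 1-shot evaluation quoted above.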