Minimal Achievable Sufficient Statistic Learning
Authors: Milan Cvitkovic, Günther Koliander
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In a series of experiments, we show that deep networks trained with MASS Learning achieve competitive performance on supervised learning, regularization, and uncertainty quantification benchmarks. |
| Researcher Affiliation | Academia | (1) Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California, USA; (2) Acoustics Research Institute, Austrian Academy of Sciences, Vienna, Austria. |
| Pseudocode | No | The paper does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code to reproduce all experiments is available online: https://github.com/mwcvitkovic/MASS-Learning |
| Open Datasets | Yes | We performed all experiments on the CIFAR-10 dataset (Krizhevsky, 2009) |
| Dataset Splits | No | The paper mentions 'TRAINING SET SIZE' in tables and 'Test-set classification accuracy', but does not explicitly state a validation set size, percentage, or a specific train/validation/test split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments. |
| Software Dependencies | No | The paper states it 'coded all our models in PyTorch (Paszke et al., 2017)' but does not specify a version number for PyTorch or any other software dependency. |
| Experiment Setup | Yes | We use two networks in our experiments. Small MLP is a feedforward network with two fully-connected layers of 400 and 200 hidden units, respectively, both with ELU nonlinearities (Clevert et al., 2015). ResNet20 is the 20-layer residual net of He et al. (2015). In all our experiments, the variational distribution qφ(x\|y) for each possible output class y is a mixture of multivariate Gaussian distributions for which we learn the mixture weights, means, and covariance matrices. ... we use a subsampling strategy: we estimate the Jf term using only a 1/\|Y\| fraction of the datapoints in a minibatch. ... We performed all experiments on the CIFAR-10 dataset (Krizhevsky, 2009), and coded all our models in PyTorch (Paszke et al., 2017). Full details on all experiments are in Supplementary Material 7.7. |
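
The quoted setup is concrete enough to sketch in code. Below is a minimal, hypothetical PyTorch reconstruction of the Small MLP and the per-class mixture-of-Gaussians variational distribution qφ(x|y). The layer sizes (400, 200) and ELU nonlinearities follow the quote; the class names `SmallMLP` and `ClasswiseGMM`, the flattened CIFAR-10 input, the 8-dimensional representation, the 2-component mixtures, and the diagonal covariances are all assumptions for illustration, not the authors' released code.

```python
import math

import torch
import torch.nn as nn


class SmallMLP(nn.Module):
    """Two fully-connected layers of 400 and 200 hidden units with ELU
    nonlinearities, per the quoted setup. The flattened CIFAR-10 input
    (3*32*32) and 8-dim representation are assumptions."""

    def __init__(self, in_dim=3 * 32 * 32, rep_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_dim, 400), nn.ELU(),
            nn.Linear(400, 200), nn.ELU(),
            nn.Linear(200, rep_dim),
        )

    def forward(self, x):
        return self.net(x)


class ClasswiseGMM(nn.Module):
    """One mixture of multivariate Gaussians per class y, with learned
    mixture weights, means, and covariances, as in the quoted q_phi(x|y).
    Diagonal covariances and 2 components per class are assumptions."""

    def __init__(self, n_classes=10, n_components=2, rep_dim=8):
        super().__init__()
        self.logit_weights = nn.Parameter(torch.zeros(n_classes, n_components))
        self.means = nn.Parameter(0.1 * torch.randn(n_classes, n_components, rep_dim))
        self.log_stds = nn.Parameter(torch.zeros(n_classes, n_components, rep_dim))

    def log_prob(self, z, y):
        # z: (batch, rep_dim) representations; y: (batch,) integer labels.
        means, log_stds = self.means[y], self.log_stds[y]     # (batch, K, D)
        log_w = torch.log_softmax(self.logit_weights[y], -1)  # (batch, K)
        z = z.unsqueeze(1)                                    # (batch, 1, D)
        # Per-component diagonal-Gaussian log-density, summed over dims.
        comp = (-0.5 * ((z - means) / log_stds.exp()) ** 2
                - log_stds - 0.5 * math.log(2 * math.pi)).sum(-1)
        return torch.logsumexp(log_w + comp, dim=-1)          # (batch,)


# Usage on a fake minibatch (shapes only; not real training code):
model, q = SmallMLP(), ClasswiseGMM()
x = torch.randn(32, 3, 32, 32)
y = torch.randint(0, 10, (32,))
log_q = q.log_prob(model(x), y)  # (32,) values of log q_phi(f(x)|y)
```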
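The quoted subsampling strategy for the Jf term can be sketched the same way. The paper only specifies the 1/|Y| fraction; the uniform random selection below is an assumption.

```python
def subsample_for_jf(x, n_classes=10):
    """Keep a 1/|Y| fraction of a minibatch for estimating the Jf
    (Jacobian) term, per the quoted strategy. Which datapoints to
    keep is chosen uniformly at random -- an assumption."""
    n_keep = max(1, x.shape[0] // n_classes)
    idx = torch.randperm(x.shape[0], device=x.device)[:n_keep]
    return x[idx]
```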