The Information Sieve

Authors: Greg Ver Steeg, Aram Galstyan

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present a practical implementation of this framework for discrete variables and apply it to a variety of fundamental tasks in unsupervised learning including independent component analysis, lossy and lossless compression, and predicting missing values in data."
Researcher Affiliation | Academia | Greg Ver Steeg (GREGV@ISI.EDU) and Aram Galstyan (GALSTYAN@ISI.EDU), University of Southern California, Information Sciences Institute, Marina del Rey, CA 90292 USA
Pseudocode | No | The paper describes its algorithmic steps in prose and references appendices for the constructions, but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Code implementing this entire pipeline is available (Ver Steeg)." From the references: Ver Steeg, Greg. Open source project implementing the discrete information sieve. http://github.com/gregversteeg/discrete_sieve
Open Datasets | Yes | "For the following tasks, we consider 50k MNIST digits that were binarized at the normalized grayscale threshold of 0.5." (A preprocessing sketch follows the table.)
Dataset Splits | No | The paper states: "We use 50k digits as training for models, and report compression results on the 10k test digits." This describes a train/test split, but no validation set or explicit train/validation/test percentages are given.
Hardware Specification | No | The paper does not report the hardware used to run the experiments (e.g., specific CPU/GPU models or memory).
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., library names such as TensorFlow or PyTorch, along with the versions used).
Experiment Setup | Yes | "The 28 × 28 binarized images are treated as binary vectors in a 784 dimensional space. The digit labels are also not used in our analysis. We trained the information sieve on this data, adding layers as long as the bounds were tightening. This led to a 12 layer representation and a lower bound on TC(X) of about 40 bits." (A toy sketch of this layer-adding loop appears below.)
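To make the preprocessing quoted in the "Open Datasets" and "Dataset Splits" rows concrete, here is a minimal sketch. It assumes MNIST arrives as 70k flattened 784-pixel grayscale images; fetch_openml is one common way to load it, not necessarily the authors' pipeline, and taking the first 50k digits for training plus the standard 10k test digits is an illustrative reading of the split they describe.

```python
import numpy as np
from sklearn.datasets import fetch_openml

# Load MNIST as a (70000, 784) array of grayscale values in [0, 255].
mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X = mnist.data / 255.0                 # normalize grayscale to [0, 1]

# Binarize at the normalized grayscale threshold of 0.5, as in the paper.
X_bin = (X > 0.5).astype(np.int8)

# 50k digits for training, the 10k test digits for compression results;
# the paper describes no validation set.
X_train, X_test = X_bin[:50_000], X_bin[60_000:]
```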
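The "Experiment Setup" row says layers were added "as long as the bounds were tightening". The authors' discrete_sieve repository implements the actual construction; the sketch below is only a toy illustration of that stopping rule on binary data, and ToyLayer (which merely copies the single most informative column and keeps an XOR remainder), mi_binary, and train_sieve are hypothetical stand-ins, not the paper's optimization or the repository's API.

```python
import numpy as np

def mi_binary(a, b):
    """Empirical mutual information in bits between two binary arrays."""
    joint = np.bincount(2 * a.astype(int) + b.astype(int), minlength=4)
    joint = joint.reshape(2, 2) / len(a)
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px * py)[nz])).sum())

class ToyLayer:
    """Toy stand-in for a sieve layer (not the paper's construction):
    Y copies the one column sharing the most information with the rest;
    the remainder XORs each column with its best prediction from Y, so
    (remainder, Y) recovers X and the remainder keeps what Y missed."""
    def fit(self, X):
        d = X.shape[1]
        scores = [sum(mi_binary(X[:, i], X[:, j]) for i in range(d) if i != j)
                  for j in range(d)]            # O(d^2) scan: toy scale only
        self.j = int(np.argmax(scores))
        self.tc = scores[self.j]                # TC(X;Y) for this choice of Y
        Y = X[:, self.j]
        # Majority-vote prediction of every column for each value of Y.
        self.pred = np.array(
            [[np.bincount(X[Y == y, i], minlength=2).argmax() for i in range(d)]
             for y in range(2)], dtype=X.dtype)
        return self

    def transform(self, X):
        return X ^ self.pred[X[:, self.j]]      # binary remainder via XOR

def train_sieve(X, tol=1e-3, max_layers=20):
    """Keep adding layers while each one tightens the TC(X) lower bound."""
    layers, remainder, tc_bound = [], X, 0.0
    while len(layers) < max_layers:
        layer = ToyLayer().fit(remainder)
        if layer.tc <= tol:                     # bound no longer tightening
            break
        tc_bound += layer.tc                    # per-layer contributions add
        remainder = layer.transform(remainder)
        layers.append(layer)
    return layers, tc_bound

# e.g. layers, bound = train_sieve(X_bin[:2000, 300:364]) on a small crop
```

Note the quadratic column scan: at full 784-dimensional MNIST scale this toy would be far too slow, so it is meant only to show the bound-tightening stopping rule, not to reproduce the 12-layer, roughly 40-bit result.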