Phase Transitions for the Information Bottleneck in Representation Learning
Authors: Tailin Wu, Ian Fischer
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We quantitatively and qualitatively test the ability of our theory and Algorithm 1 to provide good predictions for IB phase transitions. We first verify them in fully categorical settings, where X, Y, Z are all discrete, and we show that the phase transitions can correspond to learning new classes as we increase β. We then test our algorithm on versions of the MNIST and CIFAR10 datasets with added label noise. |
| Researcher Affiliation | Collaboration | Tailin Wu, Stanford, tailin@cs.stanford.edu; Ian Fischer, Google Research, iansf@google.com |
| Pseudocode | Yes | Algorithm 1 Phase transitions discovery for IB |
| Open Source Code | No | No explicit statement about releasing the source code for the methodology described in this paper, nor a link to a code repository, was found. |
| Open Datasets | Yes | CIFAR10 dataset (Krizhevsky & Hinton, 2009) |
| Dataset Splits | No | The paper mentions using MNIST training examples and the CIFAR10 dataset, but does not specify the train/validation/test splits, percentages, or absolute sample counts needed for reproducibility. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., CPU, GPU models, or memory) used to run its experiments. |
| Software Dependencies | No | The paper mentions using Adam optimizer and a Wide ResNet implementation, but does not specify version numbers for these or any other software dependencies, which are necessary for reproducibility. |
| Experiment Setup | Yes | For MNIST: The encoder is a three-layer neural net, where each hidden layer has 512 neurons and leaky ReLU activation, and the last layer has linear activation. The classifier p(y\|z) is a 2-layer neural net with a 128-neuron ReLU hidden layer. The backward encoder p(z\|y) is also a 2-layer neural net with a 128-neuron ReLU hidden layer. We trained with Adam (Kingma & Ba, 2015) at a learning rate of 10^-3, annealed down by a factor of 1/(1 + 0.01·epoch). For Alg. 1, f_θ uses the same architecture as the CEB encoder, with \|Z\| = 50. For CIFAR10: We trained 28-1 Wide ResNet models... Samples from the encoder were passed to the classifier, a 2-layer MLP. ... β from 1.0 to 6.0 with step size of 0.02. ... annealing β from 100 down to the target β over 600 epochs, and continuing to train at the target β for another 800 epochs. ... base learning rate of 10^-3, reduced by a factor of 0.5 at 300, 400, and 500 epochs. ... \|Z\| = 50 in Alg. 1. (Hedged sketches of this setup follow the table.) |
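
For concreteness, here is a minimal PyTorch sketch of the quoted MNIST architecture and learning-rate schedule. The latent dimension, the mean/log-variance parameterization of the encoders, and the one-hot label input to the backward encoder are assumptions made for illustration; the excerpt only fixes the layer widths, activations, optimizer, and annealing factor.

```python
# Sketch of the quoted MNIST setup (not the authors' code).
# LATENT_DIM, the mean/log-variance outputs, and the one-hot label input
# are assumptions; layer widths, activations, Adam, and the 1/(1 + 0.01*epoch)
# learning-rate annealing come from the quoted setup.
import torch
import torch.nn as nn

LATENT_DIM = 8        # assumed; the excerpt does not state the latent size
NUM_CLASSES = 10

# Encoder p(z|x): three layers, 512-neuron leaky-ReLU hidden layers, linear last layer.
encoder = nn.Sequential(
    nn.Linear(28 * 28, 512), nn.LeakyReLU(),
    nn.Linear(512, 512), nn.LeakyReLU(),
    nn.Linear(512, 2 * LATENT_DIM),          # mean and log-variance (assumed)
)

# Classifier p(y|z): 2-layer net with a 128-neuron ReLU hidden layer.
classifier = nn.Sequential(
    nn.Linear(LATENT_DIM, 128), nn.ReLU(),
    nn.Linear(128, NUM_CLASSES),
)

# Backward encoder p(z|y): 2-layer net with a 128-neuron ReLU hidden layer,
# fed a one-hot label here (assumed input encoding).
backward_encoder = nn.Sequential(
    nn.Linear(NUM_CLASSES, 128), nn.ReLU(),
    nn.Linear(128, 2 * LATENT_DIM),
)

params = (list(encoder.parameters())
          + list(classifier.parameters())
          + list(backward_encoder.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-3)

# Anneal the learning rate down by a factor of 1 / (1 + 0.01 * epoch);
# call scheduler.step() once per epoch.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: 1.0 / (1.0 + 0.01 * epoch))
```

Everything else needed for a full training run (batch size, number of epochs, and the variational CEB objective at a given β) is not pinned down by the excerpt and would have to come from the paper or its appendix.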
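
The CIFAR10 part of the setup is mostly a schedule: a sweep of target β values from 1.0 to 6.0 in steps of 0.02, each run annealing β from 100 down to the target over 600 epochs and then holding it for another 800, with the base learning rate of 10^-3 halved at epochs 300, 400, and 500. The sketch below encodes that schedule; the interpolation used for the β annealing is not stated in the excerpt, so geometric (log-space) interpolation is an assumption.

```python
# Schedule sketch for the quoted CIFAR10 setup. The geometric interpolation
# of beta is an assumption; the sweep range, annealing/hold epochs, and
# learning-rate milestones come from the quoted setup.
import numpy as np

def beta_at_epoch(epoch, target_beta, start_beta=100.0, anneal_epochs=600):
    """Anneal beta from start_beta down to target_beta over anneal_epochs;
    after that, training continues at target_beta (for another 800 epochs
    in the quoted setup)."""
    if epoch >= anneal_epochs:
        return float(target_beta)
    frac = epoch / anneal_epochs
    # Log-space interpolation between start_beta and target_beta (assumed).
    return float(np.exp((1 - frac) * np.log(start_beta)
                        + frac * np.log(target_beta)))

def lr_at_epoch(epoch, base_lr=1e-3):
    """Base learning rate 1e-3, reduced by a factor of 0.5 at epochs 300, 400, 500."""
    drops = sum(epoch >= milestone for milestone in (300, 400, 500))
    return base_lr * (0.5 ** drops)

# Target beta values swept from 1.0 to 6.0 with step size 0.02.
target_betas = np.arange(1.0, 6.0 + 1e-9, 0.02)
```

The quoted sweep implies one Wide ResNet 28-1 model trained per target β, giving the empirical curve against which the phase transitions predicted by Alg. 1 are checked.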