Usable Information and Evolution of Optimal Representations During Training

Authors: Michael Kleinman, Alessandro Achille, Daksh Idnani, Jonathan Kao

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show these effects both on perceptual decision-making tasks inspired by the neuroscience literature and on standard image classification tasks. We trained multiple network architectures on these tasks and assessed the usable information in representations across layers and training epochs.
Researcher Affiliation | Academia | University of California, Los Angeles; Caltech. {michael.kleinman,dakshidnani}@ucla.edu, aachille@caltech.edu, kao@seas.ucla.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository.
Open Datasets | Yes | We then use this framework to examine how relevant and irrelevant information are represented in more realistic tasks and architectures, and how hyper-parameters affect the learning dynamics. We define a coarse labelling of task labels and study how the network represents the fine and coarse labelling through training, using a ResNet-18 (He et al., 2016) and All-CNN (Springenberg et al., 2015) on CIFAR-10 and CIFAR-100. (A hedged data-preparation sketch follows the table.)
Dataset Splits | Yes | FC Small, n = 2: batch size 32, learning rate 0.05, number of data samples 10,000 (90% train, 10% validation)
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper names network architectures such as ResNet-18 and All-CNN, but does not provide version numbers for software dependencies such as Python, PyTorch, TensorFlow, or CUDA.
Experiment Setup | Yes | We trained a ResNet-18 (He et al., 2016) to output the coarse label of CIFAR-10, using an initial learning rate of 0.1 with exponential annealing (0.97), momentum (0.9), and a batch size of 128. For the All-CNN (Springenberg et al., 2015) we used a batch size of 128, initial learning rate of 0.05 decaying smoothly by a factor of 0.97 at each epoch, momentum of 0.9, and weight decay with coefficient 0.001. (A hedged training-setup sketch follows the table.)
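
To make the Open Datasets and Dataset Splits rows concrete, below is a minimal PyTorch sketch of a coarse labelling of CIFAR-10. The grouping of the ten fine classes into two coarse classes (vehicles vs. animals), the CoarseCIFAR10 wrapper name, and the use of torchvision are illustrative assumptions; the paper defines its own coarse labelling, and the quoted text does not spell out the mapping.

# Minimal sketch, assuming PyTorch/torchvision; the fine-to-coarse mapping below
# is a hypothetical example, NOT the paper's own grouping.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hypothetical coarse grouping: 0 = vehicle (airplane, automobile, ship, truck),
# 1 = animal (all remaining CIFAR-10 classes).
FINE_TO_COARSE = {0: 0, 1: 0, 8: 0, 9: 0, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1}

class CoarseCIFAR10(datasets.CIFAR10):
    """CIFAR-10 variant that returns a coarse label in place of the fine label."""
    def __getitem__(self, index):
        image, fine_label = super().__getitem__(index)
        return image, FINE_TO_COARSE[fine_label]

train_set = CoarseCIFAR10(root="./data", train=True, download=True,
                          transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

The 90%/10% split quoted for the FC Small configuration (10,000 samples, batch size 32) could likewise be produced with torch.utils.data.random_split(dataset, [9000, 1000]), though the paper's exact splitting procedure is not described in the quoted text.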
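
The Experiment Setup row quotes enough hyper-parameters for a rough reconstruction of the ResNet-18 run. The sketch below is an assumption-laden outline rather than the authors' code: torchvision's resnet18 stands in for their ResNet-18, ExponentialLR(gamma=0.97) stands in for the exponential annealing, the epoch count is a placeholder the paper does not report, and the commented optimizer line shows the corresponding All-CNN settings (learning rate 0.05, weight decay 0.001).

# Minimal sketch of the quoted training setup (assumed structure, not the
# authors' implementation): SGD with lr 0.1 and momentum 0.9, batch size 128,
# and the learning rate multiplied by 0.97 after every epoch.
import torch
import torch.nn as nn
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"
# Two output classes, matching the hypothetical coarse grouping sketched above.
model = models.resnet18(num_classes=2).to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# All-CNN variant reported in the paper (architecture definition not shown here):
# optimizer = torch.optim.SGD(all_cnn.parameters(), lr=0.05, momentum=0.9,
#                             weight_decay=1e-3)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.97)
criterion = nn.CrossEntropyLoss()

num_epochs = 100  # placeholder; the quoted text does not report the epoch count
for epoch in range(num_epochs):
    for images, coarse_labels in train_loader:  # loader from the sketch above
        images, coarse_labels = images.to(device), coarse_labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), coarse_labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # exponential annealing: lr <- 0.97 * lr

Stepping ExponentialLR once per epoch matches the quoted description of the learning rate decaying smoothly by a factor of 0.97 at each epoch; a per-iteration decay would also be consistent with that wording.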