Usable Information and Evolution of Optimal Representations During Training
Authors: Michael Kleinman, Alessandro Achille, Daksh Idnani, Jonathan Kao
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show these effects on both perceptual decision-making tasks inspired by neuroscience literature, as well as on standard image classification tasks. We trained multiple network architectures on tasks and assessed the usable information in representations across layers and training epochs. |
| Researcher Affiliation | Academia | ¹University of California, Los Angeles; ²Caltech. {michael.kleinman,dakshidnani}@ucla.edu; aachille@caltech.edu; kao@seas.ucla.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | We then use this framework to examine how relevant and irrelevant information are represented in more realistic tasks and architectures, and how hyper-parameters affect the learning dynamics. We define a coarse labelling of task labels and study how the network represents the fine and coarse labelling through training, using a ResNet-18 (He et al., 2016) and All-CNN (Springenberg et al., 2015) on CIFAR-10 and CIFAR-100. |
| Dataset Splits | Yes | FC Small, n = 2: batch size: 32, learning rate: 0.05, number of data samples: 10000 (90% train, 10% validation) |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper names network architectures such as ResNet-18 and All-CNN, but does not provide version numbers for software dependencies such as Python, PyTorch, TensorFlow, or CUDA. |
| Experiment Setup | Yes | We trained a ResNet-18 (He et al., 2016) to output the coarse label of CIFAR-10, using an initial learning rate of 0.1 with exponential annealing (0.97), momentum (0.9), and a batch size of 128. For the All-CNN (Springenberg et al., 2015) we used a batch size of 128, initial learning rate of 0.05 decaying smoothly by a factor of 0.97 at each epoch, momentum of 0.9, and weight decay with coefficient 0.001. |
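The "Open Datasets" row quotes the paper's use of a coarse labelling of CIFAR-10/CIFAR-100 task labels, but the grouping itself is not reproduced in the excerpt. The snippet below is a minimal sketch assuming a hypothetical vehicles-vs-animals coarse labelling of CIFAR-10, applied through torchvision's `target_transform`; the paper's actual grouping may differ.

```python
# Minimal sketch (not from the paper): a hypothetical coarse labelling of
# CIFAR-10 (vehicles vs. animals) applied via torchvision's target_transform.
import torchvision
import torchvision.transforms as T

# CIFAR-10 fine labels: 0 airplane, 1 automobile, 2 bird, 3 cat, 4 deer,
# 5 dog, 6 frog, 7 horse, 8 ship, 9 truck.
FINE_TO_COARSE = {
    0: 0, 1: 0, 8: 0, 9: 0,              # vehicles -> coarse label 0
    2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1,  # animals  -> coarse label 1
}

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=T.ToTensor(),
    target_transform=lambda y: FINE_TO_COARSE[y],  # the network sees only the coarse label
)
```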
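The "Dataset Splits" row quotes a 90%/10% train/validation split over 10,000 samples for the small fully-connected network. One way such a split could be produced is with `torch.utils.data.random_split`, sketched below; the synthetic two-feature dataset and the fixed seed are assumptions, since the excerpt does not describe the task data or the splitting procedure.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Hypothetical stand-in for the paper's perceptual decision-making data:
# 10,000 two-feature samples with binary labels.
n_samples = 10_000
data = TensorDataset(torch.randn(n_samples, 2),
                     torch.randint(0, 2, (n_samples,)))

# 90% train / 10% validation, as quoted for the "FC Small" configuration.
n_train = int(0.9 * n_samples)
train_set, val_set = random_split(
    data, [n_train, n_samples - n_train],
    generator=torch.Generator().manual_seed(0))  # fixed seed is an assumption

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)  # batch size 32 per the table
val_loader = DataLoader(val_set, batch_size=32)
```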
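The "Experiment Setup" row lists SGD-style hyperparameters for the ResNet-18 and All-CNN runs (initial learning rate, momentum, weight decay, batch size 128, and a 0.97 per-epoch exponential decay). Below is a minimal PyTorch sketch wiring up the All-CNN values; the use of `torch.optim.SGD`, `ExponentialLR`, a torchvision ResNet-18 as a stand-in model, and the epoch count are all assumptions, as the excerpt gives only the hyperparameter values.

```python
import torch
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

# Stand-in model: torchvision's ResNet-18 with a 10-way head. The paper's
# exact architecture code is not released, so this is only illustrative.
model = torchvision.models.resnet18(num_classes=10)

# CIFAR-10 with the quoted batch size of 128.
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=T.ToTensor())
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

# All-CNN hyperparameters quoted above: lr 0.05, momentum 0.9, weight decay
# 1e-3, smooth 0.97 per-epoch decay. (The ResNet-18 run instead starts at
# lr 0.1, with no weight decay quoted.)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05,
                            momentum=0.9, weight_decay=1e-3)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.97)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(100):                 # epoch count is not stated in the excerpt
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()                     # apply the 0.97 decay once per epoch
```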