What shapes feature representations? Exploring datasets, architectures, and training

Authors: Katherine Hermann, Andrew Lampinen

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We study these questions using synthetic datasets in which the task-relevance of input features can be controlled directly. We find that when two features redundantly predict the labels, the model preferentially represents one, and its preference reflects what was most linearly decodable from the untrained model. Over training, task-relevant features are enhanced, and task-irrelevant features are partially suppressed."
Researcher Affiliation | Academia | "Katherine L. Hermann, Stanford University (hermannk@stanford.edu); Andrew K. Lampinen, Stanford University (andrewlampinen@gmail.com)"
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper mentions using and modifying torchvision implementations and a training script from a GitHub repository, but it does not state that the authors' specific modifications or their full experimental code are publicly released.
Open Datasets | Yes | "Navon dataset. As shown in Figure 1, this dual-feature dataset, modified from [13] based on [30], ..."
Dataset Splits | Yes | "For each of these tasks, we created 5 cross-validation splits, creating validation sets by holding out a subset of values for the non-target features (3 classes per feature)." "For the experiments in this section, we created train sets of 3430 items and validation sets of 3570 images, in both of which the features were uncorrelated." "... train sets (4900 images) and validation sets (4900 images, varied with correlation) from the larger set of 100,000 images."
Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using the "torchvision implementation of both models' convolutional layers" and "Adam optimization", but it does not specify version numbers for these or any other software dependencies.
Experiment Setup | Yes | "All models were trained for 90 epochs using Adam optimization [20] with a learning rate of 3 × 10^-4, weight decay of 10^-4, and batch size of 64, using a modified version of the torchvision training script [1]. We selected the model corresponding to the highest validation accuracy over the training period."
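
The "most linearly decodable from the untrained model" measurement quoted in the Research Type row is the kind of quantity usually estimated with a linear probe on frozen features. A minimal sketch of such a probe, assuming a frozen feature extractor and scikit-learn's LogisticRegression; the function and variable names are illustrative, not the authors' code:

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def linear_decodability(feature_extractor, images, labels, seed=0):
    """Held-out accuracy of a linear probe fit on frozen (e.g., untrained) features."""
    feature_extractor.eval()
    with torch.no_grad():
        feats = feature_extractor(images).flatten(start_dim=1).cpu().numpy()
    X_tr, X_te, y_tr, y_te = train_test_split(
        feats, np.asarray(labels), test_size=0.2, random_state=seed)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return probe.score(X_te, y_te)
```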
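The Dataset Splits row describes validation sets built by holding out a subset of non-target feature values (3 classes per feature) across 5 cross-validation splits. A minimal sketch of that split logic, assuming each example carries an annotation for its non-target feature value; names and the data structure are hypothetical:

```python
import random

def make_split(examples, nontarget_values, n_holdout=3, seed=0):
    """One cross-validation split: hold out n_holdout values of the
    non-target feature for validation; the rest form the training set."""
    rng = random.Random(seed)
    held_out = set(rng.sample(sorted(nontarget_values), n_holdout))
    train = [ex for ex in examples if ex["nontarget"] not in held_out]
    val = [ex for ex in examples if ex["nontarget"] in held_out]
    return train, val

# Five splits, as in the paper, holding out different values per split:
# splits = [make_split(examples, nontarget_values, seed=s) for s in range(5)]
```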
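The Experiment Setup row fixes the optimizer, schedule, and model-selection rule. A minimal PyTorch sketch of that configuration, assuming generic data loaders (built with batch size 64) and a placeholder torchvision architecture; this is a sketch, not the authors' modified training script:

```python
import torch
from torch import nn, optim
from torchvision.models import resnet50  # placeholder; the paper trains several architectures

def evaluate(model, loader, device="cpu"):
    """Top-1 accuracy over a validation loader."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, targets in loader:
            preds = model(images.to(device)).argmax(dim=1)
            correct += (preds == targets.to(device)).sum().item()
            total += targets.size(0)
    return correct / total

def train(model, train_loader, val_loader, device="cpu"):
    """90 epochs of Adam (lr 3e-4, weight decay 1e-4); keep the best-validation checkpoint."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=3e-4, weight_decay=1e-4)
    best_acc, best_state = -1.0, None
    for _ in range(90):
        model.train()
        for images, targets in train_loader:  # loaders assumed to use batch_size=64
            optimizer.zero_grad()
            loss = criterion(model(images.to(device)), targets.to(device))
            loss.backward()
            optimizer.step()
        acc = evaluate(model, val_loader, device)
        if acc > best_acc:
            best_acc = acc
            best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
    model.load_state_dict(best_state)  # model with highest validation accuracy
    return model, best_acc

# Usage (loaders and class count are placeholders):
# model, val_acc = train(resnet50(num_classes=10), train_loader, val_loader)
```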