Understanding Visual Feature Reliance through the Lens of Complexity
Authors: Thomas Fel, Louis Béthune, Andrew Lampinen, Thomas Serre, Katherine Hermann
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using this V-information metric, we analyze the complexities of 10,000 features represented as directions in the penultimate layer that were extracted from a standard ImageNet-trained vision model. Our study addresses four key questions: First, we ask what features look like as a function of complexity and find a spectrum of simple-to-complex features present within the model. Second, we ask when features are learned during training. We find that simpler features dominate early in training, and more complex features emerge gradually. Third, we investigate where within the network simple and complex features flow, and find that simpler features tend to bypass the visual hierarchy via residual connections. Fourth, we explore the connection between feature complexity and importance in driving the network's decision. |
| Researcher Affiliation | Collaboration | Thomas Fel (Google DeepMind, Brown University); Louis Béthune (Université de Toulouse); Andrew Kyle Lampinen (Google DeepMind); Thomas Serre (Brown University); Katherine Hermann (Google DeepMind) |
| Pseudocode | Yes | Algorithm 1: Levin Universal Search |
| Open Source Code | No | Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: the dataset ImageNet can be accessed through https://www.image-net.org. However the code is not available for review at time of submission. |
| Open Datasets | Yes | Justification: we rely exclusively on the ImageNet dataset, more specifically the subset of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) whose license can be accessed at this URL: https://www.kaggle.com/competitions/imagenet-object-localization-challenge/rules#7-competition-data. |
| Dataset Splits | Yes | We train the model for 90 epochs with an initial learning rate of 0.7, adjusted down by a factor of 10 at epochs 30, 60, and 80, achieving a 78.9% accuracy on the ImageNet validation set, which is on par with reported accuracy in similar studies [45, 114]. |
| Hardware Specification | No | Does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [No] Justification: the cost of re-training the model can be estimated in the relevant literature. Computing the complexity scores involves solving a (high dimensional) linear regression at every depth and every epoch of interest for each of the features. |
| Software Dependencies | No | We utilized the block coordinate descent solver from Scikit-learn [86] to solve the NMF problem. Specifically, the model used here was the ResNet50 implementation from the Keras library [26]. |
| Experiment Setup | Yes | Model Setup. We study feature complexity within an ImageNet-trained ResNet50 [45]. We train the model for 90 epochs with an initial learning rate of 0.7, adjusted down by a factor of 10 at epochs 30, 60, and 80, achieving a 78.9% accuracy on the ImageNet validation set, which is on par with reported accuracy in similar studies [45, 114]. |
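The training schedule quoted in the Experiment Setup row (initial learning rate 0.7, divided by 10 at epochs 30, 60, and 80 over 90 epochs) can be sketched as a simple step-decay function. This is an illustrative reconstruction, assuming the drop takes effect from each milestone epoch onward; the paper does not spell out the exact boundary behavior:

```python
def learning_rate(epoch, base_lr=0.7, milestones=(30, 60, 80), factor=0.1):
    """Step-decay schedule: divide the base learning rate by 10 at each
    milestone epoch (assumed to apply from the milestone onward)."""
    drops = sum(epoch >= m for m in milestones)
    return base_lr * factor ** drops

# Schedule over the 90-epoch run described in the table:
schedule = [learning_rate(e) for e in range(90)]
```

Under these assumptions the rate is 0.7 for epochs 0-29, 0.07 for 30-59, 0.007 for 60-79, and 0.0007 for the final stretch.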
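The Software Dependencies row notes that the NMF problem was solved with Scikit-learn's block coordinate descent solver. A minimal sketch of that call is below; the matrix shape, number of components, and random stand-in data are illustrative assumptions, not the paper's actual activations or settings:

```python
import numpy as np
from sklearn.decomposition import NMF

# Stand-in for real penultimate-layer activations (shapes are assumptions;
# ResNet50's penultimate layer actually has 2048 channels).
rng = np.random.default_rng(0)
A = np.abs(rng.normal(size=(200, 512)))  # NMF requires non-negative input

# Scikit-learn's block coordinate descent solver ('cd', the default)
# factorizes A into U @ W; rows of W can be read as feature directions.
model = NMF(n_components=8, solver="cd", init="nndsvda",
            max_iter=500, random_state=0)
U = model.fit_transform(A)   # (200, 8) per-sample feature coefficients
W = model.components_        # (8, 512) non-negative feature directions
```

Both factors are constrained non-negative, which is what makes the rows of `W` interpretable as additive feature directions in the activation space.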