Understanding Unimodal Bias in Multimodal Deep Linear Networks
Authors: Yedi Zhang, Peter E. Latham, Andrew M. Saxe
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our findings with numerical simulations of multimodal deep linear networks and certain nonlinear networks. |
| Researcher Affiliation | Academia | Gatsby Computational Neuroscience Unit, University College London; Sainsbury Wellcome Centre, University College London. |
| Pseudocode | No | The paper describes mathematical derivations and processes in text and equations, but it does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide our code at https://github.com/yedizhang/unimodal-bias. |
| Open Datasets | Yes | We validate our results in multimodal deep ReLU networks trained on a noisy MNIST (LeCun et al., 1998) task. |
| Dataset Splits | No | The paper mentions the use of training samples and a test set for MNIST, but it does not explicitly specify the training/validation/test splits (e.g., percentages or exact counts) for any of its experiments. |
| Hardware Specification | No | The paper describes the software (PyTorch) and training configurations used, but it does not specify any particular hardware (e.g., GPU models, CPU types) used for the experiments. |
| Software Dependencies | No | The paper notes that "PyTorch's default initialization is used," but it does not list software versions or other dependencies. |
| Experiment Setup | Yes | The deep fully-connected ReLU networks and deep convolutional networks are trained with SGD with cross-entropy loss on the noisy MNIST dataset. The batch size is 1000. The learning rate at the beginning of training is 0.04 for the fully-connected ReLU networks and 0.002 for the convolutional networks. We use a learning rate scheduler that decays the learning rate by 0.996 every epoch. |
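
For context on the architecture named in the Research Type row, the sketch below shows one plausible reading of a "multimodal deep linear network": two modality-specific linear pathways fused late by summation. The class name, layer widths, and additive fusion point are illustrative assumptions, not details taken from the paper or its released code.

```python
import torch
import torch.nn as nn

class TwoPathwayLinearNet(nn.Module):
    """Illustrative two-modality deep linear network with late additive fusion.

    All layers are linear with no activation functions, so each pathway (and the
    fused output) remains a linear map of its input.
    """
    def __init__(self, in_a: int, in_b: int, hidden: int, out: int):
        super().__init__()
        # Modality-specific deep linear branches (widths are placeholder choices).
        self.branch_a = nn.Sequential(nn.Linear(in_a, hidden, bias=False),
                                      nn.Linear(hidden, out, bias=False))
        self.branch_b = nn.Sequential(nn.Linear(in_b, hidden, bias=False),
                                      nn.Linear(hidden, out, bias=False))

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
        # Late fusion by addition; inserting nn.ReLU() between the Linear layers
        # would give a nonlinear variant like those mentioned in the table above.
        return self.branch_a(x_a) + self.branch_b(x_b)

# Example usage with random data standing in for the two modalities.
net = TwoPathwayLinearNet(in_a=784, in_b=784, hidden=64, out=10)
y = net(torch.randn(32, 784), torch.randn(32, 784))  # shape: (32, 10)
```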
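
The Experiment Setup row maps onto a standard PyTorch training loop roughly as follows. This is a minimal sketch assuming the usual `torch.optim` and `torch.optim.lr_scheduler` APIs; the network depth and width, the random tensors standing in for the noisy MNIST data, and the epoch count are assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the deep fully-connected ReLU network; depth and widths are assumed.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)  # PyTorch's default initialization is used, as the paper states.

# Random tensors shaped like MNIST images stand in for the noisy MNIST task.
inputs = torch.randn(10_000, 1, 28, 28)
labels = torch.randint(0, 10, (10_000,))
loader = DataLoader(TensorDataset(inputs, labels), batch_size=1000, shuffle=True)

criterion = nn.CrossEntropyLoss()                         # cross-entropy loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.04)  # 0.04 for fully-connected ReLU nets
                                                          # (0.002 is reported for the conv nets)
# Decay the learning rate by a factor of 0.996 every epoch.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.996)

for epoch in range(100):  # the epoch count is not stated in this table; 100 is arbitrary
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()  # apply the per-epoch decay
```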