Is a Modular Architecture Enough?
Authors: Sarthak Mittal, Yoshua Bengio, Guillaume Lajoie
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we provide a thorough assessment of common modular architectures, through the lens of simple and known modular data distributions. We highlight the benefits of modularity and sparsity and reveal insights on the challenges faced while optimizing modular systems. In doing so, we propose evaluation metrics that highlight the benefits of modularity, the regimes in which these benefits are substantial, as well as the sub-optimality of current end-to-end learned modular systems as opposed to their claimed potential. |
| Researcher Affiliation | Academia | Sarthak Mittal, Yoshua Bengio, Guillaume Lajoie (Mila, Université de Montréal) |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Open-sourced implementation is available at https://github.com/sarthmit/Mod_Arch |
| Open Datasets | No | The paper uses synthetic data ('Since we aim to study modular systems through synthetic data, here we flesh out the data-generating processes...') generated on the fly ('infinite-data regime where each training iteration operates on a new data sample'), so no public dataset or access information is provided; an illustrative sketch of such a setup follows the table. |
| Dataset Splits | No | The paper states it operates in an 'infinite-data regime where each training iteration operates on a new data sample', meaning there are no fixed train/validation/test splits of a finite dataset provided for reproduction. |
| Hardware Specification | Yes | 'All models are trained on single V100 GPUs, each taking a few hours.' |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | No | The paper mentions the scope of experiments (number of rules, model capacities, training settings) and refers to appendices for 'training details', but does not include specific hyperparameter values or detailed training configurations in the main text. |
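
Because the paper samples synthetic, rule-based data on the fly rather than splitting a fixed dataset, a reproduction mainly needs a data-generating process and a loop that draws a fresh batch every training step. The PyTorch sketch below illustrates one plausible shape of that setup; the rule family (fixed per-rule linear maps), dimensions, batch size, and the small MLP learner are illustrative assumptions, not the authors' exact configuration (see https://github.com/sarthmit/Mod_Arch for the released code).

```python
import torch
import torch.nn as nn

# Illustrative sketch, NOT the authors' code: a "modular" data distribution
# where each sample is governed by one of NUM_RULES underlying rules, trained
# in an infinite-data regime (a brand-new batch at every iteration).
torch.manual_seed(0)
NUM_RULES, DIM = 4, 2
RULE_WEIGHTS = torch.randn(NUM_RULES, DIM)  # one fixed linear "rule" per index

def sample_batch(batch_size: int):
    """Draw a fresh batch: inputs x, a one-hot rule context c, targets y."""
    x = torch.randn(batch_size, DIM)
    rule = torch.randint(0, NUM_RULES, (batch_size,))
    c = nn.functional.one_hot(rule, NUM_RULES).float()
    # Target is produced by the rule selected for each sample.
    y = (x * RULE_WEIGHTS[rule]).sum(dim=1, keepdim=True)
    return torch.cat([x, c], dim=1), y

# A placeholder monolithic learner; the paper compares such baselines
# against modular architectures on exactly this kind of distribution.
model = nn.Sequential(nn.Linear(DIM + NUM_RULES, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(10_000):  # "infinite-data" regime: fresh sample every step
    inputs, targets = sample_batch(256)
    loss = nn.functional.mse_loss(model(inputs), targets)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Note how this regime explains the Dataset Splits finding above: since every call to `sample_batch` draws new data, there is no finite dataset to partition, and evaluation can simply use additional fresh batches.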