Is a Modular Architecture Enough?

Authors: Sarthak Mittal, Yoshua Bengio, Guillaume Lajoie

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'In this work, we provide a thorough assessment of common modular architectures, through the lens of simple and known modular data distributions. We highlight the benefits of modularity and sparsity and reveal insights on the challenges faced while optimizing modular systems. In doing so, we propose evaluation metrics that highlight the benefits of modularity, the regimes in which these benefits are substantial, as well as the sub-optimality of current end-to-end learned modular systems as opposed to their claimed potential.'
Researcher Affiliation | Academia | Sarthak Mittal, Yoshua Bengio, Guillaume Lajoie (Mila, Université de Montréal)
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | An open-source implementation is available at https://github.com/sarthmit/Mod_Arch
Open Datasets | No | The paper uses synthetic data ('Since we aim to study modular systems through synthetic data, here we flesh out the data-generating processes...') that is generated on the fly ('infinite-data regime where each training iteration operates on a new data sample'), so no public dataset or access information is provided.
Dataset Splits | No | The paper states it operates in an 'infinite-data regime where each training iteration operates on a new data sample', meaning there are no fixed train/validation/test splits of a finite dataset to reproduce (see the sketch after this table).
Hardware Specification | Yes | 'All models are trained on single V100 GPUs, each taking a few hours.'
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | No | The paper describes the scope of the experiments (number of rules, model capacities, training settings) and refers to the appendices for training details, but the main text does not give specific hyperparameter values or detailed training configurations.
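For context on the 'Open Datasets' and 'Dataset Splits' entries above, the following is a minimal Python sketch of what an on-the-fly synthetic data-generating process and infinite-data training loop can look like. Everything here (the linear rule family, `num_rules`, `sample_batch`, the monolithic baseline model) is an illustrative assumption, not the authors' code; the paper's actual data-generating processes and architectures are in the linked repository (https://github.com/sarthmit/Mod_Arch).

```python
# Illustrative sketch only: a rule-based synthetic generator and an
# "infinite-data" training loop, where every iteration draws a fresh batch.
import torch

num_rules = 4        # assumed number of underlying modular rules
input_dim = 2        # assumed per-sample input dimensionality
batch_size = 256

# One fixed random linear rule per module: y = <w_c, x> (assumed rule family).
rule_weights = torch.randn(num_rules, input_dim)

def sample_batch(batch_size):
    """Draw a fresh synthetic batch: pick a rule per sample and apply it."""
    x = torch.randn(batch_size, input_dim)
    c = torch.randint(num_rules, (batch_size,))   # which rule generated each sample
    y = (rule_weights[c] * x).sum(dim=-1, keepdim=True)
    return x, c, y

# A plain monolithic baseline for illustration; the paper compares monolithic
# and modular architectures on data of this general form.
model = torch.nn.Sequential(
    torch.nn.Linear(input_dim + num_rules, 64),   # rule identity as one-hot context
    torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Infinite-data regime: each iteration operates on a newly generated batch, so
# there is no finite dataset and hence no train/validation/test split.
for step in range(1000):
    x, c, y = sample_batch(batch_size)
    context = torch.nn.functional.one_hot(c, num_rules).float()
    pred = model(torch.cat([x, context], dim=-1))
    loss = torch.nn.functional.mse_loss(pred, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because `sample_batch` is called afresh at every step, reproducing such experiments hinges on the generator code and random seeds rather than on a released dataset, which is why the table marks 'Open Datasets' and 'Dataset Splits' as No.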