DataMUX: Data Multiplexing for Neural Networks
Authors: Vishvak Murahari, Carlos Jimenez, Runzhe Yang, Karthik Narasimhan
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we introduce data multiplexing (DataMUX), a technique that enables deep neural networks to process multiple inputs simultaneously using a single compact representation. DataMUX demonstrates that neural networks are capable of generating accurate predictions over mixtures of inputs, resulting in increased inference throughput with minimal extra memory requirements... We show the viability of DataMUX for different architectures (Transformers, and to a much lesser extent MLPs and CNNs) across six different tasks spanning sentence classification, named entity recognition and image classification. (A minimal sketch of the multiplexing idea follows the table.) |
| Researcher Affiliation | Academia | Vishvak Murahari, Carlos E. Jimenez, Runzhe Yang, and Karthik Narasimhan; Department of Computer Science, Princeton University (murahari@princeton.edu, carlosej@princeton.edu, runzhey@princeton.edu, karthikn@princeton.edu) |
| Pseudocode | No | The paper describes the methods using text and mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/princeton-nlp/DataMUX |
| Open Datasets | Yes | We evaluate our models and the baselines on two types of text classification tasks: 1. Token-level classification: ... CoNLL-2003 Named Entity Recognition (NER) task (Sang and Meulder, 2003). 2. Sentence-level classification: ... GLUE benchmark (Wang et al., 2019): ... SST-2 (Socher et al., 2013), ... QQP, and the natural language inference tasks MNLI (Williams et al., 2018) and QNLI (Wang et al., 2019; Rajpurkar et al., 2016). The T-MUX models are all pre-trained using the retrieval warm-up on the WikiText-103 dataset (Merity et al., 2017). |
| Dataset Splits | Yes | For all tasks, we use the standard train/validation/test splits provided with each dataset. |
| Hardware Specification | Yes | We conducted all experiments on NVIDIA V100 GPUs (32GB) on a shared cluster. |
| Software Dependencies | No | The paper mentions using the Hugging Face framework (Wolf et al., 2019) but does not specify exact version numbers for any software dependencies. |
| Experiment Setup | Yes | In addition, we also continue to use the retrieval task as an auxiliary objective during task training. The total loss is a combination of the task loss and retrieval loss (we use α = 0.1 in our experiments): L = (1 − α)·L_Task + α·L_Retrieval (Eq. 4). For the Transformer models, we train for 10 epochs using a learning rate of 1e-4 with a linear decay schedule and a warm-up of 10% of the training steps using the AdamW optimizer with a batch size of 32. For the MLP and CNN models, we used the Adam optimizer with a learning rate of 1e-3 and a batch size of 128. (A training-loop sketch of this setup follows the table.) |
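
The multiplexing idea quoted in the Research Type row can be pictured with a short PyTorch sketch. This is a minimal illustration under stated assumptions: the module names (`Multiplexer`, `Demultiplexer`) and the average-based combiner are chosen here for readability and are not the authors' T-MUX implementation (the linked repository has the real code).

```python
# Minimal sketch of data multiplexing / demultiplexing (illustrative only).
# Assumption: inputs are combined by per-instance frozen random linear
# transforms followed by averaging; the paper's T-MUX uses its own
# multiplexing and index-based demultiplexing modules.
import torch
import torch.nn as nn


class Multiplexer(nn.Module):
    """Combine N instance representations into one shared representation."""

    def __init__(self, hidden_dim: int, num_instances: int):
        super().__init__()
        # One frozen random transform per instance so the backbone can
        # still tell apart which features came from which instance.
        self.transforms = nn.ModuleList(
            [nn.Linear(hidden_dim, hidden_dim, bias=False) for _ in range(num_instances)]
        )
        for t in self.transforms:
            t.weight.requires_grad_(False)

    def forward(self, xs: torch.Tensor) -> torch.Tensor:
        # xs: (num_instances, batch, seq_len, hidden_dim)
        mixed = torch.stack([t(x) for t, x in zip(self.transforms, xs)], dim=0)
        return mixed.mean(dim=0)  # (batch, seq_len, hidden_dim)


class Demultiplexer(nn.Module):
    """Recover N instance-specific representations from the shared output."""

    def __init__(self, hidden_dim: int, num_instances: int):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, hidden_dim) for _ in range(num_instances)]
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, hidden_dim)
        # returns: (num_instances, batch, seq_len, hidden_dim)
        return torch.stack([head(h) for head in self.heads], dim=0)
```

A shared backbone (for example a Transformer encoder) would run once on the multiplexed tensor between these two modules; running the backbone once for N inputs is where the reported throughput gain comes from.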
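
Similarly, the Experiment Setup row can be summarized as a training-loop sketch. The loss weighting (α = 0.1), AdamW, the 1e-4 learning rate, and the linear schedule with a 10% warm-up come from the quoted text; the `model`, `task_loss_fn`, and `retrieval_loss_fn` interfaces, and the use of Hugging Face's `get_linear_schedule_with_warmup`, are assumptions rather than the released training code.

```python
# Sketch of the reported Transformer training setup: AdamW, lr 1e-4, linear
# decay with a 10% warm-up, 10 epochs, and the mixed objective
# L = (1 - alpha) * L_task + alpha * L_retrieval with alpha = 0.1.
# The model/loss interfaces below are assumptions, not the released code.
import torch
from torch.utils.data import DataLoader
from transformers import get_linear_schedule_with_warmup


def train(model, train_loader: DataLoader, task_loss_fn, retrieval_loss_fn,
          epochs: int = 10, lr: float = 1e-4, alpha: float = 0.1):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    total_steps = epochs * len(train_loader)
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(0.1 * total_steps),  # warm up over 10% of steps
        num_training_steps=total_steps,
    )
    for _ in range(epochs):
        for batch in train_loader:
            # Assumed forward interface: the model returns both task
            # predictions and retrieval (demultiplexing) predictions.
            task_out, retrieval_out = model(batch)
            loss = ((1 - alpha) * task_loss_fn(task_out, batch)
                    + alpha * retrieval_loss_fn(retrieval_out, batch))
            loss.backward()
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad()
```

The batch size of 32 quoted above would be set when constructing `train_loader`.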