Identifying and Controlling Important Neurons in Neural Machine Translation
Authors: Anthony Bau, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James Glass
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show experimentally that translation quality depends on the discovered neurons, and find that many of them capture common linguistic phenomena. |
| Researcher Affiliation | Collaboration | MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA 02139, USA; Qatar Computing Research Institute, HBKU Research Complex, Doha 5825, Qatar |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is publicly available as part of the NeuroX toolkit (Dalvi et al., 2019b): https://github.com/fdalvi/NeuroX |
| Open Datasets | Yes | We use the United Nations (UN) parallel corpus (Ziemski et al., 2016) for all experiments. |
| Dataset Splits | No | The paper mentions training models on "different parts of the training set" and evaluating on "the official test set", but does not explicitly detail a separate validation set split. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU models, CPU types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using spaCy for linguistic annotations and refers to a character convolutional neural network (charCNN) and LSTM encoder-decoder models, but it does not specify version numbers for any software dependencies or libraries (a hedged annotation sketch follows the table). |
| Experiment Setup | Yes | We train 500-dimensional 2-layer LSTM encoder-decoder models with attention (Bahdanau et al., 2014). In order to study both word and sub-word properties, we use a word representation based on a character convolutional neural network (charCNN) as input to both encoder and decoder (see the model sketch below the table). |
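
Since the paper names spaCy but pins no version, the following is a minimal sketch of the kind of linguistic annotation step it describes. The pipeline name `en_core_web_sm` and the POS-only output are assumptions for illustration; the paper does not specify which spaCy model or annotation layers were used.

```python
# Hedged sketch: token-level annotation with spaCy, as one plausible
# reading of the paper's "linguistic annotations" step.
import spacy

# Assumed pipeline; the paper does not name a specific spaCy model.
nlp = spacy.load("en_core_web_sm")

def annotate(sentence):
    """Return (token, POS tag) pairs for one source sentence."""
    doc = nlp(sentence)
    return [(token.text, token.pos_) for token in doc]

print(annotate("The committee adopted the resolution unanimously."))
# e.g. [('The', 'DET'), ('committee', 'NOUN'), ('adopted', 'VERB'), ...]
```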
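
The experiment-setup row describes the architecture precisely enough for a rough reconstruction. Below is a minimal PyTorch sketch of the encoder side only: charCNN word representations feeding a 500-dimensional 2-layer LSTM. The character vocabulary size, character embedding width, convolution kernel size, and the omission of the attention decoder are all assumptions; this is not the authors' implementation (their code lives in NeuroX, linked above).

```python
# Minimal sketch, assuming PyTorch: a charCNN word encoder feeding a
# 500-dim 2-layer LSTM, matching the setup row's description.
import torch
import torch.nn as nn

class CharCNNEncoder(nn.Module):
    def __init__(self, n_chars, char_dim=50, word_dim=500,
                 kernel_size=3, hidden_dim=500, num_layers=2):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # 1-D convolution over characters, max-pooled into one word vector.
        # char_dim and kernel_size are illustrative assumptions.
        self.conv = nn.Conv1d(char_dim, word_dim, kernel_size, padding=1)
        self.lstm = nn.LSTM(word_dim, hidden_dim, num_layers,
                            batch_first=True)

    def forward(self, chars):
        # chars: (batch, words, chars_per_word) character indices
        b, w, c = chars.shape
        x = self.char_emb(chars.view(b * w, c))             # (b*w, c, char_dim)
        x = self.conv(x.transpose(1, 2)).max(dim=2).values  # (b*w, word_dim)
        words = x.view(b, w, -1)                            # (b, words, word_dim)
        outputs, _ = self.lstm(words)                       # per-word states
        # The individual dimensions of these states are the "neurons"
        # whose importance the paper measures and controls.
        return outputs

enc = CharCNNEncoder(n_chars=100)
print(enc(torch.randint(0, 100, (2, 7, 12))).shape)  # torch.Size([2, 7, 500])
```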