Explainable and Discourse Topic-aware Neural Language Understanding
Authors: Yatin Chaudhary, Hinrich Schuetze, Pankaj Gupta
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments over a range of tasks such as language modeling, word sense disambiguation, document classification, retrieval and text generation demonstrate the ability of the proposed model to improve language understanding. |
| Researcher Affiliation | Collaboration | Corporate Technology, Machine Intelligence (MIC-DE), Siemens AG, Munich, Germany; CIS, University of Munich (LMU), Munich, Germany. |
| Pseudocode | Yes | Algorithm 1: Computation of combined loss L; Algorithm 2: Utility functions (a generic loss-combination sketch is given after this table). |
| Open Source Code | Yes | Implementation of NCLM is available at: https://github.com/YatinChaudhary/NCLM. |
| Open Datasets | Yes | We present experimental results of language modeling using our proposed models on APNEWS, IMDB and BNC datasets (Lau et al., 2017). We use three labeled datasets: 20Newsgroups (20NS), Reuters (R21578) and IMDB movie reviews (IMDB) (See supplementary for data statistics). |
| Dataset Splits | No | For data statistics and time complexity of experiments, refer to the supplementary. Experimental setup: We follow Wang et al. (2018) for our experimental setup. See supplementary for detailed hyperparameter settings. |
| Hardware Specification | No | No explicit hardware specifications (e.g., specific GPU/CPU models, memory details) used for running experiments were provided in the main text. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., libraries, frameworks, programming language versions) were provided in the main text. |
| Experiment Setup | Yes | We fix the NLM sequence length to 30; longer sentences are split into multiple sequences of length at most 30. We initialize the input word embeddings for NLM with 300-dimensional pretrained embeddings extracted from a word2vec (Mikolov et al., 2013) model trained on Google News. Models are trained using a learning rate of 1e-3 and a batch size of 32 (see the setup sketch after this table). |
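The paper's Algorithm 1 computes a "combined loss L", which in a topic-aware language model typically joins a language-model term with a topic-model term. The sketch below is a generic illustration of that pattern, not the paper's Algorithm 1: the exact terms, the VAE-style topic loss, and the weighting factor `alpha` are assumptions made here for illustration.

```python
import torch
import torch.nn.functional as F

def combined_loss(lm_logits, targets, tm_recon_logits, bow_targets, kl_div, alpha=1.0):
    """Generic sketch of a combined loss: language-model cross-entropy plus a
    topic-model (VAE-style) reconstruction + KL term. The structure and the
    weight `alpha` are assumptions, not the paper's exact Algorithm 1."""
    # Language-model term: token-level cross-entropy over the vocabulary
    lm_loss = F.cross_entropy(lm_logits.view(-1, lm_logits.size(-1)), targets.view(-1))
    # Topic-model term: bag-of-words reconstruction plus KL divergence
    recon = -(bow_targets * torch.log_softmax(tm_recon_logits, dim=-1)).sum(dim=-1).mean()
    tm_loss = recon + kl_div
    return lm_loss + alpha * tm_loss
```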
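To make the reported experiment setup concrete, here is a minimal sketch of the stated choices: splitting sentences into sequences of at most 30 tokens, initializing embeddings from the 300-dimensional Google News word2vec model, and training with learning rate 1e-3 and batch size 32. The helper names (`split_sequences`, `init_embeddings`) and the gensim/PyTorch tooling are assumptions for illustration, not taken from the paper's code.

```python
import gensim.downloader as api
import numpy as np
import torch

MAX_LEN, EMB_DIM, LR, BATCH_SIZE = 30, 300, 1e-3, 32  # values reported in the paper

def split_sequences(token_ids, max_len=MAX_LEN):
    """Split a longer sentence into chunks of at most max_len tokens,
    mirroring the fixed NLM sequence length of 30."""
    return [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]

def init_embeddings(vocab):
    """Initialize an embedding table from pretrained Google News word2vec vectors;
    words missing from word2vec keep a small random initialization (an assumption)."""
    w2v = api.load("word2vec-google-news-300")  # 300-dim pretrained vectors
    emb = np.random.uniform(-0.05, 0.05, (len(vocab), EMB_DIM)).astype(np.float32)
    for i, word in enumerate(vocab):
        if word in w2v:
            emb[i] = w2v[word]
    return torch.nn.Embedding.from_pretrained(torch.from_numpy(emb), freeze=False)

# Optimizer with the reported learning rate; batch size 32 would be set in the DataLoader:
# optimizer = torch.optim.Adam(model.parameters(), lr=LR)
```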