Mask & Focus: Conversation Modelling by Learning Concepts
Authors: Gaurav Pandey, Dinesh Raghu, Sachindra Joshi
AAAI 2020, pp. 8584-8591
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate the utility of Mask & Focus, we evaluate it on two datasets. Mask & Focus achieves significant improvement in performance over existing baselines for conversation modelling with respect to several metrics. |
| Researcher Affiliation | Industry | Gaurav Pandey, Dinesh Raghu, Sachindra Joshi IBM Research, New Delhi, India {gpandey1, diraghu1, jsachind}@in.ibm.com |
| Pseudocode | No | The paper describes its methods and models in prose and with diagrams (Figure 2), but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code, such as a repository link or an explicit statement of code release. |
| Open Datasets | Yes | The proposed model is evaluated for generating responses on the Ubuntu Dialogue Corpus (Lowe et al. 2015). |
| Dataset Splits | Yes | Table 1 depicts some statistics for these datasets: Training Pairs 499,873 / 20,000; Validation Pairs 19,560 / 10,000; Test Pairs 18,920 / 10,000 (one figure per evaluation dataset). |
| Hardware Specification | No | The paper describes model architectures and dimensions (e.g., '500-dimensional word embeddings', 'hidden size of 1,000') but does not specify any concrete hardware details such as CPU/GPU models or memory used for experiments. |
| Software Dependencies | No | The paper mentions 'We implemented Mask & Focus using the Pytorch library (Paszke et al. 2017)', but it does not provide a specific version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We use 500-dimensional word embeddings for all our experiments. For the Ubuntu dataset, the utterance and utterance concept encoders are single-layer bidirectional encoders, where each direction has a hidden size of 1,000. ... We use a fixed vocabulary size of 20,000. ... We used the Adam optimizer with a learning rate of 1.5 × 10^-4. A batch size of 10 conversations is used for training. ... To prevent the model from overfitting, we use early stopping with log-likelihood on validation set as evaluation criteria. |
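The quoted experiment setup can be summarized in code. The following is a minimal PyTorch sketch of the reported hyperparameters (500-dimensional embeddings, single-layer bidirectional encoder with per-direction hidden size 1,000, vocabulary of 20,000, Adam with learning rate 1.5e-4, batches of 10 conversations). The module and variable names are hypothetical, and the choice of a GRU cell is an assumption; the paper's full utterance and utterance-concept encoders are not reproduced here.

```python
# Hypothetical sketch of the reported setup; not the authors' implementation.
import torch
import torch.nn as nn

VOCAB_SIZE = 20_000    # fixed vocabulary size reported in the paper
EMBED_DIM = 500        # 500-dimensional word embeddings
HIDDEN_SIZE = 1_000    # per-direction hidden size of the bidirectional encoder
BATCH_SIZE = 10        # batch size of 10 conversations


class UtteranceEncoder(nn.Module):
    """Single-layer bidirectional encoder, as described for the Ubuntu dataset."""

    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        # The recurrent cell type is an assumption; the quoted text only says
        # "single-layer bidirectional encoders".
        self.rnn = nn.GRU(EMBED_DIM, HIDDEN_SIZE, num_layers=1,
                          bidirectional=True, batch_first=True)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)          # (batch, seq, EMBED_DIM)
        outputs, hidden = self.rnn(embedded)          # outputs: (batch, seq, 2*HIDDEN_SIZE)
        return outputs, hidden


encoder = UtteranceEncoder()
# Adam optimizer with the reported learning rate of 1.5e-4.
optimizer = torch.optim.Adam(encoder.parameters(), lr=1.5e-4)
# Training would iterate over batches of BATCH_SIZE conversations, with early
# stopping on validation log-likelihood (training loop not shown).
```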