A Multiplicative Model for Learning Distributed Text-Based Attribute Representations
Authors: Ryan Kiros, Richard S. Zemel, Ruslan Salakhutdinov
NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform several experimental tasks including sentiment classification, cross-lingual document classification, and blog authorship attribution. We also qualitatively evaluate conditional word neighbours and attribute-conditioned text generation. In this section we describe our experimental evaluation and results. Throughout this section we refer to our model as Attribute Tensor Decomposition (ATD). |
| Researcher Affiliation | Academia | Ryan Kiros, Richard S. Zemel, Ruslan Salakhutdinov; University of Toronto; Canadian Institute for Advanced Research; {rkiros, zemel, rsalakhu}@cs.toronto.edu |
| Pseudocode | No | The paper describes the models and their mathematical formulations but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We first demonstrate initial qualitative results to get a sense of the tasks our model can perform. For these, we use the small Project Gutenberg corpus... Our first quantitative experiments are performed on the sentiment treebank of [3]... We use the Europarl corpus [23] for inducing word representations across languages... Evaluation is then performed on English and German sections of the Reuters RCV1/RCV2 corpora... For our final task, we use the Blog corpus of [24]... |
| Dataset Splits | Yes | We used a monolingual validation set for tuning the margin α, which was set to α = 1. (This margin objective is sketched below the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | All models are trained using stochastic gradient descent with an exponential learning rate decay and linear (per epoch) increase in momentum. We used a context size of 8, 100 dimensional word vectors initialized from [2] and 100 dimensional sentence vectors initialized by averaging vectors of words from the corresponding sentence. |
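
The schedule quoted in the Experiment Setup row (exponential learning-rate decay, linear per-epoch momentum increase) is stated without its constants. Below is a minimal Python sketch of such a schedule driving plain momentum SGD; the base learning rate, decay factor, momentum endpoints, epoch count, and the toy quadratic objective are all illustrative assumptions, not values from the paper.

```python
import numpy as np

# Sketch of the quoted training schedule: SGD with momentum, an
# exponentially decaying learning rate, and a momentum coefficient
# raised linearly once per epoch. All constants are assumptions.

def make_schedule(n_epochs, base_lr=0.1, decay=0.95,
                  momentum_start=0.5, momentum_end=0.9):
    """Yield one (learning_rate, momentum) pair per epoch."""
    for epoch in range(n_epochs):
        lr = base_lr * decay ** epoch                    # exponential decay
        frac = epoch / max(1, n_epochs - 1)
        mom = momentum_start + frac * (momentum_end - momentum_start)  # linear ramp
        yield lr, mom

def sgd_step(params, grads, velocities, lr, mom):
    """One in-place SGD update with classical momentum."""
    for p, g, v in zip(params, grads, velocities):
        v *= mom
        v -= lr * g
        p += v

# Toy usage: minimize f(w) = w^2 starting from w = 5 over 10 epochs.
w = [np.array([5.0])]
vel = [np.zeros_like(w[0])]
for lr, mom in make_schedule(n_epochs=10):
    for _ in range(100):          # minibatch steps within one epoch
        grads = [2.0 * w[0]]      # gradient of w^2
        sgd_step(w, grads, vel, lr, mom)
print(w[0])                       # approaches 0
```

In the paper, updates of this form would be applied to the 100-dimensional word and sentence vectors mentioned in the same row rather than to a toy weight.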
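
The margin α quoted in the Dataset Splits row is the hinge margin of a ranking objective, tuned on a monolingual validation set for the cross-lingual experiments. A generic sketch of such an objective, assuming a dot-product scoring function and a single sampled negative, neither of which is specified in the excerpt:

```python
import numpy as np

def margin_ranking_loss(anchor, positive, negative, alpha=1.0):
    """Hinge loss: the positive pair must outscore the negative pair
    by at least the margin alpha (the paper reports alpha = 1)."""
    pos_score = float(np.dot(anchor, positive))
    neg_score = float(np.dot(anchor, negative))
    return max(0.0, alpha - pos_score + neg_score)

# Toy usage with random 100-dimensional vectors (the paper's stated
# embedding size); a loss of 0 means the margin is already satisfied.
rng = np.random.default_rng(0)
en, de_pos, de_neg = (rng.standard_normal(100) * 0.1 for _ in range(3))
print(margin_ranking_loss(en, de_pos, de_neg))
```

With α = 1, the loss vanishes once the aligned pair outscores the sampled negative by at least 1, which is the behavior the validation-set tuning selects for.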