A Multiplicative Model for Learning Distributed Text-Based Attribute Representations

Authors: Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov

NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform several experimental tasks including sentiment classification, cross-lingual document classification, and blog authorship attribution. We also qualitatively evaluate conditional word neighbours and attribute-conditioned text generation. In this section we describe our experimental evaluation and results. Throughout this section we refer to our model as Attribute Tensor Decomposition (ATD).
Researcher Affiliation | Academia | Ryan Kiros, Richard S. Zemel, Ruslan Salakhutdinov; University of Toronto; Canadian Institute for Advanced Research; {rkiros, zemel, rsalakhu}@cs.toronto.edu
Pseudocode | No | The paper describes the models and their mathematical formulations but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We first demonstrate initial qualitative results to get a sense of the tasks our model can perform. For these, we use the small project Gutenberg corpus... Our first quantitative experiments are performed on the sentiment treebank of [3]... We use the Europarl corpus [23] for inducing word representations across languages... Evaluation is then performed on English and German sections of the Reuters RCV1/RCV2 corpora... For our final task, we use the Blog corpus of [24]...
Dataset Splits | Yes | We used a monolingual validation set for tuning the margin α, which was set to α = 1.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers.
Experiment Setup | Yes | All models are trained using stochastic gradient descent with an exponential learning rate decay and linear (per epoch) increase in momentum. We used a context size of 8, 100 dimensional word vectors initialized from [2] and 100 dimensional sentence vectors initialized by averaging vectors of words from the corresponding sentence.
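
The Research Type row above names the paper's model Attribute Tensor Decomposition (ATD): a multiplicative model in which word representations are gated by an attribute vector through a factored three-way tensor. The excerpt does not reproduce the factorization itself, so the NumPy sketch below is only an illustrative reconstruction; the matrix names (W_fw, W_fd, W_fx), the dimensions, and the exact gating form are assumptions rather than the paper's formulation.

    import numpy as np

    # Illustrative dimensions (the excerpt mentions 100-dimensional word vectors).
    V, D, F, A = 5000, 100, 100, 10   # vocabulary, embedding dim, factors, attribute dim

    rng = np.random.default_rng(0)
    W_fw = rng.normal(scale=0.01, size=(F, V))   # word -> factor loadings
    W_fd = rng.normal(scale=0.01, size=(F, D))   # factor -> embedding dimensions
    W_fx = rng.normal(scale=0.01, size=(F, A))   # attribute -> factor gates

    def attribute_conditioned_embedding(word_id, x):
        """Word vector gated multiplicatively by attribute vector x (assumed factorization)."""
        word_factors = W_fw[:, word_id]           # (F,)
        gates = W_fx @ x                          # (F,) attribute-dependent gating
        return W_fd.T @ (word_factors * gates)    # (D,)

    # The same word id yields different representations under different attributes,
    # e.g. two blog authors encoded as one-hot attribute vectors.
    v_a = attribute_conditioned_embedding(42, np.eye(A)[0])
    v_b = attribute_conditioned_embedding(42, np.eye(A)[1])

A multiplicative interaction of this kind is what makes the qualitative tasks quoted above possible: conditional word neighbours and attribute-conditioned text generation both rely on the same word receiving different vectors under different attribute settings.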
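
The Dataset Splits row mentions a margin α tuned on a monolingual validation set and fixed at α = 1 for the cross-lingual experiments on Europarl and Reuters RCV1/RCV2. A margin hyperparameter of this kind typically belongs to a max-margin ranking objective over aligned sentence pairs; the hinge-style loss below is a hedged guess at such an objective, not the paper's stated loss.

    import numpy as np

    def margin_ranking_loss(s_en, s_de, s_neg, alpha=1.0):
        """Hinge loss pushing an aligned (English, German) sentence pair to be closer
        than a randomly sampled negative sentence by at least alpha (alpha = 1 here,
        mirroring the tuned value quoted above). The loss form itself is an assumption."""
        pos = np.sum((s_en - s_de) ** 2)   # distance between the aligned pair
        neg = np.sum((s_en - s_neg) ** 2)  # distance to the negative sample
        return max(0.0, alpha + pos - neg)

    # Toy usage with random 100-dimensional sentence vectors.
    rng = np.random.default_rng(0)
    loss = margin_ranking_loss(rng.normal(size=100), rng.normal(size=100), rng.normal(size=100))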
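
The Experiment Setup row quotes the optimizer: stochastic gradient descent with an exponential learning-rate decay and a linear per-epoch increase in momentum. A minimal sketch of that schedule follows; the initial learning rate, decay factor, momentum endpoints, and epoch count are placeholders, since the excerpt does not report them.

    import numpy as np

    def sgd_momentum_schedule(params, grads_fn, n_epochs=25, steps_per_epoch=100,
                              lr0=0.1, lr_decay=0.95, mom_start=0.5, mom_end=0.9):
        """SGD with exponential learning-rate decay and a linear per-epoch momentum ramp.
        All hyperparameter values here are placeholders, not the paper's."""
        velocity = [np.zeros_like(p) for p in params]
        for epoch in range(n_epochs):
            lr = lr0 * (lr_decay ** epoch)                                         # exponential decay
            mom = mom_start + (mom_end - mom_start) * epoch / max(n_epochs - 1, 1)  # linear increase
            for _ in range(steps_per_epoch):
                grads = grads_fn(params)
                for p, v, g in zip(params, velocity, grads):
                    v *= mom
                    v -= lr * g
                    p += v
        return params

    # Toy usage: drive a single parameter matrix toward zero (gradient of ||p||^2 is 2p).
    params = sgd_momentum_schedule([np.random.randn(100, 100)], lambda ps: [2.0 * p for p in ps])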