All-but-the-Top: Simple and Effective Postprocessing for Word Representations

Authors: Jiaqi Mu, Pramod Viswanath

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The postprocessing is empirically validated on a variety of lexical-level intrinsic tasks (word similarity, concept categorization, word analogy) and sentence-level tasks (semantic textual similarity and text classification), on multiple datasets, with a variety of representation methods and hyperparameter choices, and in multiple languages; in each case, the processed representations are consistently better than the original ones.
Researcher Affiliation | Academia | Jiaqi Mu, Pramod Viswanath, University of Illinois at Urbana-Champaign, {jiaqimu2, pramodv}@illinois.edu
Pseudocode | Yes | Algorithm 1: Postprocessing algorithm on word representations.
    Input: word representations {v(w), w ∈ V}; a threshold parameter D.
    1. Compute the mean of {v(w), w ∈ V}: µ ← (1/|V|) Σ_{w∈V} v(w); ṽ(w) ← v(w) − µ.
    2. Compute the PCA components: u1, ..., ud ← PCA({ṽ(w), w ∈ V}).
    3. Preprocess the representations: v′(w) ← ṽ(w) − Σ_{i=1..D} (u_i⊤ ṽ(w)) u_i.
    Output: processed representations v′(w).
    (A NumPy sketch of this procedure is given after the table.)
Open Source Code | No | The paper provides links to third-party word representations and a third-party CNN text classification implementation, but does not state that the code for their proposed postprocessing methodology is open-source or publicly available.
Open Datasets | Yes | For this experiment, we use seven standard datasets: the first published RG65 dataset (Rubenstein & Goodenough, 1965); the widely used WordSim-353 (WS) dataset (Finkelstein et al., 2001)...
Dataset Splits | Yes | In TREC, SST and IMDb, the datasets have already been split into train/test sets. Otherwise we use 10-fold cross validation in the remaining datasets (i.e., MR and SUBJ). Detailed statistics of various features of each of the datasets are provided in Table 21. (A sketch of the 10-fold protocol is given after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions 'implemented using tensorflow' but does not provide specific version numbers for TensorFlow or any other software libraries used.
Experiment Setup | No | While the paper specifies the hyperparameter 'D' for its postprocessing (e.g., 'We choose D = 3 for WORD2VEC and D = 2 for GLOVE' and 'D to vary between 0 and 4'), it lacks other crucial experimental setup details such as learning rates, batch sizes, number of epochs, or specific optimizer settings for the neural network models used. (A sketch of a sweep over D is given after the table.)
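
Since the authors do not release code, the following is a minimal NumPy sketch of Algorithm 1 as described above: subtract the common mean vector, compute the principal components of the centred embeddings, and remove the projections onto the top D components. The function name all_but_the_top and the random placeholder matrix are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def all_but_the_top(vectors: np.ndarray, D: int) -> np.ndarray:
    """Sketch of Algorithm 1.

    vectors: (|V|, d) matrix of word representations.
    D: number of dominant principal components to remove.
    """
    # Step 1: remove the common mean vector.
    mu = vectors.mean(axis=0)
    centered = vectors - mu
    # Step 2: principal directions of the centred vectors
    # (rows of Vt are u_1, ..., u_d, ordered by singular value).
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    U = Vt[:D]                                 # top-D directions, shape (D, d)
    # Step 3: subtract the projection onto the top-D directions.
    return centered - centered @ U.T @ U

# Illustrative call; the paper reports D = 3 for WORD2VEC and D = 2 for GLOVE.
embeddings = np.random.randn(10000, 300)       # placeholder for real 300-d vectors
processed = all_but_the_top(embeddings, D=3)
```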
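The paper varies D between 0 and 4 in its experiments. Below is a hedged sketch of how such a sweep might be scored on a word-similarity dataset, using the Spearman correlation between cosine similarities and human ratings. It reuses all_but_the_top from the sketch above; the vocabulary, word pairs, and ratings are hypothetical placeholders, not the actual WordSim-353 data or the authors' evaluation code.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
vocab = [f"w{i}" for i in range(1000)]
word_index = {w: i for i, w in enumerate(vocab)}
embeddings = rng.standard_normal((len(vocab), 300))       # placeholder vectors
word_pairs = [("w1", "w2"), ("w3", "w4"), ("w5", "w6")]   # placeholder word pairs
human_scores = [7.5, 3.2, 5.1]                            # placeholder human ratings

def similarity_score(vectors):
    """Spearman correlation between cosine similarities and human ratings."""
    sims = []
    for w1, w2 in word_pairs:
        v1, v2 = vectors[word_index[w1]], vectors[word_index[w2]]
        sims.append(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
    return spearmanr(sims, human_scores).correlation

for D in range(5):                                        # D varied between 0 and 4
    vecs = all_but_the_top(embeddings, D) if D > 0 else embeddings
    print(f"D={D}  Spearman={similarity_score(vecs):.3f}")
```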
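For the text-classification datasets without a fixed split (MR and SUBJ), the paper reports 10-fold cross validation. A minimal sketch of that protocol, assuming scikit-learn and using hypothetical placeholder documents and labels in place of the real datasets and the CNN classifier:

```python
from sklearn.model_selection import KFold

# Placeholder corpus; the real experiments use the MR and SUBJ datasets.
texts = [f"sentence {i}" for i in range(100)]
labels = [i % 2 for i in range(100)]

kf = KFold(n_splits=10, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(texts)):
    train_texts = [texts[i] for i in train_idx]
    test_texts = [texts[i] for i in test_idx]
    # ...train the CNN text classifier on the train fold, evaluate on the test fold...
    print(f"fold {fold}: {len(train_texts)} train / {len(test_texts)} test examples")
```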