All-but-the-Top: Simple and Effective Postprocessing for Word Representations
Authors: Jiaqi Mu, Pramod Viswanath
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The postprocessing is empirically validated on a variety of lexical-level intrinsic tasks (word similarity, concept categorization, word analogy) and sentence-level tasks (semantic textual similarity and text classification), on multiple datasets, with a variety of representation methods and hyperparameter choices, and in multiple languages; in each case, the processed representations are consistently better than the original ones. |
| Researcher Affiliation | Academia | Jiaqi Mu, Pramod Viswanath, University of Illinois at Urbana-Champaign, {jiaqimu2, pramodv}@illinois.edu |
| Pseudocode | Yes | Algorithm 1: Postprocessing algorithm on word representations. Input: word representations {v(w), w ∈ V} and a threshold parameter D. 1. Compute the mean µ of {v(w), w ∈ V} and center the vectors: ṽ(w) ← v(w) − µ. 2. Compute the PCA components: u₁, ..., u_d ← PCA({ṽ(w), w ∈ V}). 3. Process the representations: v′(w) ← ṽ(w) − Σᵢ₌₁..D (uᵢᵀ ṽ(w)) uᵢ. Output: processed representations v′(w). (A runnable sketch follows this table.) |
| Open Source Code | No | The paper provides links to third-party word representations and a third-party CNN text classification implementation, but does not state that the code for their proposed postprocessing methodology is open-source or publicly available. |
| Open Datasets | Yes | For this experiment, we use seven standard datasets: the first published RG65 dataset (Rubenstein & Goodenough, 1965); the widely used WordSim-353 (WS) dataset (Finkelstein et al., 2001)... |
| Dataset Splits | Yes | In TREC, SST and IMDb, the datasets have already been split into train/test sets. Otherwise we use 10-fold cross validation in the remaining datasets (i.e., MR and SUBJ). Detailed statistics of various features of each of the datasets are provided in Table 21. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'implemented using tensorflow' but does not provide specific version numbers for TensorFlow or any other software libraries used. |
| Experiment Setup | No | While the paper specifies the hyperparameter 'D' for its postprocessing (e.g., 'We choose D = 3 for WORD2VEC and D = 2 for GLOVE' and 'D to vary between 0 and 4'), it lacks other crucial experimental setup details such as learning rates, batch sizes, number of epochs, or specific optimizer settings for the neural network models used. |
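To make the pseudocode row concrete, below is a minimal NumPy sketch of Algorithm 1. It is an illustration under stated assumptions rather than the authors' released code (the paper provides none): the function name `all_but_the_top` and the use of an SVD to obtain the PCA directions are our choices; the paper specifies only the three steps quoted above.

```python
import numpy as np

def all_but_the_top(vectors: np.ndarray, D: int) -> np.ndarray:
    """Sketch of Algorithm 1 from Mu & Viswanath (ICLR 2018).

    vectors: (|V|, d) matrix whose rows are the word representations v(w).
    D: number of dominant directions to remove (the paper chooses D = 3 for
       WORD2VEC and D = 2 for GLOVE).
    """
    # Step 1: subtract the common mean vector mu from every representation.
    mu = vectors.mean(axis=0)
    centered = vectors - mu
    # Step 2: PCA of the centered representations. The rows of Vt from the
    # SVD of the centered matrix are the principal directions u_1, ..., u_d.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    top = Vt[:D]  # the top D principal directions, shape (D, d)
    # Step 3: remove the projections onto the top D directions:
    # v'(w) = v~(w) - sum_{i=1}^{D} (u_i^T v~(w)) u_i
    return centered - (centered @ top.T) @ top
```

For example, `processed = all_but_the_top(glove_vectors, D=2)` (with `glove_vectors` a hypothetical embedding matrix) matches the D = 2 choice for GLOVE quoted in the Experiment Setup row.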