StarSpace: Embed All The Things!
Authors: Ledell Wu, Adam Fisch, Sumit Chopra, Keith Adams, Antoine Bordes, Jason Weston
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on a number of tasks show that StarSpace is highly competitive with existing methods, whilst also being generally applicable to new cases where those methods are not. |
| Researcher Affiliation | Industry | Ledell Wu, Adam Fisch, Sumit Chopra, Keith Adams, Antoine Bordes, Jason Weston — Facebook AI Research |
| Pseudocode | No | The paper describes the model's operations and training process in the 'Model' section using descriptive text, but it does not include a formal pseudocode block or algorithm listing. |
| Open Source Code | Yes | StarSpace is available as an open-source project at https://github.com/facebookresearch/Starspace. |
| Open Datasets | Yes | We use three datasets: AG news is a 4-class text classification task given title and description fields as input. It consists of 120K training examples, 7,600 test examples, 4 classes, 100K words and 5M tokens in total. DBpedia (Lehmann et al. 2015) is a 14-class classification problem given the title and abstract of Wikipedia articles as input. It consists of 560K training examples, 70K test examples, 14 classes, 800K words and 32M tokens in total. The Yelp reviews dataset is obtained from the 2015 Yelp Dataset Challenge. The task is to predict the full number of stars the user has given (from 1 to 5). and We use the Freebase 15k dataset from (Bordes et al. 2013)... and We use the Wikipedia dataset introduced by (Chen et al. 2017)... |
| Dataset Splits | Yes | The training set contains 483,142 triplets, the validation set 50,000 and the test set 59,071. and The dataset is split into 5,035,182 training examples, 10,000 validation examples and 10,000 test examples. |
| Hardware Specification | No | The paper mentions optimization 'over multiple CPUs' but does not provide specific details on CPU models, GPU models, memory, or other hardware specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions tools and algorithms like Adagrad, Hogwild, fastText, word2vec, and SentEval, but it does not specify any version numbers for these or other software dependencies, which would be required for reproducibility. |
| Experiment Setup | Yes | In these experiments we set the dimension of embeddings to be 10, as in (Joulin et al. 2016). and We optimize by stochastic gradient descent (SGD), i.e., each SGD step is one sample from E+ in the outer sum, using Adagrad (Duchi, Hazan, and Singer 2011) and hogwild (Recht et al. 2011) over multiple CPUs. and We set dim = 50, and the max training time of the algorithm to be 1 hour for all experiments... We set k = [1, 5, 10, 25, 50, 100, 250, 500, 1000]. and for Task 1, for label features we use a feature dropout probability of 0.8 which both regularizes and greatly speeds up training. |
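The training recipe quoted above (entities embedded as bags of discrete features, dot-product similarity, a margin ranking loss against k sampled negatives, one positive pair per SGD step) can be sketched as a toy in plain NumPy. This is an illustrative reconstruction, not the released C++ implementation: all names and constants here are our own, and it uses vanilla SGD where the paper uses Adagrad with Hogwild parallelism.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, DIM, MARGIN, LR, K_NEG = 1000, 10, 0.1, 0.05, 5  # dim=10 as in the paper's text experiments
E = rng.normal(scale=0.1, size=(VOCAB, DIM))  # single shared embedding matrix for all features

def embed(features):
    """Embed an entity as the sum of its feature embeddings (bag-of-features)."""
    return E[features].sum(axis=0)

def sgd_step(pos_a, pos_b, neg_bs):
    """One SGD step on the margin ranking loss for a positive pair (a, b)
    against sampled negative entities, with dot-product similarity."""
    a, b = embed(pos_a), embed(pos_b)
    loss = 0.0
    for neg in neg_bs:
        n = embed(neg)
        viol = MARGIN - a @ b + a @ n  # hinge: want sim(a,b) > sim(a,n) + margin
        if viol > 0:
            loss += viol
            # gradients of this hinge term w.r.t. the summed embeddings
            E[pos_a] -= LR * (n - b)   # d viol / d a = n - b
            E[pos_b] -= LR * (-a)      # d viol / d b = -a
            E[neg]   -= LR * a         # d viol / d n = a
    return loss

# toy positive pair trained against freshly sampled random negatives each step
a_feats = np.array([1, 2, 3])
b_feats = np.array([10])
losses = [sgd_step(a_feats, b_feats,
                   [rng.integers(0, VOCAB, size=1) for _ in range(K_NEG)])
          for _ in range(50)]
```

Repeating the step on a fixed positive pair drives its similarity above the margin, so the hinge loss decays toward zero; swapping the similarity function or the negative-sampling scheme is exactly the axis along which the paper specializes this one model to classification, ranking, and graph-embedding tasks.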