FIRE: Semantic Field of Words Represented as Non-Linear Functions
Authors: Xin Du, Kumiko Tanaka-Ishii
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By implementing FIRE for English and comparing it with previous representation methods via word and sentence similarity tasks, we show that FIRE produces comparable or even better results. In an evaluation of polysemy to predict the number of word senses, FIRE greatly outperformed BERT and Word2vec, providing evidence of how FIRE represents polysemy. |
| Researcher Affiliation | Academia | Xin Du The University of Tokyo duxin@cl.rcast.u-tokyo.ac.jp Kumiko Tanaka-Ishii The University of Tokyo kumiko@cl.rcast.u-tokyo.ac.jp |
| Pseudocode | No | The paper describes its methods using mathematical formulas and prose but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/kduxin/firelang. |
| Open Datasets | Yes | We obtained FIRE representations on two datasets: the large Wacky [4] dataset and the small text8 dataset, which contain about 3 billion and 17 million tokens, respectively. ... The datasets Wacky and WordNet were cited. |
| Dataset Splits | No | The paper mentions training on the Wacky and text8 datasets but does not explicitly specify train/validation/test dataset splits, percentages, or sample counts in the main text. |
| Hardware Specification | No | The paper states that training took 'about 10 hours for FIRE with 50 parameters per word on a single GPU' but does not specify the exact model or type of GPU. |
| Software Dependencies | No | The paper mentions using the 'NLTK toolkit' but does not provide specific version numbers for this or any other software dependencies. |
| Experiment Setup | Yes | For the SGNS hyperparameters, we followed the settings recommended by [15]. We used one negative sample per sample, and the subsample rate for w_i was 1e-5 for Wacky and 1e-4 for text8. The negative sampling probability was adjusted by a power of 0.75. For training FIRE representations, we adopted the AdamW optimizer [13]. The OneCycle learning rate scheduler [24] was applied, with the maximum learning rate set to 0.005. ... For the Wacky dataset, all methods were trained for three epochs... For text8, FIRE was trained for 15 epochs. |
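
The quoted experiment setup maps onto standard PyTorch components. Below is a minimal sketch, not the authors' released code: the placeholder embedding model, vocabulary size, batch/step counts, and toy word counts are assumptions added for illustration, while the AdamW optimizer, OneCycle schedule with a maximum learning rate of 0.005, the 0.75-power negative-sampling adjustment, and the subsample rates come from the table above. The actual training code is in the linked firelang repository.

```python
# Sketch of the reported optimizer / scheduler / sampling hyperparameters.
# FIRE itself is replaced by a stand-in nn.Embedding; see
# https://github.com/kduxin/firelang for the real model and training loop.
import torch
import torch.nn as nn

MAX_LR = 0.005               # maximum learning rate for the OneCycle schedule (reported)
EPOCHS = 3                   # 3 epochs reported for Wacky (15 for text8)
STEPS_PER_EPOCH = 100_000    # assumption: depends on corpus size and batch size
SUBSAMPLE_RATE = 1e-5        # reported: 1e-5 for Wacky, 1e-4 for text8
NEG_SAMPLING_POWER = 0.75    # reported: unigram counts raised to the 0.75 power

model = nn.Embedding(num_embeddings=200_000, embedding_dim=50)  # stand-in for FIRE

optimizer = torch.optim.AdamW(model.parameters())
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=MAX_LR,
    epochs=EPOCHS,
    steps_per_epoch=STEPS_PER_EPOCH,
)

# Negative-sampling distribution: unigram counts raised to the 0.75 power, normalized.
word_counts = torch.tensor([1000.0, 500.0, 10.0])      # toy counts for illustration
neg_probs = word_counts.pow(NEG_SAMPLING_POWER)
neg_probs /= neg_probs.sum()

# Word2vec-style subsampling: keep word w with probability min(1, sqrt(t / f(w))),
# where t is the subsample rate and f(w) the relative frequency of w.
word_freqs = word_counts / word_counts.sum()
keep_probs = torch.clamp(torch.sqrt(SUBSAMPLE_RATE / word_freqs), max=1.0)
```

The OneCycle scheduler requires the total number of optimizer steps (epochs x steps_per_epoch), so in practice the step count would be derived from the corpus size and batch size rather than the fixed value assumed here.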