FIRE: Semantic Field of Words Represented as Non-Linear Functions

Authors: Xin Du, Kumiko Tanaka-Ishii

NeurIPS 2022

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | By implementing FIRE for English and comparing it with previous representation methods via word and sentence similarity tasks, we show that FIRE produces comparable or even better results. In an evaluation of polysemy to predict the number of word senses, FIRE greatly outperformed BERT and Word2vec, providing evidence of how FIRE represents polysemy.

Researcher Affiliation | Academia | Xin Du, The University of Tokyo, duxin@cl.rcast.u-tokyo.ac.jp; Kumiko Tanaka-Ishii, The University of Tokyo, kumiko@cl.rcast.u-tokyo.ac.jp

Pseudocode | No | The paper describes its methods using mathematical formulas and prose but does not include structured pseudocode or algorithm blocks.

Open Source Code | Yes | The code is available at https://github.com/kduxin/firelang.

Open Datasets | Yes | We obtained FIRE representations on two datasets: the large Wacky [4] dataset and the small text8 dataset, which contain about 3 billion and 17 million tokens, respectively. ... The datasets Wacky and WordNet were cited.

Dataset Splits | No | The paper mentions training on the Wacky and text8 datasets but does not explicitly specify train/validation/test dataset splits, percentages, or sample counts in the main text.

Hardware Specification | No | The paper states that training took 'about 10 hours for FIRE with 50 parameters per word on a single GPU' but does not specify the exact model or type of GPU.

Software Dependencies | No | The paper mentions using the 'NLTK toolkit' but does not provide specific version numbers for this or any other software dependencies.

Experiment Setup | Yes | For the SGNS hyperparameters, we followed the settings recommended by [15]. We used one negative sample per sample, and the subsample rate for w_i was 1e-5 for Wacky and 1e-4 for text8. The negative sampling probability was adjusted by a power of 0.75. For training FIRE representations, we adopted the AdamW optimizer [13]. The One Cycle learning rate scheduler [24] was applied, with the maximum learning rate set to 0.005. ... For the Wacky dataset, all methods were trained for three epochs... For text8, FIRE was trained for 15 epochs.
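The quoted setup amounts to a concrete optimizer, scheduler, and sampling configuration. Below is a minimal PyTorch sketch of that configuration, not the authors' implementation (which is in the linked firelang repository): a plain embedding table stands in for FIRE's non-linear word functions, the token batches are random placeholders, and the word2vec-style keep-probability formula is an assumption. Only the AdamW optimizer, the One Cycle schedule with maximum learning rate 0.005, one negative sample per positive pair, the 0.75 smoothing power, and the subsample rates (1e-5 for Wacky, 1e-4 for text8) come from the quoted setup.

```python
import numpy as np
import torch
import torch.nn.functional as F
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR

def negative_sampling_probs(counts: np.ndarray, power: float = 0.75) -> np.ndarray:
    """Negative-sampling distribution: unigram counts raised to a power of 0.75."""
    smoothed = counts.astype(np.float64) ** power
    return smoothed / smoothed.sum()

def keep_prob(rel_freq: np.ndarray, t: float = 1e-5) -> np.ndarray:
    """Assumed word2vec-style subsampling: keep w_i with prob min(1, sqrt(t / f(w_i))),
    where t is the subsample rate (1e-5 for Wacky, 1e-4 for text8)."""
    return np.minimum(1.0, np.sqrt(t / rel_freq))

# Stand-in model: a plain embedding table instead of FIRE's non-linear word functions.
vocab_size, dim = 10_000, 50
model = torch.nn.Embedding(vocab_size, dim)

epochs, steps_per_epoch = 3, 1_000          # 3 epochs on Wacky; 15 on text8
optimizer = AdamW(model.parameters())
scheduler = OneCycleLR(optimizer, max_lr=0.005,
                       epochs=epochs, steps_per_epoch=steps_per_epoch)

# Placeholder corpus statistics; real counts would come from Wacky or text8.
counts = np.random.randint(1, 1_000, size=vocab_size)
neg_probs = torch.as_tensor(negative_sampling_probs(counts))

for step in range(epochs * steps_per_epoch):
    center = torch.randint(0, vocab_size, (128,))
    context = torch.randint(0, vocab_size, (128,))
    negative = torch.multinomial(neg_probs, 128, replacement=True)  # one negative per sample

    pos_score = (model(center) * model(context)).sum(-1)
    neg_score = (model(center) * model(negative)).sum(-1)
    # SGNS-style loss with a single negative sample per positive pair.
    loss = -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_score)).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()    # One Cycle schedule advances once per optimizer step
```

The scheduler is stepped after every batch rather than every epoch, which is how the One Cycle policy reaches and then decays from the 0.005 peak over the full run; the number of epochs would be 3 for Wacky and 15 for text8 as reported.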