ALaSca: an Automated approach for Large-Scale Lexical Substitution

Authors: Caterina Lacerra, Tommaso Pasini, Rocco Tripodi, Roberto Navigli

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through different experiments, we show that a simple BERT-based model provides better substitutes when using ALaSca training data than when being restricted to gold instances only, reaching performances that are higher, or on par with, complex state-of-the-art models.
Researcher Affiliation | Academia | Caterina Lacerra1, Tommaso Pasini2, Rocco Tripodi1 and Roberto Navigli1. 1 Department of Computer Science, Sapienza University of Rome; 2 Department of Computer Science, University of Copenhagen.
Pseudocode | No | The paper includes a pipeline diagram (Figure 1) but no pseudocode or algorithm blocks.
Open Source Code | Yes | We release ALaSca at https://sapienzanlp.github.io/alasca/.
Open Datasets | Yes | We consider the training split of CoInCo (CoInCoT) and TWSIT, i.e., the 70% of TWSI instances that we did not use for development. (...) To generate a dataset for training, we feed ALaSca with the list of lemmas in CoInCo [Kremer et al., 2014] and LST [McCarthy and Navigli, 2009], using Wikipedia (December 2019 dump) as corpus to retrieve the sentences.
Dataset Splits | Yes | As development, we use 30% of TWSI instances concatenated with the development split of CoInCo, and we exclude the target words occurring in the development set from the data considered for training.
Hardware Specification | No | The paper does not describe the specific hardware used to run its experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies | No | The paper mentions models and optimizers such as BERT, LASER, S-BERT, and RAdam, but does not provide version numbers for any software libraries or dependencies used in the experiments.
Experiment Setup | Yes | We train the reference model (Section 4.1) with the Kullback-Leibler divergence loss with RAdam [Liu et al., 2019] and learning rate 10^-5. We set the maximum epochs to 5, with early stopping and patience set to 3. (...) As static word representation we leverage ConceptNet Numberbatch vectors [Speer and Lowry-Duda, 2017], and set γ = 0.4 and α = 1.
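
The Research Type row above refers to a BERT-based substitute generator. The following is a minimal masked-language-model sketch for proposing substitutes of a target word in context; it is not the authors' ALaSca-trained model, and the model name and example sentence are assumptions for illustration only.

```python
# Minimal masked-LM substitution sketch (NOT the paper's fine-tuned model):
# mask the target word and rank vocabulary items by BERT's probability at
# the masked position. Model name and example sentence are illustrative.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def candidate_substitutes(sentence: str, target: str, top_k: int = 10):
    masked = sentence.replace(target, tokenizer.mask_token, 1)
    inputs = tokenizer(masked, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits[0, mask_pos].softmax(dim=-1)
    top_ids = probs.topk(top_k).indices
    return [tokenizer.convert_ids_to_tokens(int(i)) for i in top_ids]

print(candidate_substitutes("The bright student solved the problem quickly.", "bright"))
```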
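The Open Datasets row describes generating training data by retrieving corpus sentences for a list of target lemmas. The sketch below illustrates that retrieval step generically; it does not use the released ALaSca code, and the corpus path and lemma list are placeholders.

```python
# Generic lemma-based sentence retrieval sketch (not the released ALaSca
# pipeline): keep every corpus sentence containing one of the target lemmas.
from collections import defaultdict
import spacy

nlp = spacy.load("en_core_web_sm", disable=["ner", "parser"])
nlp.add_pipe("sentencizer")

target_lemmas = {"bright", "match", "paper"}          # e.g. lemmas from CoInCo/LST
sentences_per_lemma = defaultdict(list)

with open("wikipedia_dump.txt", encoding="utf-8") as corpus:   # placeholder path
    for line in corpus:
        for sent in nlp(line.strip()).sents:
            lemmas_in_sentence = {tok.lemma_.lower() for tok in sent}
            for lemma in target_lemmas & lemmas_in_sentence:
                sentences_per_lemma[lemma].append(sent.text)
```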
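The Dataset Splits row quotes the split protocol: 30% of TWSI instances held out for development, concatenated with the CoInCo development split, with development target words excluded from training. Below is a minimal sketch of that protocol, assuming each instance is a dictionary with a "target_lemma" field (a naming assumption, not the paper's data format).

```python
# Sketch of the split protocol quoted above; the "target_lemma" field name
# and the random seed are assumptions.
import random

def make_splits(twsi_instances, coinco_dev, coinco_train, seed=42):
    rng = random.Random(seed)
    twsi = list(twsi_instances)
    rng.shuffle(twsi)

    dev_size = int(0.3 * len(twsi))              # 30% of TWSI for development
    twsi_dev, twsi_train = twsi[:dev_size], twsi[dev_size:]

    dev = twsi_dev + list(coinco_dev)            # concatenate with CoInCo dev
    dev_targets = {inst["target_lemma"] for inst in dev}

    # Exclude target words seen in development from the training pool.
    train = [inst for inst in twsi_train + list(coinco_train)
             if inst["target_lemma"] not in dev_targets]
    return train, dev
```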
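Finally, the Experiment Setup row reports training with the Kullback-Leibler divergence loss, RAdam, learning rate 1e-5, at most 5 epochs, and early stopping with patience 3. The loop below is a sketch of that configuration only; the model and data loaders are placeholders, and torch.optim.RAdam is assumed to be available (recent PyTorch releases).

```python
# Training-loop sketch matching the reported hyperparameters (KL-divergence
# loss, RAdam, lr 1e-5, max 5 epochs, early stopping with patience 3).
# The model and the data loaders are placeholders, not the paper's code.
import torch
from torch.nn import KLDivLoss
from torch.optim import RAdam        # assumed available (PyTorch >= 1.10)

def kl_loss(criterion, model, batch, device):
    log_probs = torch.log_softmax(model(batch["inputs"].to(device)), dim=-1)
    return criterion(log_probs, batch["gold_dist"].to(device))  # KL(gold || predicted)

def train(model, train_loader, dev_loader, device="cuda"):
    model.to(device)
    criterion = KLDivLoss(reduction="batchmean")
    optimizer = RAdam(model.parameters(), lr=1e-5)

    best_dev, bad_epochs, patience = float("inf"), 0, 3
    for epoch in range(5):                       # maximum 5 epochs
        model.train()
        for batch in train_loader:
            optimizer.zero_grad()
            kl_loss(criterion, model, batch, device).backward()
            optimizer.step()

        model.eval()
        with torch.no_grad():
            dev_loss = sum(kl_loss(criterion, model, b, device).item()
                           for b in dev_loader)

        if dev_loss < best_dev:
            best_dev, bad_epochs = dev_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:           # early stopping, patience 3
                break
```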