Modeling Language Tokens as Functionals of Semantic Fields
Authors: Zhengqi Pei, Anran Zhang, Shuhui Wang, Qingming Huang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments conducted for standard reading comprehension and question-answering tasks demonstrate that the LasF-based models consistently improve accuracy with fewer parameters. Besides, we use CommonsenseQA's blind test set to evaluate a full-parameter tuned LasF-based model, which outperforms the prior best ensemble and single models by 0.4% and 3.1%, respectively. Furthermore, our LasF-only language model trained from scratch outperforms existing parameter-efficient language models on standard datasets such as WikiText103 and Penn Treebank. |
| Researcher Affiliation | Academia | 1Institute of Computing Technology, Chinese Academy of Sciences. 2School of Artificial Intelligence, University of Chinese Academy of Sciences. 3Peng Cheng Laboratory. 4School of Computer Science and Technology, University of Chinese Academy of Sciences. |
| Pseudocode | No | The paper provides mathematical equations describing the model's workflow but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Codes are available at https://github.com/pzqpzq/flat-learning. |
| Open Datasets | Yes | SQuAD2.0 dataset (Rajpurkar et al., 2018)... CommonsenseQA (Talmor et al., 2018), Physical Interaction QA (Bisk et al., 2020), and Social Interaction QA (Sap et al., 2019)... WikiText103 (Merity et al., 2016) and Penn Treebank (Marcus et al., 1993)... WikiQA dataset (Yang et al., 2015) |
| Dataset Splits | No | While the paper mentions using training and development sets (e.g., 'The training dataset contains 87k answerable and 43k unanswerable questions' for SQuAD2.0, and 'in-house controlled experiments on CsQA's development sets'), it does not provide specific reproducible percentages, sample counts, or explicit splitting methodologies for all train/validation/test splits across all datasets used. |
| Hardware Specification | Yes | All experiments can be conducted on a single 24GB memory GeForce RTX 4090 GPU, on which we can train a LasF-based language model within a week. |
| Software Dependencies | No | The paper mentions using the 'spaCy toolkit (Vasiliev, 2020)' and the 'GPT2 tokenizer' but does not provide specific version numbers for these or other software dependencies such as deep learning frameworks (e.g., PyTorch, TensorFlow) or Python. |
| Experiment Setup | Yes | The hyperparameters LX, DX, LY, and DY are predetermined by the current task, while other hyperparameters such as LQ, DQ, DS, Ns, and Nf can be adjusted based on their performance on the development set. For training details, we use SGD with a decaying learning rate starting from 0.3 as the optimizer; we set the batch size to 16 and the maximum number of epochs to 50. The model architecture follows Eq. 14 with DX = 768, DQ = 15, and Ns = 12. (For other hyperparameters, see Table 6.) |
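The reported training configuration (SGD, initial learning rate 0.3, batch size 16, at most 50 epochs) can be sketched as follows. This is a minimal illustration only: the paper does not specify the form of the learning-rate decay, so the exponential schedule and its `GAMMA` factor below are assumptions, not the authors' setting.

```python
# Hedged sketch of the training hyperparameters reported in the paper.
# The paper states only "SGD with a decaying learning rate starting from
# 0.3"; the decay rule is unspecified, so the exponential decay and the
# GAMMA factor here are illustrative assumptions.

BASE_LR = 0.3      # initial learning rate (from the paper)
BATCH_SIZE = 16    # from the paper
MAX_EPOCHS = 50    # from the paper
GAMMA = 0.9        # assumed per-epoch decay factor (NOT given in the paper)

def lr_at_epoch(epoch: int, base_lr: float = BASE_LR, gamma: float = GAMMA) -> float:
    """Return the (assumed) exponentially decayed learning rate at `epoch`."""
    return base_lr * (gamma ** epoch)

# One learning-rate value per training epoch.
schedule = [lr_at_epoch(e) for e in range(MAX_EPOCHS)]
```

In a PyTorch setup this would correspond to `torch.optim.SGD` combined with a scheduler such as `ExponentialLR`, but the actual scheduler used by the authors is not stated.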