Modelling Form-Meaning Systematicity with Linguistic and Visual Features

Authors: Arie Soeteman, Dario Gutierrez, Elia Bruni, Ekaterina Shutova

AAAI 2020, pp. 8870-8877 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimenting on the English lexicon, we find that the inclusion of visual features allows us to identify more form-meaning systematicity than when using a text-based model alone."
Researcher Affiliation | Collaboration | Arie Soeteman, University of Amsterdam (awsoeteman@hotmail.com); Dario Gutierrez, IBM Research (elkin.gutierrez@ibm.com); Elia Bruni, University of Amsterdam (e.bruni@uva.nl); Ekaterina Shutova, University of Amsterdam (e.shutova@uva.nl)
Pseudocode | No | The paper describes the methodology using mathematical equations and diagrams (e.g., Figure 1, Figure 2) but does not include pseudocode or explicitly labelled algorithm blocks.
Open Source Code | No | The paper notes the availability of a third-party model ("The model is freely available as part of the Word2Vec system release, along with 300-dimensional vector representations for 3 million words and phrases.") but does not state that the authors' own implementation is open source or publicly provided.
Open Datasets | Yes | "We use the same lexicon as Gutierrez et al. (2016), which has been constructed by cross-referencing monomorphemic English words in the CELEX lexical database (Baayen, Piepenbrock, and Gulikers 1996) with monomorphemic words in the Oxford English Dictionary Online (Simpson, Weiner, and others 1989)."; "We used skip-gram with negative sampling (Mikolov et al. 2013) trained on the Google News dataset as our text-based model."; "a deep convolutional neural network that was trained on the ImageNet classification task (Russakovsky et al. 2015)." (See the embedding-loading sketch below the table.)
Dataset Splits | No | The paper mentions training a neural network "on the entire lexicon" for one of its multimodal models, but it does not specify explicit train/validation/test splits with percentages or counts for its primary experiments.
Hardware Specification | No | The paper does not provide any details about the hardware (e.g., CPU or GPU models, memory, or cloud instances) used to run the experiments.
Software Dependencies | No | The paper mentions software such as the Word2Vec system release and the Caffe deep learning framework, but does not specify version numbers or other pinned software dependencies required for reproducibility.
Experiment Setup | Yes | "We initialize edit weights to 1 and optimize them by minimizing the MSE until convergence."; "The optimal weighting factor for our two monomodal models was 0.75."; "The first three layers of both branches are fully connected ReLU layers."; "We train the network on the entire lexicon using MSE as the loss function." (Illustrative sketches of the weighted edit distance and the two-branch network appear below the table.)
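
For readers reconstructing the inputs cited in the "Open Datasets" row, the pretrained Google News skip-gram vectors are distributed with the word2vec release. Below is a minimal loading sketch, assuming the gensim library and the publicly released GoogleNews-vectors-negative300.bin file; the file name and gensim usage are our assumptions, not something the paper specifies.

```python
from gensim.models import KeyedVectors

# Load the pretrained skip-gram-with-negative-sampling vectors trained on
# Google News: 300 dimensions, ~3M words and phrases (downloaded separately).
kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

vec = kv["language"]                    # a 300-dimensional numpy vector
print(vec.shape)                        # (300,)
print(kv.most_similar("red", topn=3))   # nearest neighbours in vector space
```

The visual features come from a deep CNN trained on the ImageNet classification task; the quoted excerpts do not name the exact architecture, so a re-implementation would need to consult the original paper.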
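The "Experiment Setup" row quotes the paper's weighted edit distance: edit weights start at 1 and are tuned by minimizing MSE until convergence. The sketch below shows only the distance computation itself, with a single weight per operation type; that granularity is a simplifying assumption (the paper may learn finer-grained weights), and the MSE optimization loop is omitted.

```python
def weighted_edit_distance(a: str, b: str,
                           w_ins: float = 1.0,
                           w_del: float = 1.0,
                           w_sub: float = 1.0) -> float:
    """Dynamic-programming Levenshtein distance with per-operation weights.

    All weights default to 1, matching the paper's initialization; the paper
    then optimizes its edit weights by minimizing the MSE until convergence.
    """
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + w_del
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + w_ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub_cost = 0.0 if a[i - 1] == b[j - 1] else w_sub
            d[i][j] = min(d[i - 1][j - 1] + sub_cost,  # substitution / match
                          d[i - 1][j] + w_del,         # deletion
                          d[i][j - 1] + w_ins)         # insertion
    return d[m][n]

print(weighted_edit_distance("form", "farm"))  # 1.0 with unit weights
```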
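The same row describes a two-branch network whose first three layers per branch are fully connected ReLU layers, trained on the entire lexicon with an MSE loss. Here is a minimal PyTorch sketch under stated assumptions: the layer widths, the fusion-by-concatenation step, and the optimizer are our choices, and the 0.75 weighting factor the paper reports for combining its two monomodal models appears only as a comment, since the exact combination scheme is not given in the quoted text.

```python
import torch
import torch.nn as nn

TEXT_DIM = 300          # word2vec dimensionality (from the paper)
VIS_DIM = 4096          # CNN feature size: an assumption, not quoted
HIDDEN = OUT_DIM = 300  # illustrative widths; not specified in the quotes

def branch(in_dim: int) -> nn.Sequential:
    """Three fully connected ReLU layers, as described for each branch."""
    return nn.Sequential(
        nn.Linear(in_dim, HIDDEN), nn.ReLU(),
        nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
        nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
    )

class TwoBranchNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.text_branch = branch(TEXT_DIM)
        self.vis_branch = branch(VIS_DIM)
        # Fusion by concatenation plus a linear layer is an assumption.
        # For the paper's weighted combination of its two monomodal models,
        # the reported optimal weighting factor was 0.75.
        self.out = nn.Linear(2 * HIDDEN, OUT_DIM)

    def forward(self, text_x: torch.Tensor, vis_x: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.text_branch(text_x),
                           self.vis_branch(vis_x)], dim=-1)
        return self.out(fused)

model = TwoBranchNet()
loss_fn = nn.MSELoss()                       # MSE loss, as stated in the paper
opt = torch.optim.Adam(model.parameters())

# Random placeholder batch standing in for the lexicon; the paper trains on
# the entire lexicon and reports no held-out split.
text_x, vis_x = torch.randn(32, TEXT_DIM), torch.randn(32, VIS_DIM)
target = torch.randn(32, OUT_DIM)
for _ in range(100):                         # "until convergence" in the paper
    opt.zero_grad()
    loss = loss_fn(model(text_x, vis_x), target)
    loss.backward()
    opt.step()
```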