reproducibilityindex.ai

Challenges in Materials Discovery – Synthetic Generator and Real Datasets

Authors: Ronan Le Bras, Richard Bernstein, John Gregoire, Santosh Suram, Carla Gomes, Bart Selman, R. Bruce van Dover

AAAI 2014

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In collaboration with two major research laboratories in materials science, we provide the first publicly available dataset for the phase map identification problem. In addition, we provide a parameterized synthetic data generator to assess the quality of proposed approaches, as well as tools for data visualization and solution evaluation. ... The experiments were run on an infrastructure supported by the NSF Computing research infrastructure for Computational Sustainability grant (grant 1059284).
Researcher Affiliation	Academia	Ronan Le Bras, Richard Bernstein, Computer Science Dept., Cornell University, Ithaca NY; John M. Gregoire, Santosh K. Suram, Joint Center for Artificial Photosynthesis, California Inst. of Technology, Pasadena CA; Carla P. Gomes, Bart Selman, Computer Science Dept., Cornell University, Ithaca NY; R. Bruce van Dover, Materials Science and Engineering Dept., Cornell University, Ithaca NY
Pseudocode	No	The paper includes 'Listing 1: Data format example' and 'Listing 2: Solution format example', which describe data structures, not pseudocode or algorithms.
Open Source Code	Yes	We have developed a graphical user-interface application for exploring and visualizing input datasets as well as solutions to the phase map identification problem. [Footnote 1: Available at http://www.udiscover.it]
Open Datasets	Yes	In collaboration with two major research laboratories in materials science, we provide the first publicly available dataset for the phase map identification problem. ... Using this generator, we provide a benchmark of 100 instances, with varying complexity. [Footnote 2: Available at http://www.udiscover.it]
Dataset Splits	No	The paper does not specify training, validation, or test splits for computational experiments. Its purpose is to provide datasets for others to use, rather than to present results from a specific model trained on these datasets.
Hardware Specification	No	The paper mentions that 'The experiments were run on an infrastructure supported by the NSF Computing research infrastructure for Computational Sustainability grant' and facilities like 'Stanford Synchrotron Radiation Lightsource' and 'Cornell High Energy Synchrotron Source (CHESS)' for data collection, but it does not specify any concrete hardware details (e.g., CPU, GPU models, memory) used for computational processing or the GUI.
Software Dependencies	No	The paper mentions various tools and databases used in materials science (e.g., X-ray diffraction, wavelet-based peak detection algorithm, NIST, ICDD, Materials Project, aflowlib.org), but it does not specify any software dependencies with version numbers for the graphical user interface or synthetic data generator they developed.
Experiment Setup	Yes	The generator has a set of user-specified parameters that allow controlling the complexity of the generated instances, as follows: 1. Underlying system that governs the number of phases K and their concentration on the film. 2. Spacing (in atomic percent) of the data points, which determines the total number N of points. 3. Total number of peaks of the component phases, up to the theoretically defined number of peaks. 4. Number of diffraction angles of the X-ray patterns, which corresponds to the x-axis precision of the patterns. 5. Noise level as a total number (or total amount) of removed peaks from the original constructed patterns.
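
The five generator parameters quoted in the Experiment Setup row could be grouped roughly as in the sketch below. This is an illustration only, not the authors' generator: the class `GeneratorParams`, its field names, the helper `ternary_grid`, and the placeholder system name "A-B-C" are all hypothetical. The sketch also assumes a three-element (ternary) system sampled on a regular composition grid that includes edges and corners, which the excerpt does not state, simply to show how the spacing could determine the number of points N.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GeneratorParams:
    """Hypothetical container for the five parameters listed in the report."""
    system: str          # 1. underlying system (governs K phases and their concentration)
    spacing_at_pct: int  # 2. spacing of the data points, in atomic percent
    total_peaks: int     # 3. total number of peaks of the component phases
    num_angles: int      # 4. number of diffraction angles (x-axis precision)
    removed_peaks: int   # 5. noise level: peaks removed from the constructed patterns

    def __post_init__(self) -> None:
        # Assumption: the spacing must divide 100 so the grid closes exactly.
        if self.spacing_at_pct <= 0 or 100 % self.spacing_at_pct != 0:
            raise ValueError("spacing_at_pct must be a positive divisor of 100")
        if self.total_peaks <= 0 or self.num_angles <= 0:
            raise ValueError("total_peaks and num_angles must be positive")
        if self.removed_peaks < 0:
            raise ValueError("removed_peaks must be non-negative")


def ternary_grid(spacing_at_pct: int) -> List[Tuple[int, int, int]]:
    """Enumerate compositions (a, b, c) in atomic percent with a + b + c = 100.

    Assumption: ternary system, regular grid, edges and corners included.
    """
    steps = 100 // spacing_at_pct
    return [
        (i * spacing_at_pct, j * spacing_at_pct, 100 - (i + j) * spacing_at_pct)
        for i in range(steps + 1)
        for j in range(steps + 1 - i)
    ]


if __name__ == "__main__":
    params = GeneratorParams(
        system="A-B-C",  # placeholder name, not taken from the paper
        spacing_at_pct=10,
        total_peaks=20,
        num_angles=500,
        removed_peaks=5,
    )
    points = ternary_grid(params.spacing_at_pct)
    print(f"N = {len(points)} sample points at {params.spacing_at_pct} at.% spacing")
```

Under these assumptions, a finer spacing grows N quadratically: with n = 100 / spacing steps per edge, the grid holds (n + 1)(n + 2) / 2 points. That is one way the spacing parameter could control instance size.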