Challenges in Materials Discovery – Synthetic Generator and Real Datasets

Authors: Ronan Le Bras, Richard Bernstein, John Gregoire, Santosh Suram, Carla Gomes, Bart Selman, R. Bruce van Dover

AAAI 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In collaboration with two major research laboratories in materials science, we provide the first publicly available dataset for the phase map identification problem. In addition, we provide a parameterized synthetic data generator to assess the quality of proposed approaches, as well as tools for data visualization and solution evaluation. ... The experiments were run on an infrastructure supported by the NSF Computing research infrastructure for Computational Sustainability grant (grant 1059284).
Researcher Affiliation | Academia | Ronan Le Bras, Richard Bernstein, Computer Science Dept., Cornell University, Ithaca NY; John M. Gregoire, Santosh K. Suram, Joint Center for Artificial Photosynthesis, California Inst. of Technology, Pasadena CA; Carla P. Gomes, Bart Selman, Computer Science Dept., Cornell University, Ithaca NY; R. Bruce van Dover, Materials Science and Engineering Dept., Cornell University, Ithaca NY
Pseudocode | No | The paper includes 'Listing 1: Data format example' and 'Listing 2: Solution format example', which describe data structures, not pseudocode or algorithms.
Open Source Code | Yes | We have developed a graphical user-interface application for exploring and visualizing input datasets as well as solutions to the phase map identification problem. (Footnote 1: Available at http://www.udiscover.it)
Open Datasets | Yes | In collaboration with two major research laboratories in materials science, we provide the first publicly available dataset for the phase map identification problem. ... Using this generator, we provide a benchmark of 100 instances, with varying complexity. (Footnote 2: Available at http://www.udiscover.it)
Dataset Splits | No | The paper does not specify training, validation, or test splits for computational experiments. Its purpose is to provide datasets for others to use, rather than to present results from a specific model trained on these datasets.
Hardware Specification | No | The paper mentions that 'The experiments were run on an infrastructure supported by the NSF Computing research infrastructure for Computational Sustainability grant' and facilities like 'Stanford Synchrotron Radiation Lightsource' and 'Cornell High Energy Synchrotron Source (CHESS)' for data collection, but it does not specify any concrete hardware details (e.g., CPU, GPU models, memory) used for computational processing or the GUI.
Software Dependencies | No | The paper mentions various tools and databases used in materials science (e.g., X-ray diffraction, wavelet-based peak detection algorithm, NIST, ICDD, Materials Project, aflowlib.org), but it does not specify any software dependencies with version numbers for the graphical user interface or synthetic data generator they developed.
Experiment Setup | Yes | The generator has a set of user-specified parameters that allow controlling the complexity of the generated instances, as follows: 1. Underlying system that governs the number of phases K and their concentration on the film. 2. Spacing (in atomic percent) of the data points, which determines the total number N of points. 3. Total number of peaks of the component phases, up to the theoretically defined number of peaks. 4. Number of diffraction angles of the X-ray patterns, which corresponds to the x-axis precision of the patterns. 5. Noise level as a total number (or total amount) of removed peaks from the original constructed patterns.
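
The five generator parameters listed under Experiment Setup can be pictured as a small configuration record. The Python sketch below is illustrative only and is not the authors' generator: the names GeneratorConfig, ternary_grid, and remove_peaks are hypothetical. It also assumes a three-element (ternary) system, a spacing that divides 100, and noise applied by deleting peaks uniformly at random, none of which the summary above confirms.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratorConfig:
    """Hypothetical container for the five user-specified parameters."""
    system: str          # 1. underlying system (fixes K and phase concentrations)
    spacing_at_pct: int  # 2. spacing of the data points, in atomic percent
    num_peaks: int       # 3. total number of peaks of the component phases
    num_angles: int      # 4. number of diffraction angles (x-axis resolution)
    removed_peaks: int   # 5. noise level: number of peaks removed


def ternary_grid(spacing_at_pct):
    """List compositions (a, b, c) in atomic percent that sum to 100.

    Assumption: a three-element system and a spacing that divides 100.
    The length of the result plays the role of N, the number of points.
    """
    if spacing_at_pct <= 0 or 100 % spacing_at_pct != 0:
        raise ValueError("spacing must be a positive divisor of 100")
    steps = range(0, 101, spacing_at_pct)
    return [(a, b, 100 - a - b) for a in steps for b in steps if a + b <= 100]


def remove_peaks(patterns, count, seed=0):
    """Return a copy of `patterns` with `count` peaks deleted.

    `patterns` maps a sample point to its list of peak positions.
    Assumption: peaks are chosen uniformly at random across all points.
    """
    slots = [(pt, i) for pt, peaks in patterns.items() for i in range(len(peaks))]
    if count > len(slots):
        raise ValueError("cannot remove more peaks than exist")
    dropped = set(random.Random(seed).sample(slots, count))
    return {
        pt: [p for i, p in enumerate(peaks) if (pt, i) not in dropped]
        for pt, peaks in patterns.items()
    }
```

Under these assumptions, a finer spacing increases N quadratically: a spacing of 10 atomic percent yields 66 sample points, which illustrates how the spacing parameter controls instance size.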