Challenges in Materials Discovery – Synthetic Generator and Real Datasets
Authors: Ronan Le Bras, Richard Bernstein, John Gregoire, Santosh Suram, Carla Gomes, Bart Selman, R. Bruce van Dover
AAAI 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In collaboration with two major research laboratories in materials science, we provide the first publicly available dataset for the phase map identification problem. In addition, we provide a parameterized synthetic data generator to assess the quality of proposed approaches, as well as tools for data visualization and solution evaluation. ... The experiments were run on an infrastructure supported by the NSF Computing research infrastructure for Computational Sustainability grant (grant 1059284). |
| Researcher Affiliation | Academia | Ronan Le Bras, Richard Bernstein: Computer Science Dept., Cornell University, Ithaca NY; John M. Gregoire, Santosh K. Suram: Joint Center for Artificial Photosynthesis, California Inst. of Technology, Pasadena CA; Carla P. Gomes, Bart Selman: Computer Science Dept., Cornell University, Ithaca NY; R. Bruce van Dover: Materials Science and Engineering Dept., Cornell University, Ithaca NY |
| Pseudocode | No | The paper includes 'Listing 1: Data format example' and 'Listing 2: Solution format example', which describe data structures, not pseudocode or algorithms. |
| Open Source Code | Yes | We have developed a graphical user-interface application for exploring and visualizing input datasets as well as solutions to the phase map identification problem. (Footnote 1: Available at http://www.udiscover.it) |
| Open Datasets | Yes | In collaboration with two major research laboratories in materials science, we provide the first publicly available dataset for the phase map identification problem. ... Using this generator, we provide a benchmark of 100 instances, with varying complexity. (Footnote 2: Available at http://www.udiscover.it) |
| Dataset Splits | No | The paper does not specify training, validation, or test splits for computational experiments. Its purpose is to provide datasets for others to use, rather than to present results from a specific model trained on these datasets. |
| Hardware Specification | No | The paper mentions that 'The experiments were run on an infrastructure supported by the NSF Computing research infrastructure for Computational Sustainability grant' and facilities like 'Stanford Synchrotron Radiation Lightsource' and 'Cornell High Energy Synchrotron Source (CHESS)' for data collection, but it does not specify any concrete hardware details (e.g., CPU, GPU models, memory) used for computational processing or the GUI. |
| Software Dependencies | No | The paper mentions various tools and databases used in materials science (e.g., X-ray diffraction, wavelet-based peak detection algorithm, NIST, ICDD, Materials Project, aflowlib.org), but it does not specify any software dependencies with version numbers for the graphical user interface or synthetic data generator they developed. |
| Experiment Setup | Yes | The generator has a set of user-specified parameters that allow controlling the complexity of the generated instances, as follows: 1. Underlying system that governs the number of phases K and their concentration on the film. 2. Spacing (in atomic percent) of the data points, which determines the total number N of points. 3. Total number of peaks of the component phases, up to the theoretically defined number of peaks. 4. Number of diffraction angles of the X-ray patterns, which corresponds to the x-axis precision of the patterns. 5. Noise level as a total number (or total amount) of removed peaks from the original constructed patterns. |
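
The five generator parameters quoted in the Experiment Setup row can be pictured as a small configuration object. The sketch below is illustrative only and is not the authors' generator: the class name, field names, and example values are hypothetical. It also assumes a ternary system sampled on a regular composition grid, which the quoted text does not confirm. Under that assumption, it shows how the spacing in atomic percent determines the number of points N.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class GeneratorParams:
    """Hypothetical container for the five user-specified parameters."""
    system: str          # 1. underlying system (governs K and phase concentrations)
    spacing_at_pct: int  # 2. spacing of the data points, in atomic percent
    num_peaks: int       # 3. total number of peaks of the component phases
    num_angles: int      # 4. number of diffraction angles per X-ray pattern
    removed_peaks: int   # 5. noise level: number of peaks removed from the patterns


def ternary_grid(spacing_at_pct: int) -> list[tuple[int, int, int]]:
    """Enumerate compositions (a, b, c) in atomic percent with a + b + c = 100.

    Assumption: a three-element system sampled on a regular grid over the
    whole composition triangle.
    """
    if spacing_at_pct <= 0 or 100 % spacing_at_pct != 0:
        raise ValueError("spacing must be a positive divisor of 100")
    steps = range(0, 101, spacing_at_pct)
    return [(a, b, 100 - a - b) for a, b in product(steps, steps) if a + b <= 100]


# Placeholder values, chosen only to exercise the code.
params = GeneratorParams(
    system="A-B-C",
    spacing_at_pct=10,
    num_peaks=20,
    num_angles=500,
    removed_peaks=0,
)
points = ternary_grid(params.spacing_at_pct)
print(len(points))  # 66 composition points at 10 at.% spacing
```

With n = 100 / spacing steps per edge, this grid has (n + 1)(n + 2) / 2 points, so halving the spacing roughly quadruples N. This is one way the spacing parameter could control instance size.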