Crystallization Learning with the Delaunay Triangulation

Authors: Jiaqi Gu, Guosheng Yin

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on synthetic data under three different scenarios: (i) to illustrate the effectiveness of the crystallization learning in estimating the conditional expectation function µ(·); (ii) to evaluate the estimation accuracy of our approach in comparison with existing nonparametric regression methods, including the k-NN regression using the Euclidean distance, local linear regression using a Gaussian kernel, multivariate kernel regression using a Gaussian kernel (Hein, 2009), and Gaussian process models; and (iii) to validate the proposed data-driven procedure for selection of the hyperparameter L. We also apply our method to real data to investigate its empirical performance. (The baseline methods are sketched in the third example below the table.)
Researcher Affiliation | Academia | Department of Statistics and Actuarial Science, University of Hong Kong, Hong Kong SAR.
Pseudocode | Yes | Algorithm 1: DELAUNAYSPARSE (Chang et al., 2020). (A hedged neighborhood-growing sketch appears as the first example below the table.)
Open Source Code | No | The paper does not provide any explicit statement about releasing source code, nor a link to a code repository for the described methodology.
Open Datasets | Yes | We apply the crystallization learning to several real datasets from the UCI repository. The critical assessment of protein structure prediction (CASP) dataset... The Concrete dataset... Parkinson's telemonitoring dataset...
Dataset Splits | Yes | For each dataset, we take 100 bootstrap samples without replacement of size n (n = 200, 500, 1000 or 2000) for training and 100 bootstrap samples of size 100 for testing. Similar to many other machine learning methods, the statistical complexity and estimation performance of the crystallization learning are controlled by the hyperparameter L, the maximal topological distance from the generated neighbor Delaunay simplices to S(z). Because a small L leads to overfitting and a large L makes µ̂(·) overly smooth, we propose adopting leave-one-out cross-validation (LOO-CV) to select L with respect to the target point z. (A generic LOO-CV loop is sketched in the second example below the table.)
Hardware Specification | No | The paper reports runtime measurements in Table 1 but does not specify the hardware used for the experiments, such as CPU or GPU models or memory size.
Software Dependencies | No | The paper mentions various regression methods and the DELAUNAYSPARSE algorithm, but it does not name any software dependencies with version numbers (e.g., Python, PyTorch, or scikit-learn versions) that would be required for reproduction.
Experiment Setup | Yes | We implement the crystallization learning with L = 3 for d = 5, 10 and L = 2 for d = 20, 50, and obtain µ̂(z_1), ..., µ̂(z_100). We implement the k-NN regression with k = 5, 10, k*, where k* equals the size of V_{z,L}, and the local linear regression and kernel regression with bandwidth h = 1. (The baseline setup is sketched in the third example below the table.)
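
To make the Pseudocode row concrete: the paper locates the Delaunay simplex S(z) containing a query point z with DELAUNAYSPARSE (Chang et al., 2020) and then grows the neighborhood out to topological distance L. The sketch below is not that algorithm: it uses scipy's Qhull-based Delaunay, which builds the full triangulation (unlike DELAUNAYSPARSE), and it assumes face adjacency as the notion of topological distance, which may differ in detail from the paper's definition. The function name simplices_within_L is ours.

```python
import numpy as np
from scipy.spatial import Delaunay

def simplices_within_L(tri, z, L):
    # Index of the simplex S(z) containing the query point z;
    # find_simplex returns -1 when z falls outside the convex hull.
    s0 = int(tri.find_simplex(np.atleast_2d(z))[0])
    if s0 == -1:
        raise ValueError("query point lies outside the convex hull")
    frontier, visited = {s0}, {s0}
    for _ in range(L):  # grow the "crystal" one topological layer at a time
        nxt = set()
        for s in frontier:
            for nb in tri.neighbors[s]:  # face-adjacent simplices (-1 = none)
                if nb != -1 and nb not in visited:
                    nxt.add(int(nb))
        visited |= nxt
        frontier = nxt
    return sorted(visited)

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))         # covariates in R^2
tri = Delaunay(X)
ids = simplices_within_L(tri, np.array([0.5, 0.5]), L=2)
V_zL = np.unique(tri.simplices[ids])   # vertex set V_{z,L} of the neighborhood
```

The vertex set collected on the last line corresponds to the V_{z,L} referenced in the Experiment Setup row.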
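
For the Dataset Splits row, the quoted passage proposes LOO-CV to select L with respect to the target point z, but the exact criterion is not reproduced in the excerpt. The loop below is therefore only a generic LOO-CV grid search: fit_predict is a hypothetical stand-in for the crystallization estimator, and a faithful version would localize the left-out points around z rather than averaging over all of them.

```python
import numpy as np

def select_L_loocv(X, y, candidate_Ls, fit_predict):
    # fit_predict(X_tr, y_tr, x0, L) is a hypothetical stand-in for the
    # crystallization estimator muhat(x0) trained on (X_tr, y_tr) with
    # neighborhood size L.
    n = len(X)
    cv_err = []
    for L in candidate_Ls:
        sq = 0.0
        for i in range(n):                  # leave observation i out
            mask = np.arange(n) != i
            pred = fit_predict(X[mask], y[mask], X[i], L)
            sq += (pred - y[i]) ** 2
        cv_err.append(sq / n)               # mean squared LOO error for this L
    return candidate_Ls[int(np.argmin(cv_err))]
```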
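
Finally, the baselines named in the Research Type and Experiment Setup rows (k-NN with k = 5, 10, Gaussian process regression, and Gaussian-kernel regression with bandwidth h = 1) can be assembled roughly as follows. The data here are a made-up synthetic stand-in, since the paper's simulation designs are not reproduced in the excerpt; scikit-learn has no built-in Nadaraya-Watson estimator, so nadaraya_watson is our own small implementation, and the local linear baseline (a weighted least-squares fit per query point) is omitted.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.gaussian_process import GaussianProcessRegressor

# Hypothetical synthetic data standing in for the paper's designs.
rng = np.random.default_rng(0)
X_train = rng.uniform(size=(500, 5))
y_train = X_train.sum(axis=1) + rng.normal(scale=0.1, size=500)

# k-NN baselines with k = 5 and k = 10; k* = |V_{z,L}| varies per query
# point, so it would have to be recomputed for each z.
knn5 = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)
knn10 = KNeighborsRegressor(n_neighbors=10).fit(X_train, y_train)
gp = GaussianProcessRegressor().fit(X_train, y_train)

def nadaraya_watson(X_tr, y_tr, Z, h=1.0):
    # Multivariate kernel regression with a Gaussian kernel and
    # bandwidth h = 1, matching the quoted setup: a kernel-weighted
    # mean of the training responses around each query point.
    d2 = ((Z[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-0.5 * d2 / h ** 2)
    return (w @ y_tr) / w.sum(axis=1)

Z = rng.uniform(size=(10, 5))              # query points
preds = nadaraya_watson(X_train, y_train, Z, h=1.0)
```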