Kernel Methods for Radial Transformed Compositional Data with Many Zeros

Authors: Junyoung Park, Changwon Yoon, Cheolwoo Park, Jeongyoun Ahn

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The applicability of the proposed approach is demonstrated with kernel principal component analysis. We then provide a quantitative assessment of kPCA on new synthetic data and real-world data examples.
Researcher Affiliation | Academia | (1) Department of Mathematical Sciences, KAIST, Daejeon, Korea; (2) Department of Industrial & Systems Engineering, KAIST, Daejeon, Korea. Correspondence to: Jeongyoun Ahn <jyahn@kaist.ac.kr>, Cheolwoo Park <parkcw2021@kaist.ac.kr>.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper references third-party R packages used (e.g., zCompositions, phyloseq, GUniFrac) but does not provide a direct link or explicit statement about the availability of the source code for the methodology developed in this paper.
Open Datasets | Yes | Table 3. Data availability for real data examples (dataset: data source). Hayden et al. (2020): BONUS-CF (WGS) dataset from MicrobiomeDB.org; Gimblet et al. (2017): Experimental cutaneous leishmaniasis dataset from MicrobiomeDB.org; Arumugam et al. (2011): enterotype dataset in R package phyloseq; Carrieri et al. (2021): Supplementary material of the referenced article; Charlson et al. (2010): throat.otu.tab dataset in R package GUniFrac; Schiffer et al. (2019): vaginal.otu.tab dataset in R package GUniFrac.
Dataset Splits | No | The paper discusses the use of synthetic and real-world datasets and their sizes, but it does not specify explicit training/validation/test dataset splits with percentages, sample counts, or clear splitting methodology for reproducibility.
Hardware Specification | No | The paper mentions computational cost but does not specify any hardware (e.g., CPU, GPU models) used for running the experiments.
Software Dependencies | No | The paper mentions the use of 'R package zCompositions' for producing results, and other R packages as data sources, but it does not specify version numbers for these software components or any other software dependencies with version numbers.
Experiment Setup | Yes | Figure 4 shows projection plots from kernel PCA using Gaussian kernel with two different values of the parameter γ, for both the radial transformed data and the clr transformed data. Here, γ indicates the parameter for Gaussian kernel. For the polynomial kernel, the degree p = 3 is used. The parameter γ ranges from 1 to 100 for the radial transformed data, and from 0.0001 to 0.01 for the clr transformed data.
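
The experiment setup above can be illustrated with a minimal NumPy sketch. It is not the authors' code (the paper does not release any), and several pieces are assumptions made only for illustration: the radial transformation is taken to be division of each composition by its Euclidean norm, the Gaussian kernel is written as exp(-γ‖x − y‖²), the polynomial kernel is written as (x·y + 1)^p with p = 3, and the toy count data and every function name are hypothetical. The clr branch is left out because it needs a zero-replacement step that this summary does not detail.

```python
import numpy as np


def radial_transform(X):
    """Scale each composition to unit Euclidean norm.

    Assumption: this is the form of the radial transformation; the summary
    above does not spell it out.
    """
    X = np.asarray(X, dtype=float)
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def gaussian_kernel(X, gamma):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2) (assumed parametrization)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def polynomial_kernel(X, degree=3, coef0=1.0):
    """Gram matrix (x_i . x_j + coef0)^degree; the offset coef0 is an assumption."""
    return (X @ X.T + coef0) ** degree


def kernel_pca_scores(K, n_components=2):
    """Project the samples onto the leading kernel principal components."""
    n = K.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)  # centering matrix
    Kc = H @ K @ H                            # Gram matrix centered in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)


# Hypothetical toy data: sparse counts turned into compositions with many zeros.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=1.0, size=(30, 8))
counts[:, 0] += 1  # guarantee that every row has a positive total
X = counts / counts.sum(axis=1, keepdims=True)

Z = radial_transform(X)

# Three values spanning the reported range of gamma (1 to 100) for radial data.
gammas = [1.0, 10.0, 100.0]
scores_by_gamma = {g: kernel_pca_scores(gaussian_kernel(Z, g)) for g in gammas}
poly_scores = kernel_pca_scores(polynomial_kernel(Z, degree=3))

for g, s in scores_by_gamma.items():
    print(f"gamma={g:g}: scores shape {s.shape}")
```

Because the radial transformed data keep exact zeros, no zero replacement is needed before the kernel is evaluated, which is why this sketch can run directly on the sparse toy compositions. The much smaller γ range reported for the clr transformed data is consistent with clr coordinates living on a far larger scale than unit-norm vectors.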