Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Mode-Seeking Clustering and Density Ridge Estimation via Direct Estimation of Density-Derivative-Ratios
Authors: Hiroaki Sasaki, Takafumi Kanamori, Aapo Hyvärinen, Gang Niu, Masashi Sugiyama
JMLR 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we experimentally demonstrate that the developed methods significantly outperform existing methods, particularly for relatively high-dimensional data. ... Section 5 experimentally investigates the performance of the proposed methods for mode-seeking clustering and density ridge estimation. ... Fig.6(b,e,h) clearly indicates the advantage of our clustering methods over MS: Both LSLDGC and LSLDGCCW significantly outperform MSLS and MSNR particularly for higher-dimensional data. ... Table 1: The average and standard deviation of ARI values over 50 runs. |
| Researcher Affiliation | Academia | Hiroaki Sasaki EMAIL Graduate School of Information Science Nara Institute of Science and Technology Nara, Japan; Takafumi Kanamori EMAIL Department of Mathematical and Computing Science Tokyo Institute of Technology Tokyo, Japan Center for Advanced Intelligence Project RIKEN Tokyo, Japan; Aapo Hyvärinen EMAIL Gatsby Computational Neuroscience Unit University College London London, United Kingdom Department of Computer Science University of Helsinki Helsinki, Finland Canadian Institute for Advanced Research; Gang Niu EMAIL Graduate School of Frontier Sciences The University of Tokyo Chiba, Japan Center for Advanced Intelligence Project RIKEN Tokyo, Japan; Masashi Sugiyama EMAIL Center for Advanced Intelligence Project RIKEN Tokyo, Japan Graduate School of Frontier Sciences The University of Tokyo Chiba, Japan. All institutions are academic or public research institutes with corresponding academic email domains. |
| Pseudocode | Yes | Figure 3: The mode-seeking algorithm in LSLDGC. ... Figure 4: Two mode-seeking algorithms in LSLDGC. ... Figure 5: The algorithm of LSDRF. |
| Open Source Code | Yes | A MATLAB package of LSLDGC is available at https://sites.google.com/site/hworksites/home/software/lsldg. ... A MATLAB package of LSDRF is available at https://sites.google.com/site/hworksites/home/software/lsdrf. |
| Open Datasets | Yes | Banknote (D = 4, n = 100, and c = 2) (Bache and Lichman, 2013): This dataset consists of four-dimensional features... https://archive.ics.uci.edu/ml/datasets/banknote+authentication# ... Accelerometry (D = 5, n = 300, and c = 3): The ALKAN dataset contains... http://alkan.mns.kyutech.ac.jp/web/data.html ... Olive oil (D = 8, n = 200, and c = 9) (Forina et al., 1983). This dataset was obtained from the R software. https://artax.karlin.mff.cuni.cz/r-help/library/pdfCluster/html/oliveoil.html ... Vowel (D = 10, n = 110, and c = 11) (Turney, 1993; Bache and Lichman, 2013): This consists utterance data... https://archive.ics.uci.edu/ml/datasets/Connectionist+Bench+(Vowel+Recognition+-+Deterding+Data) ... Sat-image (D = 36, n = 120, and c = 6) (Bache and Lichman, 2013): The dataset contains the multi-spectral values... https://archive.ics.uci.edu/ml/datasets/Statlog+(Landsat+Satellite) ... New Madrid earthquake dataset: This seismological dataset was downloaded from the Center for Earthquake Research and Information. http://www.memphis.edu/ceri/seismic/ ... Shapley galaxy dataset: This dataset was downloaded from the Center for Astrostatistics at Pennsylvania State University. http://astrostatistics.psu.edu/datasets/Shapley_galaxy.html |
| Dataset Splits | No | The paper describes how data samples were selected for experiments (e.g., "We randomly chose 50 samples from each of the two classes," or "we randomly chose n data samples from each region"), but it does not specify explicit training/test/validation splits with percentages, sample counts, or references to predefined splits for model evaluation tasks. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or other computing resource specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using "MATLAB package" for LSLDGC and LSDRF, and "R software" for processing the Olive oil dataset. However, it does not specify version numbers for MATLAB or R, nor does it list any other software libraries or frameworks with their versions. |
| Experiment Setup | Yes | LSLDGC: The width parameter $\sigma_j$ in the Gaussian kernel and regularization parameter $\lambda_j$ were selected by cross-validation as in Section 2.4. We selected ten candidates of $\sigma_j$ and $\lambda_j$ from $c_\sigma \sigma^{(j)}_{\mathrm{med}}$ ($0.5 \le c_\sigma \le 5$) and $10^m$ ($-3 \le m \le 0$), respectively, where $\sigma^{(j)}_{\mathrm{med}}$ is the median value of $\lvert x^{(j)}_i - x^{(j)}_k \rvert$ with respect to $i$ and $k$. ... LSDRF: When estimating $g_j(x)$, we selected ten candidates of the width parameter in the Gaussian kernel and the regularization parameter from $10^l \sigma^{(j)}_{\mathrm{med}}$ ($-0.3 \le l \le 1$) and $10^m$ ($-4 \le m \le 0$), respectively. When estimating $[H(x)]_{ij}$, ten candidates of the width parameter in the Gaussian kernel were selected from $10^l \sqrt{\sigma^{(i)}_{\mathrm{med}} \sigma^{(j)}_{\mathrm{med}}}$ ($-0.3 \le l \le 1$). For the regularization parameter, we used the same candidates as in $g_j(x)$. ... For SCMSLS, we employed the following adaptive-bandwidth Gaussian kernel: $\frac{1}{(2\pi h_i^2)^{D/2}} \exp\left(-\frac{\lVert x - x_i \rVert^2}{2h_i^2}\right)$, where $h_i$ denotes the bandwidth parameter. We restricted $h_i$ at the $m$-nearest neighbor Euclidean distance from $x_i$ to $x_j$ ($i \neq j$), and performed cross-validation with respect to $m$ whose candidates were 128, 64, 32, 16, 8 and 4. ... Gradient ascent: Whenever $\widehat{D}_{\widehat{g}}[z^{\tau+1}_k \mid z^{\tau}_k] < 0$ or $\exists j,\ f_j(z^{\tau}_k) \le 0$, we perform the following gradient ascent: $z^{\tau+1}_k = z^{\tau}_k + \eta\,\widehat{g}(z^{\tau}_k)$, where the step size parameter $\eta$ is selected so that $\widehat{D}_{\widehat{g}}[z^{\tau}_k + \eta\,\widehat{g}(z^{\tau}_k) \mid z^{\tau}_k]$ is maximized. |
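
The hyperparameter candidates quoted in the Experiment Setup row can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' MATLAB implementation: the function names `lsldgc_candidates` and `knn_bandwidths` are hypothetical, the ten candidates are assumed to be evenly spaced (linearly in the scale factor for the kernel width and linearly in the exponent for the regularization parameter), and the median is taken over distinct pairs only.

```python
import numpy as np


def lsldgc_candidates(X, n_candidates=10):
    """Candidate grids for the LSLDGC kernel width and regularization.

    Assumptions (not stated in the source): candidates are evenly spaced,
    and the median of |x_i - x_k| is taken over distinct pairs i < k.
    """
    n, D = X.shape
    iu = np.triu_indices(n, k=1)  # index pairs with i < k
    c_sigma = np.linspace(0.5, 5.0, n_candidates)           # 0.5 <= c <= 5
    lambdas = 10.0 ** np.linspace(-3.0, 0.0, n_candidates)  # 10^m, -3 <= m <= 0
    sigma_grid = np.empty((D, n_candidates))
    for j in range(D):
        # Per-coordinate median absolute difference.
        diffs = np.abs(X[:, j][:, None] - X[:, j][None, :])
        sigma_med = np.median(diffs[iu])
        sigma_grid[j] = c_sigma * sigma_med
    return sigma_grid, lambdas


def knn_bandwidths(X, m):
    """Adaptive bandwidths: h_i is the Euclidean distance from x_i to its
    m-th nearest neighbour, excluding x_i itself. Requires m < len(X)."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    d_sorted = np.sort(d, axis=1)  # column 0 is the zero self-distance
    return d_sorted[:, m]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 4))
    sigma_grid, lambdas = lsldgc_candidates(X)
    print(sigma_grid.shape, lambdas.shape)
    print(knn_bandwidths(X, m=8)[:5])
```

Each row of `sigma_grid` holds the ten width candidates for one coordinate, over which cross-validation would then be run together with `lambdas`. The bandwidth helper uses a dense distance matrix for clarity, which is adequate for the sample sizes quoted in the table (a few hundred points) but would need a neighbor-search structure for larger data.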