reproducibilityindex.ai

Density Estimation via Discrepancy Based Adaptive Sequential Partition

Authors: Dangna Li, Kun Yang, Wing Hung Wong

NeurIPS 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We empirically demonstrate its efficiency as a density estimation method. We also show how it can be utilized to find good initializations for k-means.
Researcher Affiliation	Collaboration	Dangna Li, ICME, Stanford University, Stanford, CA 94305, dangna@stanford.edu; Kun Yang, Google, Mountain View, CA 94043, kunyang@stanford.edu; Wing Hung Wong, Department of Statistics, Stanford University, Stanford, CA 94305, whwong@stanford.edu
Pseudocode	Yes	The pseudocode for the complete algorithm is given in Algorithm 1.
Open Source Code	No	The paper does not provide any links or explicit statements about the availability of open-source code for the methodology.
Open Datasets	Yes	1) To demonstrate the method and visualize the results, we apply it on several 2-dimensional data sets simulated from 3 distributions with different geometry: 1. Gaussian: x ∼ N(µ, Σ) 1{x ∈ [0, 1]^2}, with µ = (.5, .5)^T, Σ = [0.08, 0.02; 0.02, 0.02]; 2. Mixture of Gaussians: x ∼ (1/2) ∑_{i=1}^{2} N(µ_i, Σ_i) 1{x ∈ [0, 1]^2}, with µ_1 = (.50, .25)^T, and µ_2 = (.50, .75)^T, Σ_1 = Σ_2 = [0.04, 0.01; 0.01, 0.01]; 3. Mixture of Betas: x ∼ (1/3)(beta(2, 5)beta(5, 2) + beta(4, 2)beta(2, 4) + beta(1, 3)beta(3, 1)). We simulated 10^5 points for each distribution. ... We test DSP-kmeans on 4 real world datasets of various number of data points and dimensions. Two of them are taken from the UCI machine learning repository [19]; the stem cell data set is taken from the FlowCAP challenges [20]; the mouse bone marrow data set is a recently published single-cell dataset measured using mass cytometry [21].
Dataset Splits	No	The paper describes simulated datasets and real-world datasets but does not specify training, validation, or test splits. For simulated data, it indicates generating N samples without explicit splitting. For real-world datasets, it just states they were used without split details.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications.
Software Dependencies	No	The paper mentions 'All methods are implemented in C++' but does not specify any software libraries or packages with version numbers that would be needed for replication.
Experiment Setup	Yes	We set m = 10, θ = 0.01 in our experiments. We found the resulting Hellinger distance to be quite robust as m ranges from 3 to 20 (equally spaced). ... Unless otherwise stated, we use copula transform in our experiments whenever the dimension exceeds 3.
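
Since no code is released, the three simulated 2-dimensional data sets quoted in the Open Datasets row would have to be regenerated by anyone replicating the experiments. The following is a minimal Python sketch of one way to do that, not the authors' implementation (theirs is in C++ and unpublished). It assumes NumPy, assumes the truncation to [0, 1]^2 is handled by rejection sampling, and reads each beta(a, b)beta(c, d) term as a product density over the two coordinates. The function names, the `seed` argument, and the dictionary keys are hypothetical.

```python
import numpy as np


def truncated_gaussian_mixture(n, means, covs, rng):
    """Draw n points from an equal-weight Gaussian mixture restricted to [0, 1]^2.

    Assumption: truncation is done by rejection, i.e. draws falling
    outside the unit square are discarded and redrawn.
    """
    kept, total = [], 0
    while total < n:
        # Pick a mixture component uniformly at random for each draw.
        comp = rng.integers(0, len(means), size=n)
        x = np.empty((n, 2))
        for k, (mean, cov) in enumerate(zip(means, covs)):
            idx = comp == k
            if idx.any():
                x[idx] = rng.multivariate_normal(mean, cov, size=int(idx.sum()))
        # Keep only the draws that land inside the unit square.
        inside = np.all((x >= 0.0) & (x <= 1.0), axis=1)
        kept.append(x[inside])
        total += int(inside.sum())
    return np.vstack(kept)[:n]


def beta_mixture(n, params, rng):
    """Draw n points from an equal-weight mixture of products of two Beta densities.

    params[k] = ((a1, b1), (a2, b2)) gives the Beta parameters of the
    first and second coordinate of component k.
    """
    comp = rng.integers(0, len(params), size=n)
    p = np.asarray(params, dtype=float)[comp]  # shape (n, 2, 2)
    x = rng.beta(p[:, 0, 0], p[:, 0, 1])
    y = rng.beta(p[:, 1, 0], p[:, 1, 1])
    return np.column_stack([x, y])


def simulate_all(n, seed=0):
    """Generate the three simulated data sets with the quoted parameters."""
    rng = np.random.default_rng(seed)
    gaussian = truncated_gaussian_mixture(
        n,
        means=[(0.5, 0.5)],
        covs=[[[0.08, 0.02], [0.02, 0.02]]],
        rng=rng,
    )
    shared_cov = [[0.04, 0.01], [0.01, 0.01]]
    gaussian_mixture = truncated_gaussian_mixture(
        n,
        means=[(0.50, 0.25), (0.50, 0.75)],
        covs=[shared_cov, shared_cov],
        rng=rng,
    )
    betas = beta_mixture(
        n,
        params=[((2, 5), (5, 2)), ((4, 2), (2, 4)), ((1, 3), (3, 1))],
        rng=rng,
    )
    return {
        "gaussian": gaussian,
        "gaussian_mixture": gaussian_mixture,
        "beta_mixture": betas,
    }
```

Calling `simulate_all(100_000)` would match the quoted sample size of 10^5 points per distribution. The single Gaussian is treated as a one-component mixture so that both truncated cases share the same rejection step.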