Density Estimation via Discrepancy Based Adaptive Sequential Partition

Authors: Dangna Li, Kun Yang, Wing Hung Wong

NeurIPS 2016

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate its efficiency as a density estimation method. We also show how it can be utilized to find good initializations for k-means. |
| Researcher Affiliation | Collaboration | Dangna Li, ICME, Stanford University, Stanford, CA 94305, dangna@stanford.edu; Kun Yang, Google, Mountain View, CA 94043, kunyang@stanford.edu; Wing Hung Wong, Department of Statistics, Stanford University, Stanford, CA 94305, whwong@stanford.edu |
| Pseudocode | Yes | The pseudocode for the complete algorithm is given in Algorithm 1. |
| Open Source Code | No | The paper does not provide any links or explicit statements about the availability of open-source code for the methodology. |
| Open Datasets | Yes | 1) To demonstrate the method and visualize the results, we apply it on several 2-dimensional data sets simulated from 3 distributions with different geometry: 1. Gaussian: $x \sim N(\mu, \Sigma)\,\mathbf{1}\{x \in [0, 1]^2\}$, with $\mu = (.5, .5)^T$, $\Sigma = [0.08, 0.02; 0.02, 0.02]$; 2. Mixture of Gaussians: $x \sim \frac{1}{2} \sum_{i=1}^{2} N(\mu_i, \Sigma_i)\,\mathbf{1}\{x \in [0, 1]^2\}$ with $\mu_1 = (.50, .25)^T$ and $\mu_2 = (.50, .75)^T$, $\Sigma_1 = \Sigma_2 = [0.04, 0.01; 0.01, 0.01]$; 3. Mixture of Betas: $x \sim \frac{1}{3}\left(\mathrm{beta}(2, 5)\,\mathrm{beta}(5, 2) + \mathrm{beta}(4, 2)\,\mathrm{beta}(2, 4) + \mathrm{beta}(1, 3)\,\mathrm{beta}(3, 1)\right)$. We simulated $10^5$ points for each distribution. ... We test DSP-kmeans on 4 real world datasets of various number of data points and dimensions. Two of them are taken from the UCI machine learning repository [19]; the stem cell data set is taken from the Flow CAP challenges [20]; the mouse bone marrow data set is a recently published single-cell dataset measured using mass cytometry [21]. |
| Dataset Splits | No | The paper describes simulated and real-world datasets but does not specify training, validation, or test splits. For the simulated data, it reports generating N samples without any explicit split. For the real-world datasets, it states only that they were used, with no split details. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper states 'All methods are implemented in C++' but does not name any software libraries or packages, with version numbers, that would be needed for replication. |
| Experiment Setup | Yes | We set $m = 10$, $\theta = 0.01$ in our experiments. We found the resulting Hellinger distance to be quite robust as $m$ ranges from 3 to 20 (equally spaced). ... Unless otherwise stated, we use copula transform in our experiments whenever the dimension exceeds 3. |
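
Since no code is released, the three simulated 2-dimensional data sets listed under Open Datasets have to be regenerated by the reader. The following is a minimal sketch of one way to do that, not the authors' C++ implementation. It assumes that the indicator on [0, 1]^2 means the density is restricted to the unit square, which is handled here by rejection sampling, and that each Beta mixture component is a product of two independent Beta marginals. All function names (`cholesky_2x2`, `draw_gaussian`, `in_unit_square`, `sample_truncated_mixture`, `sample_beta_mixture`, `simulate_all`) and the `seed` parameter are hypothetical and chosen for illustration only.

```python
import math
import random


def cholesky_2x2(cov):
    """Lower-triangular factor (l11, l21, l22) of a 2x2 covariance matrix."""
    (a, b), (_, d) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    return l11, l21, l22


def draw_gaussian(rng, mean, chol):
    """One untruncated draw from N(mean, cov), given the Cholesky factor."""
    l11, l21, l22 = chol
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2


def in_unit_square(p):
    """True if the 2-d point lies in [0, 1]^2."""
    return 0.0 <= p[0] <= 1.0 and 0.0 <= p[1] <= 1.0


def sample_truncated_mixture(rng, n, means, cov):
    """Rejection-sample an equal-weight Gaussian mixture restricted to [0, 1]^2.

    The component is redrawn after every rejection, so the mixture as a whole
    is truncated (assumption), rather than each component separately.
    A single Gaussian is the special case of a one-component mixture.
    """
    chol = cholesky_2x2(cov)
    points = []
    while len(points) < n:
        p = draw_gaussian(rng, rng.choice(means), chol)
        if in_unit_square(p):
            points.append(p)
    return points


def sample_beta_mixture(rng, n, components):
    """Equal-weight mixture of products of two independent Beta marginals."""
    points = []
    for _ in range(n):
        (a1, b1), (a2, b2) = rng.choice(components)
        points.append((rng.betavariate(a1, b1), rng.betavariate(a2, b2)))
    return points


def simulate_all(n, seed=0):
    """Return (gaussian, gaussian_mixture, beta_mixture), each a list of n points."""
    rng = random.Random(seed)  # hypothetical seed; the source reports none
    gaussian = sample_truncated_mixture(
        rng, n, [(0.5, 0.5)], [[0.08, 0.02], [0.02, 0.02]])
    gaussian_mixture = sample_truncated_mixture(
        rng, n, [(0.50, 0.25), (0.50, 0.75)], [[0.04, 0.01], [0.01, 0.01]])
    beta_mixture = sample_beta_mixture(
        rng, n, [((2, 5), (5, 2)), ((4, 2), (2, 4)), ((1, 3), (3, 1))])
    return gaussian, gaussian_mixture, beta_mixture
```

Calling `simulate_all(10**5)` would match the sample size quoted above. Because the source gives no random seed, a regenerated data set will not be identical to the one used in the paper, only drawn from the same stated distributions under the assumptions named here.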