Density Estimation via Discrepancy Based Adaptive Sequential Partition
Authors: Dangna Li, Kun Yang, Wing Hung Wong
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate its efficiency as a density estimation method. We also show how it can be utilized to find good initializations for k-means. |
| Researcher Affiliation | Collaboration | Dangna Li, ICME, Stanford University, Stanford, CA 94305 (dangna@stanford.edu); Kun Yang, Google, Mountain View, CA 94043 (kunyang@stanford.edu); Wing Hung Wong, Department of Statistics, Stanford University, Stanford, CA 94305 (whwong@stanford.edu) |
| Pseudocode | Yes | The pseudocode for the complete algorithm is given in Algorithm 1. |
| Open Source Code | No | The paper does not provide any links or explicit statements about the availability of open-source code for the methodology. |
| Open Datasets | Yes | 1) To demonstrate the method and visualize the results, we apply it on several 2-dimensional data sets simulated from 3 distributions with different geometry: 1. Gaussian: $x \sim N(\mu, \Sigma)\,\mathbf{1}\{x \in [0, 1]^2\}$, with $\mu = (.5, .5)^T$, $\Sigma = [0.08, 0.02;\ 0.02, 0.02]$; 2. Mixture of Gaussians: $x \sim \frac{1}{2}\sum_{i=1}^{2} N(\mu_i, \Sigma_i)\,\mathbf{1}\{x \in [0, 1]^2\}$, with $\mu_1 = (.50, .25)^T$, $\mu_2 = (.50, .75)^T$, $\Sigma_1 = \Sigma_2 = [0.04, 0.01;\ 0.01, 0.01]$; 3. Mixture of Betas: $x \sim \frac{1}{3}\big(\mathrm{beta}(2, 5)\,\mathrm{beta}(5, 2) + \mathrm{beta}(4, 2)\,\mathrm{beta}(2, 4) + \mathrm{beta}(1, 3)\,\mathrm{beta}(3, 1)\big)$. We simulated $10^5$ points for each distribution. ... We test DSP-kmeans on 4 real world datasets of various number of data points and dimensions. Two of them are taken from the UCI machine learning repository [19]; the stem cell data set is taken from the Flow CAP challenges [20]; the mouse bone marrow data set is a recently published single-cell dataset measured using mass cytometry [21]. |
| Dataset Splits | No | The paper describes simulated and real-world datasets but does not specify training, validation, or test splits. For the simulated data, it describes generating N samples without any explicit split. For the real-world datasets, it states only that they were used and gives no split details. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions 'All methods are implemented in C++' but does not specify any software libraries or packages with version numbers that would be needed for replication. |
| Experiment Setup | Yes | We set m = 10, θ = 0.01 in our experiments. We found the resulting Hellinger distance to be quite robust as m ranges from 3 to 20 (equally spaced). ... Unless otherwise stated, we use copula transform in our experiments whenever the dimension exceeds 3. |
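
The Open Datasets row lists three simulated 2-D distributions. The sketch below shows one way to draw from them using Python's standard library. It is not the authors' C++ code. It makes two assumptions. First, the truncation $\mathbf{1}\{x \in [0,1]^2\}$ is handled by rejection sampling. Second, `beta(a, b)beta(c, d)` is read as independent coordinates $x_1 \sim \mathrm{beta}(a,b)$ and $x_2 \sim \mathrm{beta}(c,d)$. All function names are hypothetical.

```python
import math
import random

def chol2(cov):
    """Cholesky factor (l11, l21, l22) of a 2x2 covariance [[a, b], [b, c]]."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    return l11, l21, l22

def gauss2(mu, cov, rng):
    """One draw from the bivariate normal N(mu, cov)."""
    l11, l21, l22 = chol2(cov)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2)

def in_unit_square(p):
    return 0.0 <= p[0] <= 1.0 and 0.0 <= p[1] <= 1.0

def truncated(draw, n, rng):
    """Rejection sampling: keep only draws that land in [0, 1]^2."""
    out = []
    while len(out) < n:
        p = draw(rng)
        if in_unit_square(p):
            out.append(p)
    return out

def sample_gaussian(n, rng):
    cov = [[0.08, 0.02], [0.02, 0.02]]
    return truncated(lambda r: gauss2((0.5, 0.5), cov, r), n, rng)

def sample_gaussian_mixture(n, rng):
    cov = [[0.04, 0.01], [0.01, 0.01]]  # Sigma_1 = Sigma_2
    mus = [(0.50, 0.25), (0.50, 0.75)]
    # Pick each component with probability 1/2, then truncate the mixture.
    return truncated(lambda r: gauss2(r.choice(mus), cov, r), n, rng)

def sample_beta_mixture(n, rng):
    # Assumption: beta(a, b)beta(c, d) means independent x1 ~ beta(a, b), x2 ~ beta(c, d).
    comps = [((2, 5), (5, 2)), ((4, 2), (2, 4)), ((1, 3), (3, 1))]
    out = []
    for _ in range(n):
        (a, b), (c, d) = rng.choice(comps)  # each component has weight 1/3
        out.append((rng.betavariate(a, b), rng.betavariate(c, d)))
    return out

if __name__ == "__main__":
    rng = random.Random(0)
    for name, sampler in [("gaussian", sample_gaussian),
                          ("gaussian mixture", sample_gaussian_mixture),
                          ("beta mixture", sample_beta_mixture)]:
        pts = sampler(10**5, rng)  # 10^5 points per distribution, as in the paper
        mean = (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
        print(name, len(pts), mean)
```

Rejection sampling is used because it is exact for a truncated density and needs no normalizing constant. Its cost here is small: most of each Gaussian's mass already lies inside the unit square.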
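
The Experiment Setup row says a copula transform is applied whenever the dimension exceeds 3. The table does not say which variant is used. Below is a minimal sketch of the common empirical (rank-based) copula transform, which maps every coordinate to roughly uniform margins. The names `copula_transform` and `maybe_copula` are hypothetical, and the tie-breaking rule is an assumption.

```python
from typing import List, Sequence

def copula_transform(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Empirical copula transform: replace each coordinate by rank / (n + 1).

    Ranks run from 1 to n within each column. Ties are broken by row order
    because sorted() is stable (an assumption; average ranks are another choice).
    """
    n = len(data)
    if n == 0:
        return []
    d = len(data[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        order = sorted(range(n), key=lambda i: data[i][j])
        for rank, i in enumerate(order, start=1):
            out[i][j] = rank / (n + 1)
    return out

def maybe_copula(data: Sequence[Sequence[float]], max_raw_dim: int = 3) -> List[List[float]]:
    """Apply the transform only when the dimension exceeds max_raw_dim."""
    if data and len(data[0]) > max_raw_dim:
        return copula_transform(data)
    return [list(row) for row in data]
```

Each transformed point lies in the open cube $(0, 1)^d$. Because the transform depends only on within-column ranks, any strictly increasing rescaling of a feature leaves the result unchanged.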
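
The Experiment Setup row reports robustness in terms of the Hellinger distance between density estimates and the truth. The table gives neither the normalization nor the integration scheme. This sketch assumes the convention $H(p, q)^2 = \frac{1}{2}\int (\sqrt{p} - \sqrt{q})^2$, which lies in $[0, 1]$, and approximates the integral over $[0, 1]^2$ with a midpoint rule. The function name `hellinger_grid` and the grid size `k` are hypothetical.

```python
import math
from typing import Callable

Density = Callable[[float, float], float]

def hellinger_grid(p: Density, q: Density, k: int = 200) -> float:
    """Approximate H(p, q) = sqrt(0.5 * integral of (sqrt p - sqrt q)^2) on [0, 1]^2.

    Uses a k-by-k midpoint rule; both p and q must be densities on the unit square.
    """
    h = 1.0 / k
    total = 0.0
    for a in range(k):
        x = (a + 0.5) * h
        for b in range(k):
            y = (b + 0.5) * h
            diff = math.sqrt(p(x, y)) - math.sqrt(q(x, y))
            total += diff * diff
    return math.sqrt(0.5 * total * h * h)

if __name__ == "__main__":
    uniform = lambda x, y: 1.0
    linear = lambda x, y: 2.0 * x  # integrates to 1 over the unit square
    print(hellinger_grid(uniform, linear))
```

For the uniform density versus $2x$, the exact value is $\sqrt{1 - 2\sqrt{2}/3} \approx 0.239$, and the grid approximation agrees to well within $10^{-3}$.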
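
The method's name refers to discrepancy, which measures how far a point set departs from uniformity. The table does not say how the threshold θ enters Algorithm 1, so the following is background only and not the authors' splitting criterion. It computes the star discrepancy of points in one dimension using the classical closed-form expression $D^*_n = \frac{1}{2n} + \max_i \left| x_{(i)} - \frac{2i-1}{2n} \right|$. The paper works with multivariate cells, and the function name `star_discrepancy_1d` is hypothetical.

```python
from typing import Sequence

def star_discrepancy_1d(points: Sequence[float]) -> float:
    """Exact star discrepancy of a point set in [0, 1].

    D*_n = 1/(2n) + max_i |x_(i) - (2i - 1)/(2n)|, where x_(1) <= ... <= x_(n).
    """
    xs = sorted(points)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one point")
    worst = max(abs(x - (2 * i - 1) / (2 * n)) for i, x in enumerate(xs, start=1))
    return 1.0 / (2 * n) + worst
```

Evenly spaced midpoints such as `[0.125, 0.375, 0.625, 0.875]` attain the smallest possible value, $1/(2n) = 0.125$. Placing every point at 0 gives the largest possible value, 1.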