Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Density Estimation via Discrepancy Based Adaptive Sequential Partition
Authors: Dangna Li, Kun Yang, Wing Hung Wong
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate its efficiency as a density estimation method. We also show how it can be utilized to find good initializations for k-means. |
| Researcher Affiliation | Collaboration | Dangna Li ICME, Stanford University Stanford, CA 94305 EMAIL Kun Yang Google Mountain View, CA 94043 EMAIL Wing Hung Wong Department of Statistics Stanford University Stanford, CA 94305 EMAIL |
| Pseudocode | Yes | The pseudocode for the complete algorithm is given in Algorithm 1. |
| Open Source Code | No | The paper does not provide any links or explicit statements about the availability of open-source code for the methodology. |
| Open Datasets | Yes | 1) To demonstrate the method and visualize the results, we apply it on several 2-dimensional data sets simulated from 3 distributions with different geometry: 1. Gaussian: $x \sim N(\mu, \Sigma)\,\mathbf{1}\{x \in [0, 1]^2\}$, with $\mu = (.5, .5)^T$, $\Sigma = [0.08, 0.02; 0.02, 0.02]$; 2. Mixture of Gaussians: $x \sim \frac{1}{2}\sum_{i=1}^{2} N(\mu_i, \Sigma_i)\,\mathbf{1}\{x \in [0, 1]^2\}$ with $\mu_1 = (.50, .25)^T$, $\mu_2 = (.50, .75)^T$, and $\Sigma_1 = \Sigma_2 = [0.04, 0.01; 0.01, 0.01]$; 3. Mixture of Betas: $x \sim \frac{1}{3}\big(\mathrm{beta}(2, 5)\,\mathrm{beta}(5, 2) + \mathrm{beta}(4, 2)\,\mathrm{beta}(2, 4) + \mathrm{beta}(1, 3)\,\mathrm{beta}(3, 1)\big)$. We simulated $10^5$ points for each distribution. ... We test DSP-kmeans on 4 real world datasets of various number of data points and dimensions. Two of them are taken from the UCI machine learning repository [19]; the stem cell data set is taken from the FlowCAP challenges [20]; the mouse bone marrow data set is a recently published single-cell dataset measured using mass cytometry [21]. |
| Dataset Splits | No | The paper describes simulated datasets and real-world datasets but does not specify training, validation, or test splits. For simulated data, it indicates generating N samples without explicit splitting. For real-world datasets, it just states they were used without split details. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions 'All methods are implemented in C++' but does not specify any software libraries or packages with version numbers that would be needed for replication. |
| Experiment Setup | Yes | We set m = 10, θ = 0.01 in our experiments. We found the resulting Hellinger distance to be quite robust as m ranges from 3 to 20 (equally spaced). ... Unless otherwise stated, we use copula transform in our experiments whenever the dimension exceeds 3. |
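
The three simulated 2-dimensional distributions quoted in the Open Datasets row can be sketched as follows. This is an illustrative reconstruction, not the authors' code (the paper states its methods were implemented in C++ and releases no source). All function names are hypothetical. It assumes that the indicator on $[0, 1]^2$ means truncation to the unit square, done here by rejection sampling, and that each product of two beta densities means independent beta-distributed coordinates.

```python
import math
import random


def sample_gaussian_2d(rng, mu, cov):
    """Draw one point from a bivariate Gaussian using a 2x2 Cholesky factor."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2)


def in_unit_square(p):
    """Indicator 1{x in [0, 1]^2}."""
    return 0.0 <= p[0] <= 1.0 and 0.0 <= p[1] <= 1.0


def truncated_gaussian(rng, n):
    """Distribution 1: Gaussian restricted to the unit square (rejection sampling)."""
    mu, cov = (0.5, 0.5), ((0.08, 0.02), (0.02, 0.02))
    out = []
    while len(out) < n:
        p = sample_gaussian_2d(rng, mu, cov)
        if in_unit_square(p):
            out.append(p)
    return out


def truncated_gaussian_mixture(rng, n):
    """Distribution 2: equal-weight two-component mixture, then truncation."""
    mus = ((0.50, 0.25), (0.50, 0.75))
    cov = ((0.04, 0.01), (0.01, 0.01))
    out = []
    while len(out) < n:
        p = sample_gaussian_2d(rng, rng.choice(mus), cov)
        if in_unit_square(p):
            out.append(p)
    return out


def beta_mixture(rng, n):
    """Distribution 3: equal-weight mixture of three products of beta densities.

    Assumption: each product is read as two independent beta coordinates.
    """
    params = (((2, 5), (5, 2)), ((4, 2), (2, 4)), ((1, 3), (3, 1)))
    out = []
    for _ in range(n):
        (a1, b1), (a2, b2) = rng.choice(params)
        out.append((rng.betavariate(a1, b1), rng.betavariate(a2, b2)))
    return out


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so reruns give the same draws
    n = 1000  # the paper reports 10^5 points per distribution
    for sampler in (truncated_gaussian, truncated_gaussian_mixture, beta_mixture):
        print(sampler.__name__, len(sampler(rng, n)))
```

Rejecting draws that fall outside the unit square samples from the truncated density exactly, at the cost of discarding some points. The fixed seed is an addition for repeatability; the paper does not report one.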