Conditional Density Estimation with Histogram Trees

Authors: Lincen Yang, Matthijs van Leeuwen

Venue: NeurIPS 2024

Reproducibility assessment: each entry gives the variable, the assessed result, and the supporting excerpt (LLM response).
Research Type: Experimental. "Our experiments demonstrate that, in comparison to existing interpretable CDE methods, CDTrees are both more accurate (as measured by the log-loss) and more robust against irrelevant features."
Researcher Affiliation: Academia. Lincen Yang and Matthijs van Leeuwen, LIACS, Leiden University, Einsteinweg 55, 2333 CC Leiden, The Netherlands ({l.yang, m.van.leeuwen}@liacs.leidenuniv.nl).
Pseudocode: Yes. Algorithm 1: Learn CDTree from data; Algorithm 2: Find the best split for node S; Algorithm 3: Learn the MDL-optimal histogram.
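The pseudocode itself is not reproduced on this page. Purely as an illustrative sketch, the code below shows one generic way a search over histogram bin counts in fixed steps could be organized, in the spirit of Algorithm 3 and the step size g reported in the experiment setup further down; the mdl_score function is a BIC-like placeholder, not the paper's actual MDL code-length formula.

```python
import numpy as np

def mdl_score(y, n_bins, y_range):
    """Placeholder MDL-style score: histogram negative log-likelihood
    plus a crude BIC-like parameter penalty. NOT the paper's formula."""
    counts, edges = np.histogram(y, bins=n_bins, range=y_range)
    widths = np.diff(edges)
    n = len(y)
    # Smoothed bin probabilities; add-one smoothing avoids log(0).
    probs = (counts + 1) / (n + n_bins)
    nll = -np.sum(counts * np.log(probs / widths))
    penalty = 0.5 * (n_bins - 1) * np.log(n)
    return nll + penalty

def best_histogram(y, y_range, max_bins=300, step=30):
    """Search bin counts 1, 1+step, 1+2*step, ... and keep the best score."""
    candidates = range(1, max_bins + 1, step)
    return min(candidates, key=lambda k: mdl_score(y, k, y_range))
```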
Open Source Code: Yes. "For reproducibility, we provide further details about implementation and parameter choices in Appendix C. We made our source code public: https://github.com/ylincen/CDTree."
Open Datasets: Yes. "We use 14 datasets with numerical target variables from the UCI repository [1]." [1] The UCI Machine Learning Repository. URL: https://archive.ics.uci.edu/.
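The 14 datasets are not enumerated on this page. As one assumed way to retrieve comparable data, the ucimlrepo package fetches UCI datasets by id; Auto MPG (id 9) below is only an example of a UCI dataset with a numerical target, not necessarily one used in the paper.

```python
# pip install ucimlrepo
from ucimlrepo import fetch_ucirepo

# Example: Auto MPG (id 9), a UCI dataset with a numerical target.
auto_mpg = fetch_ucirepo(id=9)
X = auto_mpg.data.features   # pandas DataFrame of feature columns
y = auto_mpg.data.targets    # pandas DataFrame with the target column
print(X.shape, y.columns.tolist())
```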
Dataset Splits: Yes. "... all results obtained on the test sets using five-fold cross-validation."
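As a minimal sketch of this protocol, assuming a hypothetical CDTree estimator with a scikit-learn-style fit method and a log_loss scoring method (the authors' actual interface may differ), a five-fold evaluation could look like:

```python
import numpy as np
from sklearn.model_selection import KFold

def cv_log_loss(make_model, X, y, n_splits=5, seed=0):
    """Mean test-set log-loss over K folds; X and y are numpy arrays."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    losses = []
    for train_idx, test_idx in kf.split(X):
        model = make_model()  # e.g. lambda: CDTree(...) -- hypothetical
        model.fit(X[train_idx], y[train_idx])
        losses.append(model.log_loss(X[test_idx], y[test_idx]))
    return float(np.mean(losses))
```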
Hardware Specification: Yes. "The runtimes reported in Section 6.5 for all algorithms are recorded on CPU machines with AMD EPYC 7702 cores."
Software Dependencies: No. The paper mentions several Python packages and libraries used (e.g., cde, statsmodels, scipy, scikit-learn) but does not specify their version numbers.
Experiment Setup: Yes. "We set C = 5 in our experiments." "We set g = 30 in searching the number of histogram bins in Algorithm 3." "Thus, we set the global range of the histograms based on the range of the full dataset (before train/test split in the cross-validation) plus/minus a small constant, chosen as 10^-3." "Specifically, we choose the standard deviation of added noise for both the features and the target as 0.01. We further noticed that adding the standard dropout with dropout rate equal to 0.1 gives more stable results."
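To make the reported settings easy to scan, here is a small configuration sketch; the dictionary keys and helper names are illustrative only, while the numeric values come from the excerpt above.

```python
import numpy as np

# Values quoted above; key names are illustrative, not the paper's API.
CONFIG = {
    "C": 5,                 # parameter C as reported (role not detailed here)
    "g": 30,                # step size when searching histogram bin counts
    "range_margin": 1e-3,   # global histogram range = data range +/- margin
    "noise_std": 0.01,      # std of noise added to features and target
    "dropout_rate": 0.1,    # dropout rate reported to give more stable results
}

def global_range(y, margin=CONFIG["range_margin"]):
    """Histogram range from the full dataset (before the CV split)."""
    return float(y.min()) - margin, float(y.max()) + margin

def add_jitter(arr, std=CONFIG["noise_std"], seed=0):
    """Add small Gaussian noise, as described for features and target."""
    rng = np.random.default_rng(seed)
    return arr + rng.normal(0.0, std, size=arr.shape)
```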