Conditional Density Estimation with Histogram Trees
Authors: Lincen Yang, Matthijs van Leeuwen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that, in comparison to existing interpretable CDE methods, CDTrees are both more accurate (as measured by the log-loss) and more robust against irrelevant features. |
| Researcher Affiliation | Academia | Lincen Yang Matthijs van Leeuwen LIACS, Leiden University Einsteinweg 55, 2333CC Leiden, The Netherlands {l.yang, m.van.leeuwen}@liacs.leidenuniv.nl |
| Pseudocode | Yes | Algorithm 1 Learn CDTree from data; Algorithm 2 Find the best split for node S; Algorithm 3 Learn the MDL-optimal histogram |
| Open Source Code | Yes | For reproducibility, we provide further details about implementation and parameter choices in Appendix C. We made our source code public: https://github.com/ylincen/CDTree. |
| Open Datasets | Yes | We use 14 datasets with numerical target variables from the UCI repository [1]. [1] The UCI Machine Learning Repository. URL https://archive.ics.uci.edu/. |
| Dataset Splits | Yes | all results obtained on the test sets using five-fold cross-validation. |
| Hardware Specification | Yes | The runtimes reported in Section 6.5 for all algorithms are recorded on the CPU machines with the AMD EPYC 7702 cores. |
| Software Dependencies | No | The paper mentions several Python packages and libraries used (e.g., 'cde', 'statsmodels', 'scipy', 'scikit-learn') but does not specify their version numbers. |
| Experiment Setup | Yes | We set C = 5 in our experiments. The step size g. We set g = 30 in searching the number of histogram bins in Algorithm 3. Thus, we set the global range of the histograms based on the range of the full dataset (before train/test split in the cross-validation) plus/minus a small constant, chosen as 10^{-3}. Specifically, we choose the standard deviation of added noise for both the features and the target as 0.01. We further noticed that adding the standard dropout with dropout rate equal to 0.1 gives more stable results. |
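
The evaluation protocol in the Dataset Splits and Experiment Setup rows can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it fixes the histogram range from the full dataset plus/minus 10^{-3} before splitting, then reports the mean test log-loss over five folds. The helper names, the synthetic Gaussian target, the fixed bin count of 10, and the add-one smoothing are all assumptions made for the example. The paper instead learns an MDL-optimal histogram (Algorithm 3) inside each tree leaf.

```python
import math
import random

EPS = 1e-3  # small constant added to the global range (value from the setup row)


def global_range(values, eps=EPS):
    """Histogram range taken from the full dataset, widened by eps on both sides."""
    return min(values) - eps, max(values) + eps


def five_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def histogram_density(train, lo, hi, n_bins):
    """Equal-width histogram density on [lo, hi].

    Add-one smoothing is an assumption of this sketch; it keeps every
    bin's density positive so the test log-loss stays finite.
    """
    width = (hi - lo) / n_bins
    counts = [1] * n_bins
    for y in train:
        b = min(int((y - lo) / width), n_bins - 1)
        counts[b] += 1
    total = sum(counts)
    return [c / (total * width) for c in counts], width


def mean_log_loss(test, density, lo, width):
    """Average negative log-density of the test points."""
    n_bins = len(density)
    loss = 0.0
    for y in test:
        b = min(int((y - lo) / width), n_bins - 1)
        loss -= math.log(density[b])
    return loss / len(test)


# Hypothetical data: a synthetic Gaussian target stands in for a UCI dataset.
rng = random.Random(42)
y = [rng.gauss(0.0, 1.0) for _ in range(500)]

# The range is computed once on the full dataset, before any train/test split.
lo, hi = global_range(y)

fold_losses = []
for train_idx, test_idx in five_fold_indices(len(y)):
    dens, width = histogram_density([y[i] for i in train_idx], lo, hi, n_bins=10)
    fold_losses.append(mean_log_loss([y[i] for i in test_idx], dens, lo, width))

print(sum(fold_losses) / len(fold_losses))
```

Because the range comes from the full dataset, every test point falls inside some bin, so its density is always defined. That appears to be the reason the setup widens the range before the cross-validation split.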