Hierarchical and Density-based Causal Clustering

Authors: Kwangho Kim, Jisu Kim, Larry Wasserman, Edward Kennedy

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We explore finite sample properties via simulation, and illustrate the proposed methods in voting and employment projection datasets."
Researcher Affiliation | Academia | Kwangho Kim (Korea University, kwanghk@korea.ac.kr); Jisu Kim (Seoul National University, jkim82133@snu.ac.kr); Larry A. Wasserman (Carnegie Mellon University, larry@stat.cmu.edu); Edward H. Kennedy (Carnegie Mellon University, edward@stat.cmu.edu)
Pseudocode | No | The paper refers to external algorithms (e.g., 'Algorithm 2 in Balcan et al. [5]') but does not provide its own pseudocode or algorithm blocks.
Open Source Code | No | "We plan to release a quick tutorial code on Github shortly."
Open Datasets | Yes | "We explore finite sample properties via simulation, and illustrate the proposed methods in voting and employment projection datasets. ... Nie and Wager [46] considered a dataset on the voting study originally used by Arceneaux et al. [2]. ... The dataset, obtained from the US Bureau of Labor Statistics (BLS), provides projected employment by occupation."
Dataset Splits | No | "We randomly chose a training set of size 13000 and a test set of size 10000 from the entire sample."
Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments are mentioned in the paper.
Software Dependencies | No | "We use the cross-validation-based Super Learner ensemble [54] to combine regression splines, support vector machine regression, and random forests on the training sample, and perform the density-based causal clustering on the test sample using DeBaCl function in TDA R package [18]." (A rough Python analogue of this pipeline is sketched below.)
Experiment Setup | Yes | "Letting n = 2500, we randomly pick 10 points in a bounded hypercube [0, 1]^3: {c_1, ..., c_10}, and assign roughly n/10 points following truncated normal distribution to each Voronoi cell associated with c_j; these are our {μ(i)}. Next, we let μ̂_a = μ_a + ξ with ξ ~ N(0, n^{-β}). ... We randomly chose a training set of size 13000 and a test set of size 10000 from the entire sample. Then we estimate {μ̂(i)} using the cross-validation-based Super Learner ensemble [54] to combine regression splines, support vector machine regression, and random forests on the training sample ... Next, letting h = 0.01, we compute p̃_h and p̂_h, and the corresponding level sets L_{h,t} and L̂_{h,t} for different values of t." (See the notation and simulation sketches below.)
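
For readers decoding the notation in the Experiment Setup row: p̂_h is a kernel density estimate built from the estimated means {μ̂(i)} with bandwidth h, and L̂_{h,t} is its level set at height t (p̃_h and L_{h,t} denote the analogous oracle quantities). A standard form, assuming a kernel K and dimension d = 3 — an assumed reading; the paper's precise definitions may differ — is:

```latex
\hat{p}_h(x) \;=\; \frac{1}{n h^{d}} \sum_{i=1}^{n} K\!\left( \frac{x - \hat{\mu}(i)}{h} \right),
\qquad
\hat{L}_{h,t} \;=\; \bigl\{\, x : \hat{p}_h(x) \ge t \,\bigr\}.
```

Clusters are then read off as the connected components of L̂_{h,t}.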
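
A minimal Python sketch of the quoted simulation. The truncated-normal spread (scale), the noise-decay rate β, the level t, and the radius used to read off connected components are all illustrative assumptions not fixed by the quote, and scikit-learn/SciPy stand in for the paper's R toolchain:

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from scipy.stats import truncnorm
from sklearn.neighbors import KernelDensity, radius_neighbors_graph

rng = np.random.default_rng(0)
n, k, d = 2500, 10, 3   # sample size, number of centers, dimension
beta = 0.5              # assumed noise-decay rate; not pinned down by the quote
h = 0.01                # bandwidth from the quoted setup

# Centers c_1, ..., c_10 drawn in [0, 1]^3.
centers = rng.uniform(size=(k, d))

# Roughly n/10 points per center: coordinatewise truncated normals on [0, 1]
# centered at c_j. (Simplification: draws concentrate in, but are not strictly
# confined to, the Voronoi cell of c_j.)
scale = 0.1             # assumed spread, not stated in the quote
mu = np.vstack([
    truncnorm.rvs((0 - c) / scale, (1 - c) / scale, loc=c, scale=scale,
                  size=(n // k, d), random_state=rng)
    for c in centers
])

# Estimated counterfactual means mu_hat = mu + xi with xi ~ N(0, n^(-beta)),
# reading n^(-beta) as the noise variance.
mu_hat = mu + rng.normal(scale=n ** (-beta / 2), size=mu.shape)

# Kernel density estimate p_hat_h of the estimated means.
kde = KernelDensity(bandwidth=h).fit(mu_hat)
dens = np.exp(kde.score_samples(mu_hat))

# Level set L_hat_{h,t} = {x : p_hat_h(x) >= t}, with an illustrative level;
# the paper sweeps several values of t.
t = np.quantile(dens, 0.2)
keep = dens >= t

# Clusters = connected components of the level set, approximated by a radius
# graph on the retained points (the radius is a heuristic resolution choice).
graph = radius_neighbors_graph(mu_hat[keep], radius=5 * h)
n_clusters, labels = connected_components(graph, directed=False)
print(f"{keep.sum()} points in the level set, {n_clusters} clusters at t={t:.2f}")
```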
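
For the real-data pipeline in the Software Dependencies row, the paper stacks regression splines, SVM regression, and random forests via Super Learner in R, then clusters with DeBaCl from the TDA package. Below is a hedged scikit-learn analogue: StackingRegressor is a stand-in for Super Learner, not the authors' implementation, and X_train, A_train, Y_train, X_test are hypothetical variable names.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer, StandardScaler
from sklearn.svm import SVR

def make_ensemble():
    """Cross-validated stack of the three learner classes named in the quote."""
    learners = [
        ("spline", make_pipeline(SplineTransformer(), RidgeCV())),  # regression splines
        ("svr", make_pipeline(StandardScaler(), SVR())),            # SVM regression
        ("rf", RandomForestRegressor(n_estimators=500, random_state=0)),
    ]
    return StackingRegressor(estimators=learners,
                             final_estimator=LinearRegression(), cv=5)

# One regression per treatment arm a, fit on the training split, so that
# mu_hat_a(x) estimates E[Y | A = a, X = x]; e.g. for arms 0 and 1:
# models = [make_ensemble().fit(X_train[A_train == a], Y_train[A_train == a])
#           for a in (0, 1)]
# mu_hat_test = np.column_stack([m.predict(X_test) for m in models])
# The density-based clustering from the simulation sketch above is then run
# on mu_hat_test, in place of DeBaCl/TDA.
```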