Hierarchical and Density-based Causal Clustering

Authors: Kwangho Kim, Jisu Kim, Larry Wasserman, Edward Kennedy

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We explore finite sample properties via simulation, and illustrate the proposed methods in voting and employment projection datasets."
Researcher Affiliation | Academia | Kwangho Kim (Korea University, kwanghk@korea.ac.kr); Jisu Kim (Seoul National University, jkim82133@snu.ac.kr); Larry A. Wasserman (Carnegie Mellon University, larry@stat.cmu.edu); Edward H. Kennedy (Carnegie Mellon University, edward@stat.cmu.edu)
Pseudocode | No | The paper refers to external algorithms (e.g., 'Algorithm 2 in Balcan et al. [5]') but does not provide its own pseudocode or algorithm blocks.
Open Source Code | No | "We plan to release a quick tutorial code on Github shortly."
Open Datasets | Yes | "We explore finite sample properties via simulation, and illustrate the proposed methods in voting and employment projection datasets. ... Nie and Wager [46] considered a dataset on the voting study originally used by Arceneaux et al. [2]. ... The dataset, obtained from the US Bureau of Labor Statistics (BLS), provides projected employment by occupation."
Dataset Splits | No | "We randomly chose a training set of size 13000 and a test set of size 10000 from the entire sample."
Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments are mentioned in the paper.
Software Dependencies | No | "We use the cross-validation-based Super Learner ensemble [54] to combine regression splines, support vector machine regression, and random forests on the training sample, and perform the density-based causal clustering on the test sample using DeBaCl function in TDA R package [18]." (A rough Python analogue of this pipeline is sketched below.)
Experiment Setup | Yes | "Letting n = 2500, we randomly pick 10 points in a bounded hypercube [0, 1]^3: {c_1, ..., c_10}, and assign roughly n/10 points following truncated normal distribution to each Voronoi cell associated with c_j; these are our {μ(i)}. Next, we let μ̂_a = μ_a + ξ with ξ ~ N(0, n^{-β}). ... We randomly chose a training set of size 13000 and a test set of size 10000 from the entire sample. Then we estimate {μ̂(i)} using the cross-validation-based Super Learner ensemble [54] to combine regression splines, support vector machine regression, and random forests on the training sample ... Next, letting h = 0.01, we compute p̃_h and p̂_h, and the corresponding level sets L_{h,t} and L̂_{h,t} for different values of t." (See the notation and simulation sketches below.)
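
For readers decoding the notation in the Experiment Setup row: p̂_h is a kernel density estimate built from the estimated means {μ̂(i)} with bandwidth h, and L̂_{h,t} is its level set at height t (p̃_h and L_{h,t} denote the analogous oracle quantities). A standard form, assuming a kernel K and dimension d = 3 — an assumed reading; the paper's precise definitions may differ — is:

```latex
\hat{p}_h(x) \;=\; \frac{1}{n h^{d}} \sum_{i=1}^{n} K\!\left( \frac{x - \hat{\mu}(i)}{h} \right),
\qquad
\hat{L}_{h,t} \;=\; \bigl\{\, x : \hat{p}_h(x) \ge t \,\bigr\}.
```

Clusters are then read off as the connected components of L̂_{h,t}.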
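
A minimal Python sketch of the quoted simulation. The truncated-normal spread (scale), the noise-decay rate β, the level t, and the radius used to read off connected components are all illustrative assumptions not fixed by the quote, and scikit-learn/SciPy stand in for the paper's R toolchain:

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from scipy.stats import truncnorm
from sklearn.neighbors import KernelDensity, radius_neighbors_graph

rng = np.random.default_rng(0)
n, k, d = 2500, 10, 3   # sample size, number of centers, dimension
beta = 0.5              # assumed noise-decay rate; not pinned down by the quote
h = 0.01                # bandwidth from the quoted setup

# Centers c_1, ..., c_10 drawn in [0, 1]^3.
centers = rng.uniform(size=(k, d))

# Roughly n/10 points per center: coordinatewise truncated normals on [0, 1]
# centered at c_j. (Simplification: draws concentrate in, but are not strictly
# confined to, the Voronoi cell of c_j.)
scale = 0.1             # assumed spread, not stated in the quote
mu = np.vstack([
    truncnorm.rvs((0 - c) / scale, (1 - c) / scale, loc=c, scale=scale,
                  size=(n // k, d), random_state=rng)
    for c in centers
])

# Estimated counterfactual means mu_hat = mu + xi with xi ~ N(0, n^(-beta)),
# reading n^(-beta) as the noise variance.
mu_hat = mu + rng.normal(scale=n ** (-beta / 2), size=mu.shape)

# Kernel density estimate p_hat_h of the estimated means.
kde = KernelDensity(bandwidth=h).fit(mu_hat)
dens = np.exp(kde.score_samples(mu_hat))

# Level set L_hat_{h,t} = {x : p_hat_h(x) >= t}, with an illustrative level;
# the paper sweeps several values of t.
t = np.quantile(dens, 0.2)
keep = dens >= t

# Clusters = connected components of the level set, approximated by a radius
# graph on the retained points (the radius is a heuristic resolution choice).
graph = radius_neighbors_graph(mu_hat[keep], radius=5 * h)
n_clusters, labels = connected_components(graph, directed=False)
print(f"{keep.sum()} points in the level set, {n_clusters} clusters at t={t:.2f}")
```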
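
For the real-data pipeline in the Software Dependencies row, the paper stacks regression splines, SVM regression, and random forests via Super Learner in R, then clusters with DeBaCl from the TDA package. Below is a hedged scikit-learn analogue: StackingRegressor is a stand-in for Super Learner, not the authors' implementation, and X_train, A_train, Y_train, X_test are hypothetical variable names.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer, StandardScaler
from sklearn.svm import SVR

def make_ensemble():
    """Cross-validated stack of the three learner classes named in the quote."""
    learners = [
        ("spline", make_pipeline(SplineTransformer(), RidgeCV())),  # regression splines
        ("svr", make_pipeline(StandardScaler(), SVR())),            # SVM regression
        ("rf", RandomForestRegressor(n_estimators=500, random_state=0)),
    ]
    return StackingRegressor(estimators=learners,
                             final_estimator=LinearRegression(), cv=5)

# One regression per treatment arm a, fit on the training split, so that
# mu_hat_a(x) estimates E[Y | A = a, X = x]; e.g. for arms 0 and 1:
# models = [make_ensemble().fit(X_train[A_train == a], Y_train[A_train == a])
#           for a in (0, 1)]
# mu_hat_test = np.column_stack([m.predict(X_test) for m in models])
# The density-based clustering from the simulation sketch above is then run
# on mu_hat_test, in place of DeBaCl/TDA.
```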