Variance Reduction in Bipartite Experiments through Correlation Clustering
Authors: Jean Pouget-Abadie, Kevin Aydin, Warren Schudy, Kay Brodersen, Vahab Mirrokni
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use a publicly-available graph of Amazon user-item reviews to validate our solution and illustrate how it substantially increases the statistical power in bipartite experiments. In Section 4, using a publicly-available Amazon user-item review dataset, we show that our suggested algorithm improves experimental power significantly over other more straightforward extensions of cluster randomized designs to the bipartite setting. In Figure 1.a, we report the standard deviation of the observed treatment exposure vector... In Figure 1.b, we evaluate experimental power for each clustering after hyperparameter optimization for all baselines by assuming a model of potential outcomes. |
| Researcher Affiliation | Industry | Jean Pouget-Abadie, Google Research, New York, NY 10011, jeanpa@google.com; Kevin Aydin, Google Research, Mountain View, CA 94043, kaydin@google.com; Warren Schudy, Google Research, New York, NY 10011, wschudy@google.com; Kay Brodersen, Google, Zürich, Switzerland, kbrodersen@google.com; Vahab Mirrokni, Google Research, New York, NY 10011, mirrokni@google.com |
| Pseudocode | No | The paper describes an algorithm in narrative text ('To produce a clustering of diversion units, our algorithm proceeds in two steps...'), but it does not present it in a formally labeled 'Pseudocode' or 'Algorithm' block with structured steps. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code for the described methodology, nor does it include links to a code repository. |
| Open Datasets | Yes | To evaluate our clustering algorithm, we choose a publicly-available bipartite graph dataset (McAuley et al., 2015; He and McAuley, 2016), where each edge in the dataset corresponds to a user review of a product on Amazon, totaling 83M reviews made by 121k users on 9.8M items in the graph. |
| Dataset Splits | No | The paper mentions 'hyper-parameter optimization for all baselines' but does not specify explicit dataset splits (e.g., percentages or counts) for training, validation, or testing for the Amazon user-item review graph data. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU types, or memory configurations. It only mentions running a 'simulation study'. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., programming languages with library versions, or specific solvers with their versions) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | we simulate a randomized experiment on this cohort of user-items by randomly assigning 10% of item-clusters to receive a simulated treatment. We compute each user's exposure e_i to treatment as the proportion of treated items they have reviewed historically... we suppose the following linear outcome model: Y_i = e_i + σε_i, where ε_i ∼ N(0, 1). |
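The simulated design quoted in the last row can be sketched in a few lines of NumPy. This is a hedged illustration, not the paper's code: the random bipartite graph, the uniform item-cluster assignment, the graph sizes, and σ = 0.1 are all placeholder assumptions standing in for the Amazon review data and the paper's correlation-clustering output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder sizes; the paper's graph has 121k users and 9.8M items.
n_users, n_items, n_clusters = 1_000, 5_000, 100

# Synthetic bipartite review graph: reviews[u, j] is True if user u reviewed item j.
reviews = rng.random((n_users, n_items)) < 0.01

# Assign each item to a cluster (a stand-in for the paper's correlation clustering).
item_cluster = rng.integers(0, n_clusters, size=n_items)

# Treat 10% of item-clusters, as in the paper's simulated experiment.
treated_clusters = rng.choice(n_clusters, size=n_clusters // 10, replace=False)
treated_items = np.isin(item_cluster, treated_clusters)

# Exposure e_i: proportion of a user's reviewed items that are treated.
n_reviewed = reviews.sum(axis=1)
exposure = (reviews & treated_items).sum(axis=1) / np.maximum(n_reviewed, 1)

# Linear outcome model Y_i = e_i + sigma * eps_i, with eps_i ~ N(0, 1).
sigma = 0.1  # assumed noise scale, not specified here
outcomes = exposure + sigma * rng.standard_normal(n_users)
```

A cluster-randomized design that keeps each user's reviewed items in few clusters drives `exposure` toward 0 or 1, which is what the paper's variance-reduction argument exploits.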