Marginal Distribution Adaptation for Discrete Sets via Module-Oriented Divergence Minimization

Authors: Hanjun Dai, Mengjiao Yang, Yuan Xue, Dale Schuurmans, Bo Dai

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we will first validate the correctness of our marginal adaptation framework on synthetic datasets for all the three models in Section 5.1. Then in Section 5.2 we study the effectiveness of the framework in adapting the learned distribution to the target distribution via marginal alignment using real-world datasets. We present the experiment configurations for model architectures, training and evaluation methods used in both sections.
Researcher Affiliation | Industry | Google Research, Brain Team; Google Cloud.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing its source code for the methodology described, nor does it include any links to a code repository.
Open Datasets | Yes | MIMIC3: We curate this dataset based on the encounter ICD9 diagnosis codes from MIMIC-III (Johnson et al., 2016), an open source EHR dataset. Instacart (ins): This dataset comes from the Kaggle Instacart Market Basket Analysis competition. We select the top 1,000 popular products for generation and control experiments.
Dataset Splits | Yes | Without timing information, we randomly split the Groceries dataset into D_src and D_tgt with ratio 9:1. For Instacart, we use its own prior set as D_src and train as D_tgt. For all the others with timing information, we sort the datasets according to the timestamp and then use the first 90% as D_src and rest 10% as D_tgt. [See the split sketch below the table.]
Hardware Specification | Yes | By default we train all the base models p and adapted models q on a single Nvidia V100 GPU with batch size 128, using Adam optimizer.
Software Dependencies | No | The paper mentions software components like "Adam optimizer", "PCD framework", and "GWG-sampler", but it does not specify version numbers for these software dependencies, which are necessary for full reproducibility.
Experiment Setup | Yes | Model configuration: We present the default model configurations here unless later specified. LVM: ... MLP with 2 hidden layers of 512 ReLU activated neurons... Autoregressive: We use Transformers... 4 layers with 8 heads... dimensions for embedding and feed-forward layers are 256 and 512... EBM: We use an MLP with 2 hidden layers of 512 ReLU activated neurons for f used in p. Training configuration: By default we train all the base models p and adapted models q on a single Nvidia V100 GPU with batch size 128, using Adam optimizer. For EBMs training we leverage the PCD framework... We use GWG-sampler... The number of MCMC steps per gradient update varies within {50, 100, 200}. [See the architecture and training sketches below the table.]
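
To make the split protocol in the Dataset Splits row concrete, here is a minimal sketch. The function names, the list-of-dicts record format, and the "timestamp" field are illustrative assumptions; the paper releases no code, so this is not the authors' implementation.

    # Hedged sketch of the 9:1 D_src / D_tgt split rules quoted above.
    # Assumption: each record is a dict with a "timestamp" key for the
    # time-ordered datasets; Groceries carries no timing information.
    import random

    def temporal_split(records, frac_src=0.9):
        """Sort by timestamp; the first 90% becomes D_src, the rest D_tgt."""
        ordered = sorted(records, key=lambda r: r["timestamp"])
        cut = int(len(ordered) * frac_src)
        return ordered[:cut], ordered[cut:]

    def random_split(records, frac_src=0.9, seed=0):
        """Groceries-style random 9:1 split when no timestamps are available."""
        shuffled = list(records)
        random.Random(seed).shuffle(shuffled)
        cut = int(len(shuffled) * frac_src)
        return shuffled[:cut], shuffled[cut:]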
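
The Experiment Setup row pins down most architecture hyperparameters. The following is a hedged PyTorch sketch of those defaults (2-hidden-layer, 512-unit ReLU MLPs and a 4-layer, 8-head Transformer with embedding/feed-forward dimensions 256/512). The choice of PyTorch, the module names, the multi-hot input encoding, and num_items/vocab_size are assumptions; the paper does not state its framework.

    import torch.nn as nn

    def energy_mlp(num_items):
        # EBM potential f: MLP with 2 hidden layers of 512 ReLU units,
        # mapping a multi-hot set encoding to a scalar f(x).
        return nn.Sequential(
            nn.Linear(num_items, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, 1),
        )

    def autoregressive_transformer(vocab_size):
        # Autoregressive base model: 4 layers, 8 heads, embedding dim 256,
        # feed-forward dim 512, with a projection back to the item vocabulary.
        layer = nn.TransformerEncoderLayer(
            d_model=256, nhead=8, dim_feedforward=512, batch_first=True)
        return nn.ModuleDict({
            "embed": nn.Embedding(vocab_size, 256),
            "body": nn.TransformerEncoder(layer, num_layers=4),
            "head": nn.Linear(256, vocab_size),
        })

The LVM's MLPs would follow the same 512-unit, 2-layer ReLU pattern; the quoted excerpt elides their exact input and output shapes, so they are not sketched here.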
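
For the EBM, the table reports training with the PCD framework and the GWG sampler, with 50 to 200 MCMC steps per gradient update and batch size 128 under Adam. The sketch below follows the generic Gibbs-with-Gradients recipe (Grathwohl et al., 2021) applied to binary multi-hot set encodings, assuming p(x) is proportional to exp(f(x)) and that f returns per-example values (for example, lambda x: net(x).squeeze(-1) with net = energy_mlp(num_items) from the previous sketch). It illustrates the quoted setup rather than reproducing the authors' implementation.

    import torch
    import torch.nn.functional as F

    def gwg_step(f, x):
        # One Gibbs-with-Gradients proposal plus Metropolis-Hastings correction
        # for binary x of shape [batch, num_items]; f(x) -> [batch].
        x = x.detach().requires_grad_(True)
        fx = f(x)
        grad = torch.autograd.grad(fx.sum(), x)[0]
        with torch.no_grad():
            delta = -(2.0 * x - 1.0) * grad  # first-order effect of flipping each bit
            fwd = torch.distributions.Categorical(logits=delta / 2.0)
            idx = fwd.sample()
            flip = F.one_hot(idx, x.shape[1]).float()
            x_prop = x * (1.0 - flip) + (1.0 - x) * flip  # flip the chosen bit
        x_prop = x_prop.detach().requires_grad_(True)
        fx_prop = f(x_prop)
        grad_prop = torch.autograd.grad(fx_prop.sum(), x_prop)[0]
        with torch.no_grad():
            delta_prop = -(2.0 * x_prop - 1.0) * grad_prop
            rev = torch.distributions.Categorical(logits=delta_prop / 2.0)
            log_accept = fx_prop - fx + rev.log_prob(idx) - fwd.log_prob(idx)
            accept = (torch.rand_like(log_accept).log() < log_accept).float().unsqueeze(1)
            return accept * x_prop + (1.0 - accept) * x

    def pcd_update(f, optimizer, data_batch, buffer, mcmc_steps=50):
        # Persistent Contrastive Divergence: advance the persistent chains with
        # GWG, then take one optimizer step on the contrastive objective
        # E_model[f] - E_data[f]; the paper varies mcmc_steps in {50, 100, 200}.
        for _ in range(mcmc_steps):
            buffer = gwg_step(f, buffer)
        optimizer.zero_grad()
        loss = f(buffer.detach()).mean() - f(data_batch).mean()
        loss.backward()
        optimizer.step()
        return buffer

The persistent chains (buffer) would be initialized to random binary vectors and carried across updates; that detail, like the rest of the PCD bookkeeping, is not specified in the quoted text and is assumed here.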