Marginal Distribution Adaptation for Discrete Sets via Module-Oriented Divergence Minimization
Authors: Hanjun Dai, Mengjiao Yang, Yuan Xue, Dale Schuurmans, Bo Dai
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we will first validate the correctness of our marginal adaptation framework on synthetic datasets for all the three models in Section 5.1. Then in Section 5.2 we study the effectiveness of the framework in adapting the learned distribution to the target distribution via marginal alignment using real-world datasets. We present the experiment configurations for model architectures, training and evaluation methods used in both sections. |
| Researcher Affiliation | Industry | 1Google Research, Brain Team 2Google Cloud. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing its source code for the methodology described, nor does it include any links to a code repository. |
| Open Datasets | Yes | MIMIC3: We curate this dataset based on the encounter ICD9 diagnosis codes from MIMIC-III (Johnson et al., 2016), an open source EHR dataset. Instacart (ins): This dataset comes from the Kaggle Instacart Market Basket Analysis competition. We select the top 1,000 popular products for generation and control experiments. |
| Dataset Splits | Yes | Without timing information, we randomly split the Groceries dataset into Dsrc and Dtgt with ratio 9:1. For Instacart, we use its own prior set as Dsrc and train as Dtgt. For all the others with timing information, we sort the datasets according to the timestamp and then use the first 90% as Dsrc and rest 10% as Dtgt. |
| Hardware Specification | Yes | By default we train all the base models p and adapted models q on a single Nvidia V100 GPU with batch size 128, using Adam optimizer. |
| Software Dependencies | No | The paper mentions software components like "Adam optimizer", "PCD framework", and "GWG-sampler", but it does not specify version numbers for these software dependencies, which are necessary for full reproducibility. |
| Experiment Setup | Yes | Model configuration: We present the default model configurations here unless later specified. LVM: ... MLP with 2 hidden layers of 512 ReLU-activated neurons... Autoregressive: We use Transformers... 4 layers with 8 heads... dimensions for embedding and feed-forward layers are 256 and 512... EBM: We use an MLP with 2 hidden layers of 512 ReLU-activated neurons for f used in p. Training configuration: By default we train all the base models p and adapted models q on a single Nvidia V100 GPU with batch size 128, using Adam optimizer. For EBMs training we leverage the PCD framework... We use GWG-sampler... The number of MCMC steps per gradient update varies within {50, 100, 200}. (Hedged sketches of the dataset splits and these default configurations follow the table.) |
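
The split protocol quoted in the Dataset Splits row is simple to restate in code. The sketch below is an illustration only: it assumes each record is a `(timestamp, item_set)` tuple, and the record format, function names, and random seed are hypothetical rather than taken from the paper. (Instacart is the exception, since it reuses its predefined prior/train sets as Dsrc/Dtgt.)

```python
# A minimal sketch of the source/target splits described in the table, assuming each
# record is a (timestamp, item_set) tuple; the record format and function names are
# hypothetical, not taken from the paper.
import random

def temporal_split(records, src_ratio=0.9):
    """Datasets with timestamps: sort by time, first 90% -> D_src, last 10% -> D_tgt."""
    ordered = sorted(records, key=lambda r: r[0])
    cut = int(len(ordered) * src_ratio)
    return ordered[:cut], ordered[cut:]

def random_split(records, src_ratio=0.9, seed=0):
    """Datasets without timing information (e.g. Groceries): random 9:1 split."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * src_ratio)
    return shuffled[:cut], shuffled[cut:]
```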
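
Similarly, the default architectures and training settings quoted in the Experiment Setup row can be sketched as follows. The paper does not name its deep-learning framework, so PyTorch is an assumption here; `vocab_size`, the input encoding for the energy function, and the learning rate are placeholders rather than reported values.

```python
# A hedged sketch of the default configurations quoted above. PyTorch is assumed;
# vocab_size, the EBM input encoding, and the learning rate are placeholders.
import torch
import torch.nn as nn

vocab_size = 1000  # e.g. the top-1,000 Instacart products selected in the paper

# EBM energy function f: an MLP with 2 hidden layers of 512 ReLU-activated units,
# here applied to a binary set-indicator vector (the input encoding is an assumption).
energy_mlp = nn.Sequential(
    nn.Linear(vocab_size, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 1),
)

# Autoregressive backbone: a 4-layer, 8-head Transformer with embedding dim 256
# and feed-forward dim 512, matching the quoted defaults.
embedding = nn.Embedding(vocab_size, 256)
transformer = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8,
                               dim_feedforward=512, batch_first=True),
    num_layers=4,
)

# Training defaults quoted in the table: batch size 128 and the Adam optimizer
# (one optimizer per model; the learning rate below is only a placeholder).
batch_size = 128
optimizer = torch.optim.Adam(energy_mlp.parameters(), lr=1e-3)
mcmc_steps = 100  # varied within {50, 100, 200} per gradient update for PCD/GWG EBM training
```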