Variable Skipping for Autoregressive Range Density Estimation

Authors: Eric Liang, Zongheng Yang, Ion Stoica, Pieter Abbeel, Yan Duan, Xi Chen

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 5 (Evaluation): Our evaluation investigates the following questions: 1. How much does variable skipping improve estimation accuracy compared to baselines, and how is this impacted by the sampling budget? 2. Can variable skipping be combined with multi-order training to further improve accuracy? 3. To what extent do hyperparameters such as the model capacity and mask token distribution impact the effectiveness of variable skipping? 4. Can variable skipping be applied to related domains such as text, or is it limited to tabular data? (A hedged sketch of the variable-skipping estimator these questions refer to follows the table.)
Researcher Affiliation | Collaboration | Eric Liang*¹, Zongheng Yang*¹, Ion Stoica¹, Pieter Abbeel¹ ², Yan Duan², Xi Chen² (*Equal contribution; ¹EECS, UC Berkeley, Berkeley, California, USA; ²covariant.ai, Berkeley, California, USA).
Pseudocode | No | The paper describes methods and processes but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | To invite research on this under-explored problem, we open source our code and a set of range density estimation benchmarks on high-dimensional discrete datasets at https://var-skip.github.io.
Open Datasets | Yes | We use the following public datasets in our evaluation, also summarized in Table 1: DMV-FULL (State of New York, 2019) [...], KDD (Dua & Graff, 2017) [...], CENSUS (Dua & Graff, 2017) [...], DRYAD-URLS (Sen et al., 2016).
Dataset Splits | No | The paper describes training models and evaluating them with queries, but it does not specify explicit training/validation/test splits with percentages, counts, or a detailed splitting methodology for reproducibility.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper mentions architectural choices (ResMADE, Transformer) and the Adam optimizer, but it does not provide specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions) needed for replication.
Experiment Setup | Yes | Table 2 (hyperparameters for all experiments; a ResMADE is used for tabular data and a Transformer for text): Training Epochs 20 (200 for KDD); Batch Size 2048; Architecture ResMADE; Residual Blocks 3; Hidden Layers / Block 2; Hidden Layer Units 256; Embedding Size 32; Optimizer Adam; Learning Rate 5e-4; Learning Rate Warmup 1 epoch; Mask Probability Uniform[0, 1). (See the masking sketch below for one reading of the mask-probability setting.)
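
For context on the evaluation questions above: the paper's core mechanism estimates the probability of a conjunction of range predicates by progressive sampling through an autoregressive model, and variables without a predicate are never sampled; their inputs stay at a learned mask token so the model marginalizes over them. The following is a minimal sketch of that idea, not the released implementation: the `model.conditional` interface, the `MASK` id, and the toy `UniformModel` are assumptions made purely for illustration.

```python
import numpy as np

MASK = -1  # hypothetical id standing in for the learned mask token

class UniformModel:
    """Toy stand-in model (independent uniform columns) so the sketch runs end to end."""
    def __init__(self, domain_sizes):
        self.domain_sizes = domain_sizes

    def conditional(self, inputs, col):
        # Returns P(x_col | filled-in prefix) for each row of `inputs`.
        d = self.domain_sizes[col]
        return np.full((inputs.shape[0], d), 1.0 / d)

def estimate_range_density(model, predicates, num_samples=100, rng=None):
    """Estimate P(x_1 in R_1, ..., x_n in R_n) by progressive sampling.

    predicates[i] is None (column i unconstrained) or a set of valid value ids.
    Unconstrained columns are *skipped*: their inputs stay at MASK, so the
    model marginalizes over them instead of requiring sampled values.
    """
    rng = rng or np.random.default_rng()
    n_cols = len(predicates)
    probs = np.ones(num_samples)
    inputs = np.full((num_samples, n_cols), MASK, dtype=np.int64)
    for i in range(n_cols):
        if predicates[i] is None:
            continue  # variable skipping: no sampling and no forward pass here
        p_i = model.conditional(inputs, col=i)          # (num_samples, domain_i)
        valid = np.zeros(p_i.shape[1])
        valid[list(predicates[i])] = 1.0
        mass = (p_i * valid).sum(axis=1)                # P(x_i in R_i | prefix)
        probs *= mass
        cond = p_i * valid / np.maximum(mass[:, None], 1e-12)
        cond /= cond.sum(axis=1, keepdims=True)
        sampled = np.array([rng.choice(len(row), p=row) for row in cond])
        inputs[:, i] = sampled                          # condition later columns
    return probs.mean()

# Example: three columns of domain size 10; column 1 has no predicate and is skipped.
model = UniformModel([10, 10, 10])
print(estimate_range_density(model, [{0, 1, 2}, None, {5}]))  # ~0.3 * 0.1 = 0.03
```

With the toy independent-uniform model the estimate is exactly 0.3 * 0.1 = 0.03; with a real autoregressive model the per-sample factors depend on the sampled prefix, and skipping the unconstrained column saves one forward pass per sample.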
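The "Mask Probability Uniform[0, 1)" row in Table 2 suggests that a masking rate is drawn uniformly at training time so the model sees inputs with varying numbers of masked variables, which is how it learns to treat the mask token as marginalization. Below is a minimal sketch of one plausible reading of that procedure (a per-example rate, applied independently per column); the array names, the `MASK` id, and this exact sampling scheme are assumptions, and the released code at https://var-skip.github.io should be treated as authoritative.

```python
import numpy as np

MASK = -1  # assumed reserved id for the learned mask embedding

def randomly_mask_inputs(batch, rng=None):
    """batch: int array of shape (batch_size, num_columns) holding value ids."""
    rng = rng or np.random.default_rng()
    bsz, ncols = batch.shape
    rate = rng.random((bsz, 1))                # one masking rate per example, Uniform[0, 1)
    dropped = rng.random((bsz, ncols)) < rate  # each column masked independently at that rate
    return np.where(dropped, MASK, batch), dropped
```

Under this reading, the model would receive the masked inputs while the training loss is still computed against the original (unmasked) values.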