Conditional Generative Model Based Predicate-Aware Query Approximation

Authors: Nikhil Sheoran, Subrata Mitra, Vibhor Porwal, Siddharth Ghetia, Jatin Varshney, Tung Mai, Anup Rao, Vikas Maddukuri

AAAI 2022, pp. 8259-8266

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our evaluations with four different baselines on three real-world datasets show that ELECTRA provides lower AQP error for a large number of predicates compared to the baselines. |
| Researcher Affiliation | Collaboration | (1) University of Illinois at Urbana-Champaign; (2) Adobe Research; (3) Indian Institute of Technology, Roorkee |
| Pseudocode | Yes | Algorithm 1: Stratified Masking Strategy (see the hedged sketch after this table). |
| Open Source Code | No | The paper points to code for the baselines (VAEAC and NARU) but provides no link to, or explicit statement about, an open-source release of the code for its own proposed method (ELECTRA). |
| Open Datasets | Yes | Three real-world datasets are used: Flights (Bureau of Transportation Statistics), Housing (Qiu 2018), and Beijing PM2.5 (Chen 2017). |
| Dataset Splits | No | The paper describes how evaluation queries were generated and mentions training, but it does not specify explicit training/validation/test splits (e.g., percentages or sample counts) needed for reproducibility. |
| Hardware Specification | Yes | All experiments were performed on a 32-core Intel(R) Xeon(R) CPU E5-2686 with 4 Tesla V100-SXM2 GPUs. |
| Software Dependencies | No | The paper mentions software such as PyTorch, NARU's implementation, and sklearn's Bayesian Gaussian Mixture method, but it does not give version numbers for these dependencies. |
| Experiment Setup | Yes | The depth (d) of the prior and proposal networks was varied over [2, 4, 6, 8] and the latent dimension (L) over [32, 64, 128, 256]; the chosen settings are d = 8, L = 64 for Flights, d = 8, L = 64 for Housing, and d = 6, L = 32 for Beijing PM2.5. Since depth and latent dimension contribute significantly to model size, a simpler model can be chosen under size constraints. A masking factor (r) of 0.5 was used, and the model was trained with the Adam optimizer at a learning rate of 0.0001 (larger learning rates gave an unstable variational lower bound). Selectivity estimator: NARU's publicly available implementation, trained with the ResMADE architecture, a batch size of 512, an initial warm-up of 10000 rounds, and 5 layers each of hidden dimension 256 (see the configuration sketch below). |
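For orientation, here is a minimal sketch of what a stratified masking step (as named in the Pseudocode row, Algorithm 1) might look like when training a conditional generative model to learn p(masked attributes | observed attributes). Only the masking factor r = 0.5 comes from the paper's setup; the function name, the per-row stratification over mask sizes, and the NaN encoding of hidden cells are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def stratified_mask(batch, r=0.5, rng=None):
    """Hedged sketch of stratified masking: for each row, hide a subset of
    attributes so the model learns to predict them from the observed ones.

    r is the masking factor reported in the paper (r = 0.5). Drawing the
    number of masked columns per row, so that many conditioning-set sizes
    appear during training, is an assumption about the stratification.
    """
    rng = rng if rng is not None else np.random.default_rng()
    n_rows, n_cols = batch.shape
    mask = np.zeros((n_rows, n_cols), dtype=bool)
    for i in range(n_rows):
        # Expected fraction of masked columns per row is r.
        k = rng.binomial(n_cols, r)
        mask[i, rng.choice(n_cols, size=k, replace=False)] = True
    observed = np.where(mask, np.nan, batch)  # NaN marks masked-out cells
    return observed, mask
```

For example, `observed, mask = stratified_mask(np.random.rand(4, 6))` returns a batch in which roughly half of each row's attributes are hidden.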
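The Experiment Setup row also pins down concrete hyperparameters; collecting them in one place makes them easier to reuse. All values below come from the setup text, while the dict layout, the key names, and the `make_optimizer` helper are placeholders rather than the authors' code; in particular, the NARU entries should be mapped onto the flags of NARU's public implementation when reproducing.

```python
import torch

# Generator hyperparameters reported per dataset: depth d of the prior and
# proposal networks, and latent dimension L.
ELECTRA_CONFIGS = {
    "flights":      {"depth": 8, "latent_dim": 64},
    "housing":      {"depth": 8, "latent_dim": 64},
    "beijing_pm25": {"depth": 6, "latent_dim": 32},
}
MASKING_FACTOR = 0.5  # r in the paper's setup

# Selectivity estimator (NARU with ResMADE): values from the setup text;
# key names are illustrative, not NARU's actual argument names.
NARU_CONFIG = {
    "architecture": "ResMADE",
    "batch_size": 512,
    "warmup_rounds": 10000,
    "layers": 5,
    "hidden_dim": 256,
}

def make_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    # Adam at lr = 0.0001; the paper notes that larger learning rates gave
    # an unstable variational lower bound.
    return torch.optim.Adam(model.parameters(), lr=1e-4)
```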