Conditional Generative Model Based Predicate-Aware Query Approximation

Authors: Nikhil Sheoran, Subrata Mitra, Vibhor Porwal, Siddharth Ghetia, Jatin Varshney, Tung Mai, Anup Rao, Vikas Maddukuri

AAAI 2022, pp. 8259-8266

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our evaluations with four different baselines on three real-world datasets show that ELECTRA provides lower AQP error for a large number of predicates compared to the baselines. |
| Researcher Affiliation | Collaboration | (1) University of Illinois at Urbana-Champaign; (2) Adobe Research; (3) Indian Institute of Technology, Roorkee |
| Pseudocode | Yes | Algorithm 1: Stratified Masking Strategy (see the hedged sketch after this table). |
| Open Source Code | No | The paper points to code for the baselines (VAEAC and NARU) but provides no link to, or explicit statement about, an open-source release of the code for its own proposed method (ELECTRA). |
| Open Datasets | Yes | Three real-world datasets are used: Flights (Bureau of Transportation Statistics), Housing (Qiu 2018), and Beijing PM2.5 (Chen 2017). |
| Dataset Splits | No | The paper describes how evaluation queries were generated and mentions training, but it does not specify explicit training/validation/test splits (e.g., percentages or sample counts) needed for reproducibility. |
| Hardware Specification | Yes | All experiments were performed on a 32-core Intel(R) Xeon(R) CPU E5-2686 with 4 Tesla V100-SXM2 GPUs. |
| Software Dependencies | No | The paper mentions software such as PyTorch, NARU's implementation, and sklearn's Bayesian Gaussian Mixture method, but it does not give version numbers for these dependencies. |
| Experiment Setup | Yes | The depth (d) of the prior and proposal networks was varied over [2, 4, 6, 8] and the latent dimension (L) over [32, 64, 128, 256]; the chosen settings are d = 8, L = 64 for Flights, d = 8, L = 64 for Housing, and d = 6, L = 32 for Beijing PM2.5. Since depth and latent dimension contribute significantly to model size, a simpler model can be chosen under size constraints. A masking factor (r) of 0.5 was used, and the model was trained with the Adam optimizer at a learning rate of 0.0001 (larger learning rates gave an unstable variational lower bound). Selectivity estimator: NARU's publicly available implementation, trained with the ResMADE architecture, a batch size of 512, an initial warm-up of 10000 rounds, and 5 layers each of hidden dimension 256 (see the configuration sketch below). |
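For orientation, here is a minimal sketch of what a stratified masking step (as named in the Pseudocode row, Algorithm 1) might look like when training a conditional generative model to learn p(masked attributes | observed attributes). Only the masking factor r = 0.5 comes from the paper's setup; the function name, the per-row stratification over mask sizes, and the NaN encoding of hidden cells are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def stratified_mask(batch, r=0.5, rng=None):
    """Hedged sketch of stratified masking: for each row, hide a subset of
    attributes so the model learns to predict them from the observed ones.

    r is the masking factor reported in the paper (r = 0.5). Drawing the
    number of masked columns per row, so that many conditioning-set sizes
    appear during training, is an assumption about the stratification.
    """
    rng = rng if rng is not None else np.random.default_rng()
    n_rows, n_cols = batch.shape
    mask = np.zeros((n_rows, n_cols), dtype=bool)
    for i in range(n_rows):
        # Expected fraction of masked columns per row is r.
        k = rng.binomial(n_cols, r)
        mask[i, rng.choice(n_cols, size=k, replace=False)] = True
    observed = np.where(mask, np.nan, batch)  # NaN marks masked-out cells
    return observed, mask
```

For example, `observed, mask = stratified_mask(np.random.rand(4, 6))` returns a batch in which roughly half of each row's attributes are hidden.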
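The Experiment Setup row also pins down concrete hyperparameters; collecting them in one place makes them easier to reuse. All values below come from the setup text, while the dict layout, the key names, and the `make_optimizer` helper are placeholders rather than the authors' code; in particular, the NARU entries should be mapped onto the flags of NARU's public implementation when reproducing.

```python
import torch

# Generator hyperparameters reported per dataset: depth d of the prior and
# proposal networks, and latent dimension L.
ELECTRA_CONFIGS = {
    "flights":      {"depth": 8, "latent_dim": 64},
    "housing":      {"depth": 8, "latent_dim": 64},
    "beijing_pm25": {"depth": 6, "latent_dim": 32},
}
MASKING_FACTOR = 0.5  # r in the paper's setup

# Selectivity estimator (NARU with ResMADE): values from the setup text;
# key names are illustrative, not NARU's actual argument names.
NARU_CONFIG = {
    "architecture": "ResMADE",
    "batch_size": 512,
    "warmup_rounds": 10000,
    "layers": 5,
    "hidden_dim": 256,
}

def make_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    # Adam at lr = 0.0001; the paper notes that larger learning rates gave
    # an unstable variational lower bound.
    return torch.optim.Adam(model.parameters(), lr=1e-4)
```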