Conditional Generative Model Based Predicate-Aware Query Approximation
Authors: Nikhil Sheoran, Subrata Mitra, Vibhor Porwal, Siddharth Ghetia, Jatin Varshney, Tung Mai, Anup Rao, Vikas Maddukuri
AAAI 2022, pp. 8259-8266
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluations with four different baselines on three real-world datasets show that ELECTRA provides lower AQP error for a large number of predicates compared to the baselines. |
| Researcher Affiliation | Collaboration | 1 University of Illinois at Urbana-Champaign 2 Adobe Research 3 Indian Institute of Technology, Roorkee |
| Pseudocode | Yes | Algorithm 1: Stratified Masking Strategy (an illustrative sketch appears after this table). |
| Open Source Code | No | The paper refers to code for the baselines (VAEAC and NARU) but gives no link to, or explicit statement about, an open-source release of the code for its own proposed method (ELECTRA). |
| Open Datasets | Yes | We use three real-world datasets: Flights (Bureau of Transportation Statistics), Housing (Qiu 2018) and Beijing PM2.5 (Chen 2017). |
| Dataset Splits | No | The paper describes how queries were generated for evaluation and mentions training, but it does not specify explicit training/validation/test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | Yes | All the experiments were performed on a 32-core Intel(R) Xeon(R) CPU E5-2686 with 4 Tesla V100-SXM2 GPUs. |
| Software Dependencies | No | The paper mentions software such as PyTorch, NARU's implementation, and sklearn's Bayesian Gaussian Mixture method, but it does not provide version numbers for these dependencies. |
| Experiment Setup | Yes | We varied the depth (d) of the prior and proposal networks over {2, 4, 6, 8} and the latent dimension (L) over {32, 64, 128, 256}. For the Flights data we use d = 8, L = 64; for Housing d = 8, L = 64; and for Beijing PM2.5 d = 6, L = 32. Note that the depth of the networks and the latent dimension contribute significantly to the model size; hence, depending on any size constraints, one can choose a simpler model. We used a masking factor (r) of 0.5. The model was trained with the Adam optimizer at a learning rate of 0.0001 (larger learning rates gave an unstable variational lower bound). Selectivity Estimator: we use NARU's publicly available implementation. The model is trained with the ResMADE architecture with a batch size of 512, an initial warm-up of 10,000 rounds, and 5 layers each of hidden dimension 256. (Sketches of these settings follow the table.) |
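
The paper's Algorithm 1 (Stratified Masking Strategy) is only named in the excerpt above, and ELECTRA's code is not released, so the following is a minimal sketch of one plausible reading: per training row, the number of masked attributes is drawn uniformly (stratifying the batch across mask sizes), capped by the masking factor r = 0.5 reported in the paper. The function name, the 0.0 sentinel, and the uniform stratification are our assumptions, not the paper's specification.

```python
import torch

def stratified_mask(batch: torch.Tensor, r: float = 0.5):
    """Hypothetical sketch of a stratified masking step.

    For each row, the number of masked attributes k is sampled uniformly
    up to a cap set by the masking factor r, so the batch covers many
    mask sizes instead of one fixed Bernoulli rate.
    """
    n_rows, n_cols = batch.shape
    max_k = max(1, int(r * n_cols))              # masking factor r bounds k
    mask = torch.zeros(n_rows, n_cols, dtype=torch.bool)
    for i in range(n_rows):
        k = int(torch.randint(1, max_k + 1, (1,)))  # stratum for this row
        cols = torch.randperm(n_cols)[:k]           # which attributes to hide
        mask[i, cols] = True
    masked = batch.clone()
    masked[mask] = 0.0                           # 0.0 stands in for a MASK sentinel
    return masked, mask
```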
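
To make the reported hyperparameters concrete, here is a small PyTorch skeleton wiring up prior and proposal networks of depth d = 8 with latent dimension L = 64 (the Flights/Housing setting) and the Adam optimizer at the reported learning rate of 0.0001. The MLP form, hidden width, and input dimensions are placeholders; the paper does not publish the exact network definitions.

```python
import torch
import torch.nn as nn

def mlp(in_dim: int, hidden: int, depth: int, out_dim: int) -> nn.Sequential:
    """Stack of `depth` hidden layers; a stand-in for the paper's
    prior/proposal networks, whose exact architecture is not published."""
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

n_attrs = 16                 # hypothetical number of table attributes
depth, latent = 8, 64        # reported d and L for Flights / Housing
hidden = 128                 # hidden width is not reported; placeholder

prior_net = mlp(n_attrs, hidden, depth, 2 * latent)        # outputs mean + log-variance
proposal_net = mlp(2 * n_attrs, hidden, depth, 2 * latent) # sees inputs plus mask

params = list(prior_net.parameters()) + list(proposal_net.parameters())
# Reported setting: Adam with lr = 1e-4 (larger rates destabilized the ELBO).
optimizer = torch.optim.Adam(params, lr=1e-4)
```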
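
Likewise, the NARU/ResMADE selectivity-estimator settings quoted above can be collected in one place. The dict below uses our own key names (NARU's actual configuration flags may differ) and records only the values stated in the paper.

```python
# Reported NARU selectivity-estimator settings, collected as a plain dict.
# Key names are our own; NARU's actual flags/arguments may differ.
naru_config = {
    "architecture": "ResMADE",
    "batch_size": 512,
    "warmup_rounds": 10_000,
    "num_layers": 5,
    "hidden_dim": 256,
}
```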