Table2Analysis: Modeling and Recommendation of Common Analysis Patterns for Multi-Dimensional Data

Authors: Mengyu Zhou, Tao Wang, Pengxin Ji, Shi Han, Dongmei Zhang (pp. 320–328)

AAAI 2020

Reproducibility

| Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Table2Analysis has 0.78 recall at top-5 and 0.65 recall at top-1 in our evaluation against a large-scale spreadsheet corpus on the Pivot Table recommendation task. ... We collect over 121,000 tables from 74,000 real-world Excel files. ... We verify the effectiveness of Table2Analysis on Pivot Table recommendation." |
| Researcher Affiliation | Collaboration | Mengyu Zhou [1], Tao Wang [2], Pengxin Ji [2], Shi Han [1], Dongmei Zhang [1]. [1] Microsoft Research; [2] Beijing University of Posts and Telecommunications, Beijing, China. {mezho, shihan, dongmeiz}@microsoft.com, {wangt, jpx}@bupt.edu.cn |
| Pseudocode | No | The paper describes its processes and model architecture in detail, including figures (Figure 2 for the DQN model architecture), but it does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides no link to a code repository, nor does it state that the source code for Table2Analysis is open-sourced or available in supplementary materials. |
| Open Datasets | No | The paper states, "Our Pivot Table corpus contains 74,299 unique English Excel spreadsheet files (with Pivot Tables) that are crawled from public Web." Although the underlying files are publicly available on the Web, the collected corpus itself is not released with a public access link, DOI, or repository citation. |
| Dataset Splits | Yes | The Excel files are allocated for training, validation, and testing in the ratio 7 : 1 : 2. Hyper-parameter experiments use the validation set, while overall accuracy and effectiveness comparisons against the baseline methods use the testing set. |
| Hardware Specification | Yes | Multiple experiments for tuning hyper-parameters and testing are run on the Azure Cloud using Standard NCv3 VM nodes (24 CPUs, 448 GB memory, 4 NVIDIA Tesla V100 16 GB GPUs). |
| Software Dependencies | No | The paper mentions using the "BERT model (Devlin et al. 2018)" for semantic embedding, but it does not specify a version number for BERT or any other software dependency crucial for replication. |
| Experiment Setup | Yes | Two Transformer configurations are reported: a small one (0.81M parameters) with N = 4, h = 8, d_ff = 192, d_model = 96; and a large one (4.60M parameters) with N = 6, h = 12, d_ff = 384, d_model = 192. Four class-weight settings of the negative log-likelihood loss function are tried: (1, 1), (0.8, 1), (0.2, 1), and (0.08, 1) for the zero and one action-value classes from q(s, a). Other hyper-parameters such as dropout rate (0.1) and epoch count (30) are derived from the above or fixed for all pretraining experiments. ... (Expand Limit, Beam Size) = (100, 4). For OU noise, θ = 0.15, σ = 0.2, μ = 0, and the noise scale diminishes as the epoch number grows (starting at 0.9 and ending at 0.001 by multiplying by 0.8 after each epoch). |
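The 7 : 1 : 2 file-level split reported under "Dataset Splits" can be sketched as below. The paper does not publish its splitting code, so the shuffling, seed, and file names here are hypothetical stand-ins; only the ratios and the corpus size (74,299 files) come from the review above.

```python
import random

def split_files(files, ratios=(0.7, 0.1, 0.2), seed=42):
    """Shuffle a list of spreadsheet files and split it into
    train/validation/test partitions by the given ratios (7:1:2)."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    files = list(files)
    random.Random(seed).shuffle(files)  # seed is an assumption, not from the paper
    n = len(files)
    n_train = int(n * ratios[0])
    n_valid = int(n * ratios[1])
    train = files[:n_train]
    valid = files[n_train:n_train + n_valid]
    test = files[n_train + n_valid:]
    return train, valid, test

# Hypothetical corpus of 74,299 files, matching the reported corpus size.
train, valid, test = split_files([f"book_{i}.xlsx" for i in range(74299)])
```

Splitting at the file level (rather than the table level) keeps tables from the same workbook out of both training and testing, which is the usual way to avoid leakage in such corpora.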
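The "(Expand Limit, Beam Size) = (100, 4)" setting refers to a beam search with an overall expansion budget. The paper's exact search procedure is not reproduced here; the following is a generic beam-search sketch under those two settings, with `expand`, `score`, and `is_complete` as hypothetical callbacks supplied by the caller.

```python
import heapq

def beam_search(initial, expand, score, is_complete, beam_size=4, expand_limit=100):
    """Generic beam search with an overall expansion budget.
    `expand` maps a state to its successor states, `score` ranks states
    (higher is better), and `is_complete` marks terminal states.
    The defaults (4, 100) mirror the reported (Beam Size, Expand Limit)."""
    beam = [initial]
    completed = []
    expansions = 0
    while beam and expansions < expand_limit:
        candidates = []
        for state in beam:
            if expansions >= expand_limit:
                break  # stop once the expansion budget is exhausted
            expansions += 1
            candidates.extend(expand(state))
        completed.extend(s for s in candidates if is_complete(s))
        # keep only the top `beam_size` unfinished states for the next round
        beam = heapq.nlargest(
            beam_size, (s for s in candidates if not is_complete(s)), key=score
        )
    return heapq.nlargest(beam_size, completed, key=score) if completed else beam

# Toy usage: grow bit-strings of length 3, preferring more 1s.
result = beam_search(
    initial=(),
    expand=lambda s: [s + (0,), s + (1,)],
    score=sum,
    is_complete=lambda s: len(s) == 3,
)
```

The expansion budget caps total search work regardless of branching factor, which matters when each expansion involves a forward pass through the model.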
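The OU-noise hyper-parameters above (θ = 0.15, σ = 0.2, μ = 0, scale decaying from 0.9 toward 0.001 by a factor of 0.8 per epoch) describe a standard Ornstein-Uhlenbeck exploration process. How the noise is injected into the paper's DQN pipeline is not specified, so this is a minimal, generic sketch using only the reported values; the class name and Euler step size (dt = 1) are assumptions.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck exploration noise with the reported hyper-parameters
    (theta=0.15, sigma=0.2, mu=0). The output scale starts at 0.9 and is
    multiplied by 0.8 after each epoch, clamped at a floor of 0.001."""

    def __init__(self, dim, theta=0.15, sigma=0.2, mu=0.0,
                 scale_start=0.9, scale_end=0.001, scale_decay=0.8):
        self.theta, self.sigma, self.mu = theta, sigma, mu
        self.scale = scale_start
        self.scale_end, self.scale_decay = scale_end, scale_decay
        self.state = np.full(dim, mu)

    def sample(self):
        # Euler step with dt = 1: dx = theta * (mu - x) + sigma * N(0, 1)
        dx = self.theta * (self.mu - self.state) \
            + self.sigma * np.random.randn(*self.state.shape)
        self.state = self.state + dx
        return self.scale * self.state

    def end_epoch(self):
        # Diminish the noise scale geometrically, never below the floor.
        self.scale = max(self.scale * self.scale_decay, self.scale_end)
```

Note that 0.9 × 0.8³⁰ ≈ 0.0011, so over the 30 reported epochs the scale decays to roughly the 0.001 floor, consistent with the review's figures.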