SpreadsheetCoder: Formula Prediction from Semi-structured Context

Authors: Xinyun Chen, Petros Maniatis, Rishabh Singh, Charles Sutton, Hanjun Dai, Max Lin, Denny Zhou

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | For evaluation (Section 4), we construct a large-scale benchmark of spreadsheets, and demonstrate that SPREADSHEETCODER achieves top-1 prediction accuracy of 42.51%
Researcher Affiliation | Collaboration | Xinyun Chen¹, Petros Maniatis², Rishabh Singh², Charles Sutton², Hanjun Dai², Max Lin², Denny Zhou²; ¹UC Berkeley, ²Google.
Pseudocode | No | The paper describes the model architecture and decoding process but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | The code and data are available at https://github.com/google-research/google-research/tree/master/spreadsheet_coder.
Open Datasets | Yes | The code and data are available at https://github.com/google-research/google-research/tree/master/spreadsheet_coder.
Dataset Splits | Yes | We collected 46K Google Sheets with formulas, and split them into 42K for training, 2.3K for validation, and 1.7K for testing.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running experiments.
Software Dependencies | No | The paper mentions using BERT and LSTM but does not provide specific software version numbers for libraries, frameworks, or programming languages used in implementation.
Experiment Setup | Yes | We include data values in cells that are at most D rows and D columns away from the target cell, so that the input dimension is (2D + 1) × (2D + 1), and we set D = 10 in our experiments. ...we tile our rows into bundles of N = 3 adjacent data rows, plus the header row... The number of data rows N = 3 is set to seek the balance between the size of the tabular context fed into the encoder and the computational efficiency. ...we can feed at most L = 512/(N + 1) tokens per row. To generate formulas referring to cells within D = 10 rows and columns, L = 128 is a good fit in our evaluation. ...we set the beam size to be 64 for all settings.
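
The token budget and row tiling quoted in the Experiment Setup row can be illustrated with a short Python sketch. It is not taken from the released code: the names `tokens_per_row` and `tile_rows` are hypothetical, and the choice to tile without overlap and to keep a shorter final bundle is an assumption made here for illustration only.

```python
from typing import List, Sequence

MAX_TOKENS = 512   # total token budget per bundle, from the quoted L = 512/(N + 1)
N_DATA_ROWS = 3    # data rows per bundle (N in the quoted setup)


def tokens_per_row(max_tokens: int = MAX_TOKENS, n: int = N_DATA_ROWS) -> int:
    """Per-row token budget when a bundle holds n data rows plus one header row."""
    return max_tokens // (n + 1)


def tile_rows(
    header: Sequence[str],
    data_rows: Sequence[Sequence[str]],
    n: int = N_DATA_ROWS,
) -> List[List[Sequence[str]]]:
    """Group data rows into bundles of up to n rows, each prefixed with the header.

    Assumption of this sketch: bundles do not overlap, and a final bundle
    with fewer than n data rows is kept rather than padded or dropped.
    """
    return [[header, *data_rows[i:i + n]] for i in range(0, len(data_rows), n)]


# Toy spreadsheet fragment: one header row and six data rows.
header = ["Item", "Qty", "Price"]
rows = [[f"item{i}", str(i), str(i * 2)] for i in range(6)]
bundles = tile_rows(header, rows)

print(tokens_per_row())  # 512 // (3 + 1) = 128
print(len(bundles))      # six data rows in bundles of three
```

With N = 3, each bundle carries four rows (header plus three data rows), which is where the per-row limit of 128 tokens in the quoted setup comes from.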