reproducibilityindex.ai

SpreadsheetCoder: Formula Prediction from Semi-structured Context

Authors: Xinyun Chen, Petros Maniatis, Rishabh Singh, Charles Sutton, Hanjun Dai, Max Lin, Denny Zhou

ICML 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	For evaluation (Section 4), we construct a large-scale benchmark of spreadsheets, and demonstrate that SPREADSHEETCODER achieves top-1 prediction accuracy of 42.51%
Researcher Affiliation	Collaboration	Xinyun Chen¹, Petros Maniatis², Rishabh Singh², Charles Sutton², Hanjun Dai², Max Lin², Denny Zhou²; ¹UC Berkeley, ²Google.
Pseudocode	No	The paper describes the model architecture and decoding process but does not include any pseudocode or algorithm blocks.
Open Source Code	Yes	The code and data are available at https://github.com/google-research/google-research/tree/master/spreadsheet_coder.
Open Datasets	Yes	The code and data are available at https://github.com/google-research/google-research/tree/master/spreadsheet_coder.
Dataset Splits	Yes	We collected 46K Google Sheets with formulas, and split them into 42K for training, 2.3K for validation, and 1.7K for testing.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU or CPU models used for running experiments.
Software Dependencies	No	The paper mentions using BERT and LSTM but does not provide specific software version numbers for libraries, frameworks, or programming languages used in implementation.
Experiment Setup	Yes	We include data values in cells that are at most D rows and D columns away from the target cell, so that the input dimension is (2D + 1) × (2D + 1), and we set D = 10 in our experiments. ...we tile our rows into bundles of N = 3 adjacent data rows, plus the header row... The number of data rows N = 3 is set to seek the balance between the size of the tabular context fed into the encoder and the computational efficiency. ...we can feed at most L = 512/(N + 1) tokens per row. To generate formulas referring to cells within D = 10 rows and columns, L = 128 is a good fit in our evaluation. ...we set the beam size to be 64 for all settings.
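
The per-row token limit quoted in the Experiment Setup row can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper's code: the names TOKEN_BUDGET and tokens_per_row are hypothetical, and the 512-token budget is assumed from the quoted expression L = 512/(N + 1).

```python
# Values quoted in the Experiment Setup row above.
N = 3               # data rows per bundle
BEAM_SIZE = 64      # beam size reported for all settings (not used in the arithmetic)

# Assumption: 512 is the total token budget for one bundle, as implied by
# the quoted expression L = 512/(N + 1).
TOKEN_BUDGET = 512


def tokens_per_row(budget: int, data_rows: int) -> int:
    """Split the budget evenly across the data rows plus one header row."""
    return budget // (data_rows + 1)


L = tokens_per_row(TOKEN_BUDGET, N)
print(L)  # 128, matching the L = 128 quoted above
```

Changing N in this sketch shows the trade-off the quoted text describes: more data rows per bundle leave fewer tokens for each row.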