SpreadsheetCoder: Formula Prediction from Semi-structured Context
Authors: Xinyun Chen, Petros Maniatis, Rishabh Singh, Charles Sutton, Hanjun Dai, Max Lin, Denny Zhou
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For evaluation (Section 4), we construct a large-scale benchmark of spreadsheets, and demonstrate that SPREADSHEETCODER achieves top-1 prediction accuracy of 42.51% |
| Researcher Affiliation | Collaboration | Xinyun Chen¹, Petros Maniatis², Rishabh Singh², Charles Sutton², Hanjun Dai², Max Lin², Denny Zhou². ¹UC Berkeley, ²Google. |
| Pseudocode | No | The paper describes the model architecture and decoding process but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and data are available at https://github.com/google-research/google-research/tree/master/spreadsheet_coder. |
| Open Datasets | Yes | The code and data are available at https://github.com/google-research/google-research/tree/master/spreadsheet_coder. |
| Dataset Splits | Yes | We collected 46K Google Sheets with formulas, and split them into 42K for training, 2.3K for validation, and 1.7K for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running experiments. |
| Software Dependencies | No | The paper mentions using BERT and LSTM but does not provide specific software version numbers for libraries, frameworks, or programming languages used in implementation. |
| Experiment Setup | Yes | We include data values in cells that are at most D rows and D columns away from the target cell, so that the input dimension is (2D + 2) × (2D + 1), and we set D = 10 in our experiments. ...we tile our rows into bundles of N = 3 adjacent data rows, plus the header row... The number of data rows N = 3 is set to seek the balance between the size of the tabular context fed into the encoder and the computational efficiency. ...we can feed at most L = 512/(N + 1) tokens per row. To generate formulas referring to cells within D = 10 rows and columns, L = 128 is a good fit in our evaluation. ...we set the beam size to be 64 for all settings. |
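
The arithmetic in the Experiment Setup row can be checked with a short sketch. This is not the authors' code: the function names are hypothetical, treating the (2D + 2) factor as the row count is an assumption, and so is the choice of non-overlapping bundles (the quoted text only says rows are tiled into bundles of N adjacent data rows plus the header row).

```python
from typing import List, Sequence, Tuple


def context_shape(d: int) -> Tuple[int, int]:
    """Return the two factors of the quoted input dimension (2D + 2) x (2D + 1).

    Assumption: the first factor is read as rows, the second as columns.
    """
    return 2 * d + 2, 2 * d + 1


def tokens_per_row(max_tokens: int, n: int) -> int:
    """Per-row token budget L = max_tokens / (N + 1): N data rows plus one header row."""
    return max_tokens // (n + 1)


def row_bundles(data_rows: Sequence, header, n: int) -> List[list]:
    """Tile data rows into bundles of n adjacent rows, each led by the header row.

    Assumption: bundles do not overlap; the last bundle may hold fewer than n rows.
    """
    return [[header, *data_rows[i:i + n]] for i in range(0, len(data_rows), n)]


if __name__ == "__main__":
    D, N, MAX_TOKENS = 10, 3, 512  # values quoted in the table above
    print(context_shape(D))
    print(tokens_per_row(MAX_TOKENS, N))
    # 2D + 1 data rows around the target cell, represented here by their indices
    print(len(row_bundles(list(range(2 * D + 1)), "header", N)))
```

With D = 10 and N = 3, the per-row budget comes out to 128 tokens, matching the value of L reported in the table.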