ELASTIC: Numerical Reasoning with Adaptive Symbolic Compiler

Authors: Jiaxin Zhang, Yashar Moshfeghi

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that ELASTIC achieves 68.96 execution accuracy and 65.21 program accuracy on the FinQA dataset, and 83.00 program accuracy on the MathQA dataset, significantly outperforming previous state-of-the-art models.
Researcher Affiliation | Academia | Jiaxin Zhang, University of Strathclyde, 16 Richmond Street, Glasgow, G1 1XQ, jiaxin.zhang@strath.ac.uk; Yashar Moshfeghi, University of Strathclyde, 16 Richmond Street, Glasgow, G1 1XQ, yashar.moshfeghi@strath.ac.uk
Pseudocode | No | The paper describes the model architecture and processes but does not contain a clearly labeled "Pseudocode" or "Algorithm" block, nor does it present structured steps in a code-like format.
Open Source Code | Yes | ELASTIC code can be found at https://github.com/NeuraSearch/NeurIPS-2022-Submission-3358
Open Datasets | Yes | We conduct evaluation experiments on two datasets: FinQA [15] and MathQA [19].
Dataset Splits | Yes | FinQA: It contains 8,281 examples, split into train, eval, and test parts with 6,251, 883, and 1,147 examples. [...] MathQA: The dataset is split into 80%, 12%, and 8% of train, dev, and test data. (A split sanity check follows the table.)
Hardware Specification | Yes | The model is implemented with PyTorch [39] and Transformers [40] and trained on a server with an NVIDIA Tesla A100 GPU with 40 GB of memory.
Software Dependencies | No | The model is implemented with PyTorch [39] and Transformers [40]. While the software names are mentioned, specific version numbers for PyTorch or Transformers are not provided.
Experiment Setup | Yes | Training epochs are set to 50 and 100 for FinQA and MathQA, respectively. The batch size for all datasets is set to 10. We use Adam as the optimizer [41] to update the parameters of the models. The initial learning rate is set to 1e-5 for both datasets, and it is halved every 25 epochs and every 50 epochs for FinQA and MathQA, respectively. During training, the dropout rate and the weight decay are set to 0.1 and 1e-5 to prevent over-fitting. The parameters of RoBERTa are fine-tuned during training. For the GRU cell in the decoder, the hidden size is the same as RoBERTa's, and the number of GRU layers is 4. During inference, we use greedy decoding to generate the reasoning program.
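The Experiment Setup row above lists concrete hyperparameters. Below is a minimal sketch of how that configuration could be expressed in PyTorch and Hugging Face Transformers; the checkpoint name (roberta-base), variable names, and loop skeleton are illustrative assumptions, and only the numeric hyperparameters come from the quoted text.

import torch
from torch import nn
from transformers import RobertaModel

# RoBERTa encoder is fine-tuned during training (the quoted text does not
# say which RoBERTa size was used; "roberta-base" is an assumption).
encoder = RobertaModel.from_pretrained("roberta-base")
hidden = encoder.config.hidden_size  # decoder GRU hidden size matches RoBERTa

# 4-layer GRU decoder with dropout 0.1, as described in the quoted setup.
decoder_gru = nn.GRU(input_size=hidden, hidden_size=hidden,
                     num_layers=4, dropout=0.1, batch_first=True)

# Adam with learning rate 1e-5 and weight decay 1e-5 over all parameters.
params = list(encoder.parameters()) + list(decoder_gru.parameters())
optimizer = torch.optim.Adam(params, lr=1e-5, weight_decay=1e-5)

# FinQA schedule: 50 epochs, batch size 10, learning rate halved every
# 25 epochs (MathQA: 100 epochs, halved every 50 epochs).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=25, gamma=0.5)

EPOCHS, BATCH_SIZE = 50, 10
for epoch in range(EPOCHS):
    # ... one pass over FinQA mini-batches of size 10 would go here ...
    scheduler.step()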
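The Dataset Splits row can likewise be sanity-checked with a few lines of arithmetic. The sketch below only restates the quoted figures; the MathQA total size is not given in the quoted text, so it is left as a parameter.

# FinQA: the quoted counts should sum to the stated 8,281 total.
finqa = {"train": 6_251, "dev": 883, "test": 1_147}
assert sum(finqa.values()) == 8_281

# MathQA is described only as an 80% / 12% / 8% split; for any total size
# n_total the expected counts would be:
def mathqa_counts(n_total):
    return {"train": round(0.80 * n_total),
            "dev": round(0.12 * n_total),
            "test": round(0.08 * n_total)}

print(mathqa_counts(10_000))  # e.g. {'train': 8000, 'dev': 1200, 'test': 800}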