Weakly Supervised Neuro-Symbolic Module Networks for Numerical Reasoning over Text

Authors: Amrita Saha, Shafiq Joty, Steven C.H. Hoi

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We now empirically compare the exact-match performance of WNSMN with SoTA baselines on versions of the DROP dataset and also examine how it fares against strong supervised skylines. Table 1 presents our primary results on DROP-num, comparing the performance of WNSMN (accuracy of the top-1 action sampled by the RL agent) with various ablations of NMN (provided in the authors' implementation) obtained by removing at least one of Program, Execution, and Query Attention supervision, and with GenBERT models with pretrained BERT that are finetuned on DROP or DROP-num (denoted as GenBERT and GenBERT-num). (A minimal exact-match sketch follows the table.)
Researcher Affiliation | Collaboration | Amrita Saha (1), Shafiq Joty (1, 2), Steven C.H. Hoi (1). Affiliations: (1) Salesforce Research Asia, (2) Nanyang Technological University. Email: {amrita.saha, sjoty, shoi}@salesforce.com
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper states: "For the primary baselines NMN and GenBERT, we report the performance on in-house trained models on the respective datasets, using the code open-sourced by the authors." This refers to the code of the baseline models, not to code released for the WNSMN model described in this paper.
Open Datasets | Yes | As the primary dataset we use DROP-num, the subset of DROP with numerical answers (Dua et al. 2019). This subset contains 45K and 5.8K instances respectively from the standard DROP train and development sets. Originally, NMN was showcased on a very specific subset of DROP, restricted to the 6 reasoning types it could handle, of which three (count, date-difference, extract-number) have numeric answers. This subset comprises 20K training and 1.8K development instances, of which only 10K and 800 instances respectively have numerical answers.
Dataset Splits | Yes | In both cases, the training data was randomly split 70%:30% into train and validation, and the standard DROP development set was treated as Test. (A minimal subset-and-split sketch follows the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as exact GPU or CPU models, memory, or cloud computing specifications.
Software Dependencies | No | The paper mentions software components such as BERT-base-uncased, Sentence-BERT, and the Adam optimizer, but does not specify version numbers for these or for other relevant dependencies such as programming languages, deep learning frameworks, or libraries.
Experiment Setup | Yes | The hyperparameter settings are as follows: optimizer Adam (learning rate 1e-4); number of arguments and actions sampled: 50 and 250; total epochs: 35; iterative ML for the first 15 epochs; batch size: 2 (gradient accumulation 50). Two transformer blocks are used: an L-layered stacked self-attention block (input dim 4, hidden dim 40, 3 layers, 4 heads) and an L-layered self-attention-based block (input dim 10, hidden dim 100, 3 layers, 10 heads). (A minimal configuration sketch follows the table.)
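
The Open Datasets and Dataset Splits rows describe building DROP-num (the DROP questions whose gold answer is a number) and randomly splitting its training data 70%:30% into train and validation. The sketch below illustrates one way this could be done; it assumes the standard DROP JSON layout from Dua et al. (2019) (a passage_id mapped to a "passage" and a list of "qa_pairs", each with an "answer" dict containing a "number" field), and the file name and helper name are illustrative, not taken from the paper.

```python
import json
import random

def load_drop_num(path):
    """Collect DROP question-answer pairs whose gold answer is a number.

    Assumes the standard DROP JSON layout:
    {passage_id: {"passage": str,
                  "qa_pairs": [{"question": str,
                                "answer": {"number": str, ...}, ...}]}}
    """
    with open(path) as f:
        drop = json.load(f)

    instances = []
    for passage_id, entry in drop.items():
        for qa in entry["qa_pairs"]:
            number = qa["answer"].get("number", "")
            if number != "":  # keep numeric-answer questions only
                instances.append({
                    "passage_id": passage_id,
                    "passage": entry["passage"],
                    "question": qa["question"],
                    "answer": number,
                })
    return instances

# Illustrative file name; the DROP data itself comes from Dua et al. (2019).
train_instances = load_drop_num("drop_dataset_train.json")

# 70%:30% random split of the training data into train and validation;
# the standard DROP development set is kept aside as Test.
random.seed(0)
random.shuffle(train_instances)
cut = int(0.7 * len(train_instances))
train_split, valid_split = train_instances[:cut], train_instances[cut:]
print(f"train: {len(train_split)}, valid: {len(valid_split)}")
```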
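
The Experiment Setup row lists two stacked self-attention transformer blocks and an Adam optimizer with learning rate 1e-4, batch size 2, and gradient accumulation of 50. Below is a minimal PyTorch sketch of how components with those dimensions could be instantiated; it treats the listed hidden dim as the model dimension and chooses the feed-forward width (4x hidden) itself, so it is an illustration of the stated hyperparameters, not the authors' implementation.

```python
import torch
import torch.nn as nn

def make_encoder(input_dim, hidden_dim, num_layers, num_heads):
    """Stacked self-attention transformer block with the listed dimensions."""
    layer = nn.TransformerEncoderLayer(
        d_model=hidden_dim,
        nhead=num_heads,
        dim_feedforward=4 * hidden_dim,  # assumption; not stated in the paper
        batch_first=True,
    )
    encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
    # Project the raw input features up to the hidden size before the encoder.
    return nn.Sequential(nn.Linear(input_dim, hidden_dim), encoder)

# The two blocks listed in the Experiment Setup row.
block_a = make_encoder(input_dim=4, hidden_dim=40, num_layers=3, num_heads=4)
block_b = make_encoder(input_dim=10, hidden_dim=100, num_layers=3, num_heads=10)

params = list(block_a.parameters()) + list(block_b.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)  # Adam, learning rate 1e-4

# Batch size 2 with gradient accumulation over 50 steps gives an
# effective batch of 100 before each optimizer update.
ACCUM_STEPS, BATCH_SIZE = 50, 2
```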
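
The Research Type row reports exact-match performance, i.e., the accuracy of the answer produced by the top-1 action sampled by the RL agent on DROP-num. The sketch below shows one simple way such a score over numeric answers could be computed; the normalization (comparing values as floats where possible) is an assumption, not the official DROP evaluation script.

```python
def numeric_exact_match(prediction, gold):
    """Exact match for numeric answers: compare as numbers when possible."""
    try:
        return float(prediction) == float(gold)
    except (TypeError, ValueError):
        return str(prediction).strip() == str(gold).strip()

def top1_accuracy(predictions, golds):
    """Accuracy of the top-1 predicted answer over a list of instances."""
    assert len(predictions) == len(golds)
    correct = sum(numeric_exact_match(p, g) for p, g in zip(predictions, golds))
    return correct / len(golds)

# Toy usage: three DROP-num style predictions against gold numbers.
print(top1_accuracy(["3", "12.0", "7"], ["3", "12", "8"]))  # -> 0.666...
```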