Search and Learn: Improving Semantic Coverage for Data-to-Text Generation

Authors: Shailza Jolly, Zi Xuan Zhang, Andreas Dengel, Lili Mou

AAAI 2022, pp. 10858-10866 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our model achieves high performance on E2E and WikiBio datasets. Especially, we cover 98.35% of input slots on E2E, largely alleviating the low coverage problem.
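The 98.35% figure measures how many input slots are actually realized in the generated text. Below is a minimal sketch of such a coverage check; the function names and exact-substring matching are assumptions, and the paper's matching rules may be more lenient.

```python
from typing import Dict, List

def slot_coverage(slots: Dict[str, str], generated: str) -> float:
    """Fraction of input slot values that appear in the generated text.

    Exact substring matching is a simplification; the paper's matching may differ.
    """
    if not slots:
        return 1.0
    text = generated.lower()
    return sum(v.lower() in text for v in slots.values()) / len(slots)

def corpus_slot_coverage(inputs: List[Dict[str, str]], outputs: List[str]) -> float:
    """Micro-averaged coverage over a whole test set."""
    total = sum(len(s) for s in inputs)
    covered = sum(
        sum(v.lower() in o.lower() for v in s.values())
        for s, o in zip(inputs, outputs)
    )
    return covered / total if total else 1.0
```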
Researcher Affiliation | Collaboration | 1 TU Kaiserslautern, Germany; 2 DFKI GmbH, Germany; 3 Dept. of Computing Science, Alberta Machine Intelligence Institute (Amii), University of Alberta, Canada
Pseudocode | Yes | Algorithm 1: Search and Learn
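For orientation, here is a heavily simplified, hypothetical outline of a search-then-learn loop; it is not a reconstruction of the paper's Algorithm 1, and all function names are placeholders. The idea, following the abstract, is to generate a draft, repair its slot coverage by search, and then fine-tune on the repaired outputs to smooth out search noise.

```python
from typing import Callable, Dict, List, Tuple

Slots = Dict[str, str]

def insert_missing_slots(draft: str, slots: Slots) -> str:
    """Toy search step: append any slot value the draft does not mention yet.

    The paper's search chooses insertion positions with the language model;
    this naive append is only an illustration.
    """
    missing = [v for v in slots.values() if v.lower() not in draft.lower()]
    return draft if not missing else draft + " " + ", ".join(missing) + "."

def search_and_learn(
    generate: Callable[[Slots], str],
    fine_tune: Callable[[List[Tuple[Slots, str]]], None],
    train_slots: List[Slots],
    rounds: int = 1,
) -> None:
    """Generate drafts, repair their slot coverage by search, then fine-tune on them."""
    for _ in range(rounds):
        pseudo_data = [(s, insert_missing_slots(generate(s), s)) for s in train_slots]
        fine_tune(pseudo_data)
```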
Open Source Code | Yes | Our code and output are available at https://github.com/shailzajolly/FSDT
Open Datasets | Yes | In this experiment, we used the E2E dataset (Novikova, Dušek, and Rieser 2017), which is a crowdsourced dataset for data-to-text generation... We further evaluate our approach on the Humans domain of WikiBio data (Lebret, Grangier, and Auli 2016).
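Both corpora are publicly released; one convenient way to obtain them is through the HuggingFace datasets hub. The dataset identifiers and split names below are assumptions about the hub copies, not something the paper specifies, and older script-based dataset loaders may require a different datasets library version.

```python
from datasets import load_dataset

# E2E (restaurant-domain meaning representations -> short descriptions)
e2e = load_dataset("e2e_nlg")        # splits: train / validation / test (names may differ)

# WikiBio (Wikipedia infoboxes -> biography openings); the paper uses the Humans domain
wikibio = load_dataset("wiki_bio")   # splits: train / val / test (names may differ)

print(e2e["train"][0])
print(wikibio["train"][0])
```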
Dataset Splits | Yes | We followed the standard train/val/test split. We validated our approach on 1000 samples and tested it on the standard split.
Hardware Specification | Yes | Inference time (in seconds) and Relative time were obtained by predicting the test set on a single V100 GPU.
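The reported wall-clock protocol (decode the whole test set on one GPU) can be approximated with a loop like the one below; the model choice, batch size of one, and decoding length are assumptions rather than details from the paper.

```python
import time
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

device = "cuda"  # the paper reports timings on a single V100 GPU
tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base").to(device).eval()

def time_test_set(test_inputs, max_length=128):
    """Wall-clock seconds to decode every test input, one example at a time."""
    torch.cuda.synchronize()
    start = time.time()
    with torch.no_grad():
        for text in test_inputs:
            ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
            model.generate(ids, max_length=max_length)
    torch.cuda.synchronize()
    return time.time() - start
```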
Software Dependencies | No | The paper mentions models such as T5 and GPT-2 and an optimizer (AdamW), but does not provide specific version numbers for any programming languages or software libraries (e.g., Python, PyTorch, TensorFlow, or NLP libraries) that would be needed for reproducibility.
Experiment Setup | Yes | We trained the model using the AdamW (Loshchilov and Hutter 2018) optimizer, with an initial learning rate of 3e-4 and a batch size of 64. We used the T5-base model... we use a batch size of 20 during training and accumulate gradients for 3 steps, which results in an actual batch size of 60.
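These hyperparameters correspond to a standard sequence-to-sequence fine-tuning loop. Below is a minimal sketch assuming PyTorch and HuggingFace transformers, using the batch-size-20 / 3-step gradient-accumulation setting quoted above; data loading, learning-rate scheduling, and evaluation are omitted.

```python
from torch.optim import AdamW
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("t5-base").cuda()
tokenizer = T5Tokenizer.from_pretrained("t5-base")
optimizer = AdamW(model.parameters(), lr=3e-4)  # initial learning rate from the paper

accumulation_steps = 3  # batch size 20 x 3 accumulation steps = effective batch of 60

def train_epoch(batches):
    """batches: iterable of (source_texts, target_texts) lists, each of size 20."""
    model.train()
    optimizer.zero_grad()
    for step, (sources, targets) in enumerate(batches, start=1):
        enc = tokenizer(list(sources), padding=True, return_tensors="pt").to("cuda")
        labels = tokenizer(list(targets), padding=True, return_tensors="pt").input_ids.to("cuda")
        labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
        loss = model(**enc, labels=labels).loss / accumulation_steps
        loss.backward()
        if step % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```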