Search and Learn: Improving Semantic Coverage for Data-to-Text Generation

Authors: Shailza Jolly, Zi Xuan Zhang, Andreas Dengel, Lili Mou

AAAI 2022, pp. 10858-10866 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our model achieves high performance on E2E and WikiBio datasets. Especially, we cover 98.35% of input slots on E2E, largely alleviating the low coverage problem.
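The 98.35% figure measures how many input slots are actually realized in the generated text. Below is a minimal sketch of such a coverage check; the function names and exact-substring matching are assumptions, and the paper's matching rules may be more lenient.

```python
from typing import Dict, List

def slot_coverage(slots: Dict[str, str], generated: str) -> float:
    """Fraction of input slot values that appear in the generated text.

    Exact substring matching is a simplification; the paper's matching may differ.
    """
    if not slots:
        return 1.0
    text = generated.lower()
    return sum(v.lower() in text for v in slots.values()) / len(slots)

def corpus_slot_coverage(inputs: List[Dict[str, str]], outputs: List[str]) -> float:
    """Micro-averaged coverage over a whole test set."""
    total = sum(len(s) for s in inputs)
    covered = sum(
        sum(v.lower() in o.lower() for v in s.values())
        for s, o in zip(inputs, outputs)
    )
    return covered / total if total else 1.0
```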
Researcher Affiliation | Collaboration | 1 TU Kaiserslautern, Germany; 2 DFKI GmbH, Germany; 3 Dept. of Computing Science, Alberta Machine Intelligence Institute (Amii), University of Alberta, Canada
Pseudocode | Yes | Algorithm 1: Search and Learn
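For orientation, here is a heavily simplified, hypothetical outline of a search-then-learn loop; it is not a reconstruction of the paper's Algorithm 1, and all function names are placeholders. The idea, following the abstract, is to generate a draft, repair its slot coverage by search, and then fine-tune on the repaired outputs to smooth out search noise.

```python
from typing import Callable, Dict, List, Tuple

Slots = Dict[str, str]

def insert_missing_slots(draft: str, slots: Slots) -> str:
    """Toy search step: append any slot value the draft does not mention yet.

    The paper's search chooses insertion positions with the language model;
    this naive append is only an illustration.
    """
    missing = [v for v in slots.values() if v.lower() not in draft.lower()]
    return draft if not missing else draft + " " + ", ".join(missing) + "."

def search_and_learn(
    generate: Callable[[Slots], str],
    fine_tune: Callable[[List[Tuple[Slots, str]]], None],
    train_slots: List[Slots],
    rounds: int = 1,
) -> None:
    """Generate drafts, repair their slot coverage by search, then fine-tune on them."""
    for _ in range(rounds):
        pseudo_data = [(s, insert_missing_slots(generate(s), s)) for s in train_slots]
        fine_tune(pseudo_data)
```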
Open Source Code | Yes | Our code and output are available at https://github.com/shailzajolly/FSDT
Open Datasets | Yes | In this experiment, we used the E2E dataset (Novikova, Dušek, and Rieser 2017), which is a crowdsourced dataset for data-to-text generation... We further evaluate our approach on the Humans domain of WikiBio data (Lebret, Grangier, and Auli 2016).
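Both corpora are publicly released; one convenient way to obtain them is through the HuggingFace datasets hub. The dataset identifiers and split names below are assumptions about the hub copies, not something the paper specifies, and older script-based dataset loaders may require a different datasets library version.

```python
from datasets import load_dataset

# E2E (restaurant-domain meaning representations -> short descriptions)
e2e = load_dataset("e2e_nlg")        # splits: train / validation / test (names may differ)

# WikiBio (Wikipedia infoboxes -> biography openings); the paper uses the Humans domain
wikibio = load_dataset("wiki_bio")   # splits: train / val / test (names may differ)

print(e2e["train"][0])
print(wikibio["train"][0])
```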
Dataset Splits | Yes | We followed the standard train/val/test split. We validated our approach on 1000 samples and tested it on the standard split.
Hardware Specification | Yes | Inference time (in seconds) and Relative time were obtained by predicting the test set on a single V100 GPU.
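The reported wall-clock protocol (decode the whole test set on one GPU) can be approximated with a loop like the one below; the model choice, batch size of one, and decoding length are assumptions rather than details from the paper.

```python
import time
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

device = "cuda"  # the paper reports timings on a single V100 GPU
tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base").to(device).eval()

def time_test_set(test_inputs, max_length=128):
    """Wall-clock seconds to decode every test input, one example at a time."""
    torch.cuda.synchronize()
    start = time.time()
    with torch.no_grad():
        for text in test_inputs:
            ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
            model.generate(ids, max_length=max_length)
    torch.cuda.synchronize()
    return time.time() - start
```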
Software Dependencies | No | The paper mentions models such as T5 and GPT-2 and an optimizer (AdamW), but does not provide specific version numbers for any programming languages or software libraries (e.g., Python, PyTorch, TensorFlow, or NLP libraries) that would be needed for reproducibility.
Experiment Setup | Yes | We trained the model using the AdamW (Loshchilov and Hutter 2018) optimizer, with an initial learning rate of 3e-4 and a batch size of 64. We used the T5-base model... we use a batch size of 20 during training and accumulate gradients for 3 steps, which results in an actual batch size of 60.
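These hyperparameters correspond to a standard sequence-to-sequence fine-tuning loop. Below is a minimal sketch assuming PyTorch and HuggingFace transformers, using the batch-size-20 / 3-step gradient-accumulation setting quoted above; data loading, learning-rate scheduling, and evaluation are omitted.

```python
from torch.optim import AdamW
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("t5-base").cuda()
tokenizer = T5Tokenizer.from_pretrained("t5-base")
optimizer = AdamW(model.parameters(), lr=3e-4)  # initial learning rate from the paper

accumulation_steps = 3  # batch size 20 x 3 accumulation steps = effective batch of 60

def train_epoch(batches):
    """batches: iterable of (source_texts, target_texts) lists, each of size 20."""
    model.train()
    optimizer.zero_grad()
    for step, (sources, targets) in enumerate(batches, start=1):
        enc = tokenizer(list(sources), padding=True, return_tensors="pt").to("cuda")
        labels = tokenizer(list(targets), padding=True, return_tensors="pt").input_ids.to("cuda")
        labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in the loss
        loss = model(**enc, labels=labels).loss / accumulation_steps
        loss.backward()
        if step % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```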