InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation
Authors: Pierre Jean A. Colombo, Chloé Clavel, Pablo Piantanida | pp. 10554-10562
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using direct assessment, we demonstrate that InfoLM achieves statistically significant improvements and correlation gains of over 10 points in many configurations on both summarization and data2text generation. |
| Researcher Affiliation | Academia | 1. Laboratoire des Signaux et Systèmes (L2S), CentraleSupélec, CNRS, Université Paris-Saclay; 2. Télécom ParisTech, Université Paris-Saclay. pierre.colombo@centralesupelec.fr, chloe.clavel@telecom-paris.fr, pablo.piantanida@centralesupelec.fr |
| Pseudocode | No | The paper describes the method using text and mathematical formulas but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing open-source code for the described methodology or a direct link to a code repository. |
| Open Datasets | Yes | To compare the different metrics, previous work (Bhandari et al. 2020) relies either on the TAC datasets (Dang and Owczarzak 2008; McNamee and Dang 2009) or on new summarization datasets extracted from CNN/Daily Mail (Nallapati et al. 2016). As pointed out by Peyrard (2019) and Bhandari et al. (2020), the TAC datasets are old and contain flaws (e.g., the systems used to generate summaries were of poor quality), so we choose to work with the newly assembled dataset from CNN/Daily News proposed in Bhandari et al. (2020). This dataset gathers 11,490 summaries, and annotations are carried out using the pyramid method (Nenkova and Passonneau 2004). [...] we instead rely on a different dataset coming from the WebNLG 2020 challenge (Gardent et al. 2017). |
| Dataset Splits | No | The paper mentions using the CNN/Daily News dataset and the WebNLG 2020 challenge dataset for evaluation but does not specify exact training, validation, or test splits (e.g., percentages, sample counts, or citations to predefined splits). |
| Hardware Specification | No | The paper states 'This work was also granted access to the HPC resources of IDRIS under the allocation 2021-AP010611665 as well as under the project 2021-101838 made by GENCI.', which refers to general HPC resources but does not provide specific hardware details like GPU/CPU models or memory. |
| Software Dependencies | No | The paper mentions using pre-trained models like BERT but does not provide specific version numbers for software dependencies or libraries used for implementation (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | No | The paper mentions 'temperature scaling' as a calibration technique and that 'parameters are optimized for each criterion', but it does not provide specific hyperparameter values (e.g., learning rate, batch size, epochs, specific optimizer settings) or detailed system-level training configurations. |
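For context, the temperature scaling mentioned in the row above is a standard calibration technique: a model's logits are divided by a scalar temperature T before the softmax, which flattens (T > 1) or sharpens (T < 1) the resulting distribution. A minimal sketch follows; the logit values and temperatures are illustrative, not settings reported in the paper.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Apply a temperature-scaled softmax to a list of logits.

    Dividing logits by T > 1 flattens the output distribution,
    while T < 1 sharpens it. T = 1 recovers the plain softmax.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits (not values from the paper):
sharp = softmax_with_temperature([2.0, 1.0, 0.1], temperature=0.5)
flat = softmax_with_temperature([2.0, 1.0, 0.1], temperature=2.0)
```

In a setup like InfoLM's, such a scaled distribution over the masked language model's vocabulary would then feed into the chosen information measure; the paper states only that the calibration parameters are optimized per criterion, without giving the values.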