Ensemble Distillation for Unsupervised Constituency Parsing

Authors: Behzad Shayegh, Yanshuai Cao, Xiaodan Zhu, Jackie CK Cheung, Lili Mou

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that our method surpasses all previous approaches, consistently demonstrating its effectiveness and robustness across various runs, with different ensemble components, and under domain-shift conditions.
Researcher Affiliation | Collaboration | Behzad Shayegh (1), Yanshuai Cao (2), Xiaodan Zhu (3,4), Jackie C.K. Cheung (5,6), Lili Mou (1,6); affiliations: 1 Dept. Computing Science, Alberta Machine Intelligence Institute (Amii), University of Alberta; 2 Borealis AI; 3 Dept. Electrical and Computer Engineering, Queen's University; 4 Ingenuity Labs Research Institute, Queen's University; 5 Quebec Artificial Intelligence Institute (MILA), McGill University; 6 Canada CIFAR AI Chair.
Pseudocode | Yes | In Appendix A, we summarize our ensemble procedure in pseudocode and provide an illustration. (Algorithm 1: Our CYK Variant; a hedged sketch of a span-voting CYK in this spirit appears after this table.)
Open Source Code | Yes | Code available at https://github.com/MANGA-UOFA/ED4UCP
Open Datasets | Yes | We evaluated our approach on the widely used Penn Treebank (PTB; Marcus et al., 1993) dataset, following most previous work (Shen et al., 2019; Kim et al., 2019a; Cao et al., 2020; Maveli & Cohen, 2022; Li & Lu, 2023). In addition, we used the SUSANNE dataset (Sampson, 2002) to evaluate model performance in a domain-shift setting.
Dataset Splits | Yes | We adopted the standard split: 39,701 samples in Sections 02-21 for training, 1,690 samples in Section 22 for validation, and 2,412 samples in Section 23 for test. (A sketch of materializing this WSJ-section split appears after this table.)
Hardware Specification | Yes | We measured the run time using 28 Intel(R) Core(TM) i9-9940X (@3.30GHz) CPUs with or without GPU (Nvidia RTX Titan).
Software Dependencies | No | The paper states 'For hyperparameters and other setups of previous methods (all teacher and student models), we used default values mentioned in either papers or codebases.' and lists the teacher models with their respective codebases (e.g., https://github.com/harvardnlp/compound-pcfg). While these codebases imply certain software stacks, the paper itself does not explicitly list the software dependencies or version numbers (e.g., Python or PyTorch versions) required to reproduce the experiments for its own method.
Experiment Setup | No | The paper states 'For hyperparameters and other setups of previous methods (all teacher and student models), we used default values mentioned in either papers or codebases. It should be emphasized that our proposed ensemble approach does not have any hyperparameters, thus not requiring any tuning.' The ensemble itself therefore needs no tuning, but the paper defers to external sources for the setup of the teacher and student models used in its experiments, without explicitly detailing those hyperparameter values or system-level training settings.
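
The Pseudocode row above points to Algorithm 1 ("Our CYK Variant") in Appendix A of the paper, which is not reproduced here. As a rough, non-authoritative illustration of how a CYK-style ensemble over constituency parsers can work, the sketch below scores every span by how many teacher trees contain it and then recovers, via dynamic programming and backtracking, the binary tree with the highest total score. The function names and the raw span-vote objective are assumptions made for this sketch; the paper's actual algorithm optimizes an averaged F1 criterion and may differ in detail.

```python
from collections import Counter

def span_counts(teacher_trees):
    """Count how many teacher trees contain each span (i, j).

    Each teacher tree is given as a set of spans, where (i, j) covers
    words i..j-1 (0-indexed, end-exclusive).
    """
    counts = Counter()
    for spans in teacher_trees:
        counts.update(spans)
    return counts

def ensemble_parse(n, teacher_trees):
    """CYK-style dynamic program over span hit counts.

    Returns the spans of a binary tree over n words that maximizes the
    total number of agreements with the teachers. This is a simplified
    stand-in for the paper's Algorithm 1, not the authors' code.
    """
    hits = span_counts(teacher_trees)
    best = {}    # (i, j) -> best score of any binary tree over words i..j-1
    split = {}   # (i, j) -> split point achieving that score
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            if length == 1:
                best[(i, j)] = hits[(i, j)]
                continue
            k_best, s_best = None, float("-inf")
            for k in range(i + 1, j):
                s = best[(i, k)] + best[(k, j)]
                if s > s_best:
                    k_best, s_best = k, s
            best[(i, j)] = s_best + hits[(i, j)]
            split[(i, j)] = k_best

    # Backtrack to collect the spans of the selected binary tree.
    spans, stack = set(), [(0, n)]
    while stack:
        i, j = stack.pop()
        spans.add((i, j))
        if j - i > 1:
            k = split[(i, j)]
            stack.extend([(i, k), (k, j)])
    return spans

# Toy usage: three teachers parsing a 4-word sentence.
teachers = [
    {(0, 4), (0, 2), (2, 4)},
    {(0, 4), (0, 2), (0, 1)},  # non-binary / partial bracketings are fine
    {(0, 4), (1, 4), (2, 4)},
]
print(ensemble_parse(4, teachers))
```

On the toy input, the selected tree keeps exactly the non-trivial spans that at least two of the three teachers agree on, which is the intended behavior of a vote-style ensemble.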
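
For the Dataset Splits row, the quoted numbers correspond to the standard WSJ-section split of the Penn Treebank. As a minimal sketch only, the snippet below shows one way to materialize that split with NLTK's BracketParseCorpusReader, assuming a local LDC Treebank-3 copy laid out under the usual parsed/mrg/wsj/<section>/ directories; the root path is a placeholder, and the raw sentence counts obtained this way may differ slightly from the paper's 39,701 / 1,690 / 2,412 depending on preprocessing.

```python
from nltk.corpus.reader import BracketParseCorpusReader

# Placeholder path: point this at a local LDC Treebank-3 copy.
PTB_ROOT = "/path/to/treebank_3/parsed/mrg/wsj"

# File ids look like "02/wsj_0200.mrg"; the leading directory is the WSJ section.
reader = BracketParseCorpusReader(PTB_ROOT, r".*/wsj_.*\.mrg")

def section_of(fileid):
    return int(fileid.split("/")[0])

train_ids = [f for f in reader.fileids() if 2 <= section_of(f) <= 21]
valid_ids = [f for f in reader.fileids() if section_of(f) == 22]
test_ids  = [f for f in reader.fileids() if section_of(f) == 23]

# Parsed sentences per split (raw counts, before any length or punctuation filtering).
print(len(reader.parsed_sents(train_ids)),
      len(reader.parsed_sents(valid_ids)),
      len(reader.parsed_sents(test_ids)))
```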