Ensemble Distillation for Unsupervised Constituency Parsing
Authors: Behzad Shayegh, Yanshuai Cao, Xiaodan Zhu, Jackie CK Cheung, Lili Mou
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our method surpasses all previous approaches, consistently demonstrating its effectiveness and robustness across various runs, with different ensemble components, and under domain-shift conditions. |
| Researcher Affiliation | Collaboration | Behzad Shayegh (1), Yanshuai Cao (2), Xiaodan Zhu (3,4), Jackie C.K. Cheung (5,6), Lili Mou (1,6). Affiliations: (1) Dept. of Computing Science, Alberta Machine Intelligence Institute (Amii), University of Alberta; (2) Borealis AI; (3) Dept. of Electrical and Computer Engineering, Queen's University; (4) Ingenuity Labs Research Institute, Queen's University; (5) Quebec Artificial Intelligence Institute (MILA), McGill University; (6) Canada CIFAR AI Chair |
| Pseudocode | Yes | In Appendix A, we summarize our ensemble procedure in pseudocode and provide an illustration. Algorithm 1: Our CYK Variant (a hedged CYK-style sketch follows the table) |
| Open Source Code | Yes | Code available at https://github.com/MANGA-UOFA/ED4UCP |
| Open Datasets | Yes | We evaluated our approach on the widely used Penn Treebank (PTB; Marcus et al., 1993) dataset, following most previous work (Shen et al., 2019; Kim et al., 2019a; Cao et al., 2020; Maveli & Cohen, 2022; Li & Lu, 2023). In addition, we used the SUSANNE dataset (Sampson, 2002) to evaluate model performance in a domain-shift setting. |
| Dataset Splits | Yes | We adopted the standard split: 39,701 samples in Sections 02–21 for training, 1,690 samples in Section 22 for validation, and 2,412 samples in Section 23 for test. (A sketch of this split appears after the table.) |
| Hardware Specification | Yes | We measured the run time using 28 Intel(R) Core(TM) i9-9940X (@3.30GHz) CPUs with or without GPU (Nvidia RTX Titan). |
| Software Dependencies | No | The paper states 'For hyperparameters and other setups of previous methods (all teacher and student models), we used default values mentioned in either papers or codebases.' and lists various teacher models with their respective codebases (e.g., 'https://github.com/harvardnlp/compound-pcfg'). While these codebases imply particular software stacks, the paper itself does not explicitly list the software dependencies or version numbers (e.g., Python, PyTorch, or TensorFlow versions) required to reproduce the experiments for its own method. |
| Experiment Setup | No | The paper states 'For hyperparameters and other setups of previous methods (all teacher and student models), we used default values mentioned in either papers or codebases. It should be emphasized that our proposed ensemble approach does not have any hyperparameters, thus not requiring any tuning.' This indicates that the authors' *own* ensemble method requires no tuning, but the paper defers to external sources for the setups of the teacher and student models used in its experiments, without explicitly detailing those hyperparameter values or system-level training settings itself. |
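
The pseudocode row points to the paper's Algorithm 1, a CYK variant for ensembling teacher parses. As a rough illustration only (not the authors' released implementation; the function names and the simplified objective below are our own), the sketch runs a CYK-style dynamic program that selects the binary tree whose spans collect the most "hits" against the teachers' span sets:

```python
from collections import Counter

def tree_average(teacher_span_sets, n):
    """Pick the binary bracketing over n words maximizing total span
    'hits' against the teachers (a simplified sketch of tree averaging)."""
    # Count, for every span, how many teachers predict it.
    hits = Counter()
    for spans in teacher_span_sets:
        for span in spans:
            hits[span] += 1

    best, split = {}, {}
    for length in range(1, n + 1):          # CYK: bottom-up over span lengths
        for i in range(n - length + 1):
            j = i + length
            if length == 1:
                best[i, j] = hits[i, j]     # single-word span
                continue
            # Best split point for span (i, j).
            k = max(range(i + 1, j), key=lambda m: best[i, m] + best[m, j])
            split[i, j] = k
            best[i, j] = hits[i, j] + best[i, k] + best[k, j]

    def collect(i, j):                      # read off the argmax binary tree
        if j - i == 1:
            return {(i, j)}
        k = split[i, j]
        return {(i, j)} | collect(i, k) | collect(k, j)

    return collect(0, n)

if __name__ == "__main__":
    teachers = [
        {(0, 2), (0, 5)},           # teacher A's non-trivial spans
        {(0, 2), (2, 4), (0, 5)},   # teacher B
        {(1, 3), (2, 4), (0, 5)},   # teacher C
    ]
    print(sorted(tree_average(teachers, n=5)))
```

Since a binary tree over a fixed-length sentence has a fixed number of spans, attaching a constant per-teacher weight to each hit would recover an averaged-F1 objective; the unweighted count above is the simplest instance of the same dynamic program.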
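The dataset-splits row follows the standard WSJ section partition of the PTB. As a minimal sketch (assuming the licensed Penn Treebank is installed under NLTK's nltk_data; the sample counts quoted in the row are the paper's, not re-verified here), the split can be reproduced like this:

```python
from nltk.corpus import ptb

def section(fileid):
    # PTB file ids look like "WSJ/02/WSJ_0200.MRG"; the middle part is the section.
    return int(fileid.split("/")[1])

wsj = [f for f in ptb.fileids() if f.startswith("WSJ")]
train = [f for f in wsj if 2 <= section(f) <= 21]   # Sections 02-21
valid = [f for f in wsj if section(f) == 22]        # Section 22
test  = [f for f in wsj if section(f) == 23]        # Section 23

train_trees = [t for f in train for t in ptb.parsed_sents(f)]
```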