Ensemble Distillation for Unsupervised Constituency Parsing
Authors: Behzad Shayegh, Yanshuai Cao, Xiaodan Zhu, Jackie CK Cheung, Lili Mou
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our method surpasses all previous approaches, consistently demonstrating its effectiveness and robustness across various runs, with different ensemble components, and under domain-shift conditions. |
| Researcher Affiliation | Collaboration | Behzad Shayegh (1), Yanshuai Cao (2), Xiaodan Zhu (3,4), Jackie C.K. Cheung (5,6), Lili Mou (1,6). Affiliations: (1) Dept. of Computing Science, Alberta Machine Intelligence Institute (Amii), University of Alberta; (2) Borealis AI; (3) Dept. of Electrical and Computer Engineering, Queen's University; (4) Ingenuity Labs Research Institute, Queen's University; (5) Quebec Artificial Intelligence Institute (MILA), McGill University; (6) Canada CIFAR AI Chair |
| Pseudocode | Yes | In Appendix A, we summarize our ensemble procedure in pseudocode and provide an illustration. Algorithm 1: Our CYK Variant (a hedged CYK-style sketch follows the table) |
| Open Source Code | Yes | Code available at https://github.com/MANGA-UOFA/ED4UCP |
| Open Datasets | Yes | We evaluated our approach on the widely used Penn Treebank (PTB; Marcus et al., 1993) dataset, following most previous work (Shen et al., 2019; Kim et al., 2019a; Cao et al., 2020; Maveli & Cohen, 2022; Li & Lu, 2023). In addition, we used the SUSANNE dataset (Sampson, 2002) to evaluate model performance in a domain-shift setting. |
| Dataset Splits | Yes | We adopted the standard split: 39,701 samples in Sections 02–21 for training, 1,690 samples in Section 22 for validation, and 2,412 samples in Section 23 for test. (A sketch of this split appears after the table.) |
| Hardware Specification | Yes | We measured the run time using 28 Intel(R) Core(TM) i9-9940X (@3.30GHz) CPUs with or without GPU (Nvidia RTX Titan). |
| Software Dependencies | No | The paper states 'For hyperparameters and other setups of previous methods (all teacher and student models), we used default values mentioned in either papers or codebases.' and lists various teacher models with their respective codebases (e.g., 'https://github.com/harvardnlp/compound-pcfg'). While these codebases imply particular software stacks, the paper itself does not explicitly list the software dependencies or version numbers (e.g., Python, PyTorch, or TensorFlow versions) required to reproduce the experiments for its own method. |
| Experiment Setup | No | The paper states 'For hyperparameters and other setups of previous methods (all teacher and student models), we used default values mentioned in either papers or codebases. It should be emphasized that our proposed ensemble approach does not have any hyperparameters, thus not requiring any tuning.' This indicates that the authors' *own* ensemble method requires no tuning, but the paper defers to external sources for the setups of the teacher and student models used in its experiments, without explicitly detailing those hyperparameter values or system-level training settings itself. |
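
The pseudocode row points to the paper's Algorithm 1, a CYK variant for ensembling teacher parses. As a rough illustration only (not the authors' released implementation; the function names and the simplified objective below are our own), the sketch runs a CYK-style dynamic program that selects the binary tree whose spans collect the most "hits" against the teachers' span sets:

```python
from collections import Counter

def tree_average(teacher_span_sets, n):
    """Pick the binary bracketing over n words maximizing total span
    'hits' against the teachers (a simplified sketch of tree averaging)."""
    # Count, for every span, how many teachers predict it.
    hits = Counter()
    for spans in teacher_span_sets:
        for span in spans:
            hits[span] += 1

    best, split = {}, {}
    for length in range(1, n + 1):          # CYK: bottom-up over span lengths
        for i in range(n - length + 1):
            j = i + length
            if length == 1:
                best[i, j] = hits[i, j]     # single-word span
                continue
            # Best split point for span (i, j).
            k = max(range(i + 1, j), key=lambda m: best[i, m] + best[m, j])
            split[i, j] = k
            best[i, j] = hits[i, j] + best[i, k] + best[k, j]

    def collect(i, j):                      # read off the argmax binary tree
        if j - i == 1:
            return {(i, j)}
        k = split[i, j]
        return {(i, j)} | collect(i, k) | collect(k, j)

    return collect(0, n)

if __name__ == "__main__":
    teachers = [
        {(0, 2), (0, 5)},           # teacher A's non-trivial spans
        {(0, 2), (2, 4), (0, 5)},   # teacher B
        {(1, 3), (2, 4), (0, 5)},   # teacher C
    ]
    print(sorted(tree_average(teachers, n=5)))
```

Since a binary tree over a fixed-length sentence has a fixed number of spans, attaching a constant per-teacher weight to each hit would recover an averaged-F1 objective; the unweighted count above is the simplest instance of the same dynamic program.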
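The dataset-splits row follows the standard WSJ section partition of the PTB. As a minimal sketch (assuming the licensed Penn Treebank is installed under NLTK's nltk_data; the sample counts quoted in the row are the paper's, not re-verified here), the split can be reproduced like this:

```python
from nltk.corpus import ptb

def section(fileid):
    # PTB file ids look like "WSJ/02/WSJ_0200.MRG"; the middle part is the section.
    return int(fileid.split("/")[1])

wsj = [f for f in ptb.fileids() if f.startswith("WSJ")]
train = [f for f in wsj if 2 <= section(f) <= 21]   # Sections 02-21
valid = [f for f in wsj if section(f) == 22]        # Section 22
test  = [f for f in wsj if section(f) == 23]        # Section 23

train_trees = [t for f in train for t in ptb.parsed_sents(f)]
```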