Moment Distributionally Robust Tree Structured Prediction

Authors: Yeshu Li, Danyal Saeed, Xinhua Zhang, Brian Ziebart, Kevin Gimpel

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate its empirical effectiveness on dependency parsing benchmarks.
Researcher Affiliation | Academia | Yeshu Li, Danyal Saeed, Xinhua Zhang, Brian D. Ziebart, Department of Computer Science, University of Illinois at Chicago ({yli299, dsaeed3, zhangx, bziebart}@uic.edu); Kevin Gimpel, Toyota Technological Institute at Chicago (kgimpel@ttic.edu)
Pseudocode | No | The paper describes algorithms such as "double oracle" and "ADMM" verbally and references existing algorithms, but it does not provide any explicitly labeled pseudocode blocks or algorithm listings. (A generic double-oracle skeleton is sketched after this table.)
Open Source Code | Yes | Our code is publicly available at https://github.com/DanielLeee/drtreesp.
Open Datasets | Yes | We adopt three public datasets: the English Penn Treebank (PTB v3.0) [Marcus et al., 1993], the Penn Chinese Treebank (CTB v5.1) [Xue et al., 2002], and the Universal Dependencies (UD v2.3) [Nivre et al., 2016].
Dataset Splits | Yes | In each run, we randomly draw m ∈ {10, 50, 100, 1000} samples without replacement from the training set and keep the original validation and test sets. The optimal hyperparameters and parameters are chosen based on the validation set. (This subsampling protocol is illustrated by the sketch after this table.)
Hardware Specification | Yes | All experiments are conducted on a computer with an Intel Core i7 CPU (2.7 GHz) and an NVIDIA Tesla P100 GPU (16 GB).
Software Dependencies | No | The paper states: "We implement our methods in Python and C. We leverage the implementations in SuPar [Zhang et al., 2020] for the baseline." However, it does not provide specific version numbers for Python, the C components, or SuPar, which are required for a reproducible description of software dependencies.
Experiment Setup | Yes | The optimal hyperparameters and parameters are chosen based on the validation set. For fair comparisons, all the models are run with CPU only, with a batch size of 200. All the methods achieve their optimal validation set performance in 150-300 steps. We conduct sensitivity analysis by varying µ and λ on UD Dutch with 100 training samples.
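
The Pseudocode row notes that the double oracle procedure is only described verbally in the paper. For orientation, here is a minimal, generic double-oracle skeleton for a finite zero-sum game. This is a sketch of the general pattern, not the paper's instantiation: `payoff_fn`, `best_response_row`, and `best_response_col` are hypothetical callables the user would supply, and the restricted game is solved with an off-the-shelf LP solver.

```python
import numpy as np
from scipy.optimize import linprog


def solve_matrix_game(payoff):
    """Equilibrium mixed strategy for the row (maximizing) player of a zero-sum game."""
    m, n = payoff.shape
    # Variables x = [p_1..p_m, v]; maximize v s.t. (payoff^T p)_j >= v for all j, sum(p) = 1, p >= 0.
    c = np.concatenate([np.zeros(m), [-1.0]])            # linprog minimizes, so minimize -v
    A_ub = np.hstack([-payoff.T, np.ones((n, 1))])        # v - (payoff^T p)_j <= 0 for each column j
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]                           # (mixed strategy p, game value v)


def double_oracle(row_init, col_init, payoff_fn, best_response_row, best_response_col,
                  max_iters=100):
    """Generic double-oracle loop: grow restricted strategy sets until neither
    player's best-response oracle proposes a new pure strategy."""
    rows, cols = [row_init], [col_init]
    for _ in range(max_iters):
        payoff = np.array([[payoff_fn(r, c) for c in cols] for r in rows])
        p, value = solve_matrix_game(payoff)              # row player's equilibrium strategy
        q, _ = solve_matrix_game(-payoff.T)               # column player's equilibrium strategy
        new_row = best_response_row(cols, q)              # best pure response against q
        new_col = best_response_col(rows, p)              # best pure response against p
        grew = False
        if new_row not in rows:
            rows.append(new_row)
            grew = True
        if new_col not in cols:
            cols.append(new_col)
            grew = True
        if not grew:                                      # restricted equilibrium is a global one
            break
    return rows, cols, p, q, value
```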
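
The Dataset Splits and Experiment Setup rows quote a low-resource protocol: draw m ∈ {10, 50, 100, 1000} training samples without replacement, keep the original validation and test sets, and select hyperparameters on the validation set. Below is a minimal sketch of that subsampling loop, assuming the treebank is already loaded as a list of sentences; `train_fn` and `evaluate_fn` are hypothetical placeholders, not part of the released code.

```python
import random


def subsample_training_set(train_sentences, m, seed):
    """Draw m training sentences without replacement; dev/test sets are left untouched."""
    rng = random.Random(seed)
    return rng.sample(train_sentences, k=m)


def run_low_resource_experiment(train, dev, test, train_fn, evaluate_fn,
                                sizes=(10, 50, 100, 1000), n_runs=5):
    """For each training-set size m, repeat the draw/train/evaluate cycle over several seeds,
    selecting models by validation (dev) performance as described in the table above."""
    results = {}
    for m in sizes:
        scores = []
        for seed in range(n_runs):
            subset = subsample_training_set(train, m, seed)
            model = train_fn(subset, dev)             # hypothetical: trains and selects on dev
            scores.append(evaluate_fn(model, test))   # hypothetical: returns a test metric
        results[m] = scores
    return results
```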