Moment Distributionally Robust Tree Structured Prediction
Authors: Yeshu Li, Danyal Saeed, Xinhua Zhang, Brian Ziebart, Kevin Gimpel
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate its empirical effectiveness on dependency parsing benchmarks. |
| Researcher Affiliation | Academia | Yeshu Li Danyal Saeed Xinhua Zhang Brian D. Ziebart Department of Computer Science University of Illinois at Chicago {yli299, dsaeed3, zhangx, bziebart}@uic.edu Kevin Gimpel Toyota Technological Institute at Chicago kgimpel@ttic.edu |
| Pseudocode | No | The paper describes algorithms such as "double oracle" and "ADMM" verbally and references existing algorithms, but it does not provide any explicitly labeled pseudocode blocks or algorithms. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/DanielLeee/drtreesp. |
| Open Datasets | Yes | We adopt three public datasets, the English Penn Treebank (PTB v3.0) [Marcus et al., 1993], the Penn Chinese Treebank (CTB v5.1) [Xue et al., 2002] and the Universal Dependencies (UD v2.3) [Nivre et al., 2016]. |
| Dataset Splits | Yes | in each run, we randomly draw m ∈ {10, 50, 100, 1000} samples without replacement from the training set and keep the original validation and test sets. The optimal hyperparameters and parameters are chosen based on the validation set. |
| Hardware Specification | Yes | All experiments are conducted on a computer with an Intel Core i7 CPU (2.7 GHz) and an NVIDIA Tesla P100 GPU (16 GB). |
| Software Dependencies | No | The paper states: "We implement our methods in Python and C. We leverage the implementations in SuPar [Zhang et al., 2020] for the baseline." However, it does not provide specific version numbers for Python, C, or SuPar, which are required for a reproducible description of software dependencies. |
| Experiment Setup | Yes | The optimal hyperparameters and parameters are chosen based on the validation set. For fair comparisons, all the models are run with CPU only, with a batch size of 200. All the methods achieve their optimal validation set performance in 150-300 steps. We conduct sensitivity analysis by varying µ and λ on UD Dutch with 100 training samples. |
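The subsampling protocol quoted under "Dataset Splits" (draw m training samples without replacement per run, keep the original validation and test sets) can be sketched as follows. This is a minimal illustration, not the authors' code; the function name, seed, and placeholder corpus are assumptions for the example.

```python
import random

def subsample_training_set(train_set, m, seed):
    # Draw m examples without replacement, as in the paper's protocol.
    # Validation and test sets are left untouched.
    rng = random.Random(seed)
    return rng.sample(train_set, m)

# Placeholder corpus of sentence indices (hypothetical).
train = list(range(2000))
for m in [10, 50, 100, 1000]:
    subset = subsample_training_set(train, m, seed=42)
    # Without replacement: exactly m distinct examples.
    assert len(subset) == m and len(set(subset)) == m
```

Each run would repeat this draw with a fresh seed, then select hyperparameters on the unchanged validation set.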