reproducibilityindex.ai

Moment Distributionally Robust Tree Structured Prediction

Authors: Yeshu Li, Danyal Saeed, Xinhua Zhang, Brian Ziebart, Kevin Gimpel

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate its empirical effectiveness on dependency parsing benchmarks.
Researcher Affiliation	Academia	Yeshu Li, Danyal Saeed, Xinhua Zhang, Brian D. Ziebart: Department of Computer Science, University of Illinois at Chicago, {yli299, dsaeed3, zhangx, bziebart}@uic.edu. Kevin Gimpel: Toyota Technological Institute at Chicago, kgimpel@ttic.edu.
Pseudocode	No	The paper describes algorithms such as "double oracle" and "ADMM" verbally and references existing algorithms, but it does not provide any explicitly labeled pseudocode blocks or algorithms.
Open Source Code	Yes	Our code is publicly available at https://github.com/DanielLeee/drtreesp.
Open Datasets	Yes	We adopt three public datasets, the English Penn Treebank (PTB v3.0) [Marcus et al., 1993], the Penn Chinese Treebank (CTB v5.1) [Xue et al., 2002] and the Universal Dependencies (UD v2.3) [Nivre et al., 2016].
Dataset Splits	Yes	In each run, we randomly draw m ∈ {10, 50, 100, 1000} samples without replacement from the training set and keep the original validation and test sets. The optimal hyperparameters and parameters are chosen based on the validation set.
Hardware Specification	Yes	All experiments are conducted on a computer with an Intel Core i7 CPU (2.7 GHz) and an NVIDIA Tesla P100 GPU (16 GB).
Software Dependencies	No	The paper states: "We implement our methods in Python and C. We leverage the implementations in SuPar [Zhang et al., 2020] for the baseline." However, it does not provide specific version numbers for Python, C, or SuPar, which are required for a reproducible description of software dependencies.
Experiment Setup	Yes	The optimal hyperparameters and parameters are chosen based on the validation set. For fair comparisons, all the models are run with CPU only, with a batch size of 200. All the methods achieve their optimal validation set performance in 150-300 steps. We conduct sensitivity analysis by varying µ and λ on UD Dutch with 100 training samples.
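
The subsampling protocol quoted under Dataset Splits can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `draw_training_subset`, the `seed` argument, and the in-memory list representation of the training set are all assumptions, since the report does not say how seeds or data loading are handled.

```python
import random

# Training-set sizes reported in the Dataset Splits row.
SAMPLE_SIZES = (10, 50, 100, 1000)


def draw_training_subset(train_set, m, seed):
    """Return m distinct training examples, drawn without replacement.

    `train_set` is assumed to be a list of examples; `seed` is a
    hypothetical per-run seed used only to make the draw repeatable.
    """
    if m > len(train_set):
        raise ValueError("m exceeds the training set size")
    rng = random.Random(seed)  # local generator, leaves global state alone
    return rng.sample(train_set, m)  # sample() never repeats an element


if __name__ == "__main__":
    # Stand-in for a real treebank: integer ids instead of sentences.
    toy_train = list(range(5000))
    for m in SAMPLE_SIZES:
        subset = draw_training_subset(toy_train, m, seed=0)
        print(m, len(subset))
```

Only the training set is subsampled here; per the quoted text, the validation and test sets stay at their original size.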