Deep Clustering of Text Representations for Supervision-Free Probing of Syntax
Authors: Vikram Gupta, Haoyue Shi, Kevin Gimpel, Mrinmaya Sachan
AAAI 2022, pages 10720-10728
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report competitive performance of our probe on 45-tag English POSI, state-of-the-art performance on 12-tag POSI across 10 languages, and competitive results on CoLab. We also perform zero-shot syntax induction on resource-impoverished languages and report strong results. |
| Researcher Affiliation | Collaboration | Vikram Gupta¹, Haoyue Shi², Kevin Gimpel², Mrinmaya Sachan³; ¹ShareChat, India; ²Toyota Technological Institute at Chicago; ³Department of Computer Science, ETH Zurich |
| Pseudocode | No | The paper provides detailed descriptions of the proposed method and various algorithms within the text and figures, but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We evaluate our approach for POSI on two datasets: 45-tag Penn Treebank Wall Street Journal (WSJ) dataset (Marcus, Santorini, and Marcinkiewicz 1993) and multilingual 12-tag datasets drawn from the universal dependencies project (Nivre et al. 2016). |
| Dataset Splits | Yes | For POSI, as per the standard practice (Stratos 2019), we use the complete dataset (train + val + test) for training as well as evaluation. However, for CoLab, we use the train set to train our model and the test set for reporting results, following Drozdov et al. (2019a). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper describes the software components used (e.g., BERT, fastText, K-Means) and frameworks (e.g., autoencoders, deep clustering), but it does not specify version numbers for any of these components, which is necessary for reproducibility. |
| Experiment Setup | Yes | When augmenting the mBERT embeddings with morphological features (SyntDEC_Morph)... We concatenate fastText embeddings of the trailing trigram of each word with contextualized representations before passing them as input to SyntDEC. ...where ν is set to 1 in all experiments. ...L_total = L_KL + λ L_rec (Eq. 1) |
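
For readers reconstructing the setup from the quoted fragments: the loss in Eq. 1 follows the standard DEC recipe, with soft cluster assignments from a Student's t kernel (ν = 1), a KL term against a sharpened target distribution, and an autoencoder reconstruction term weighted by λ. The sketch below is a minimal PyTorch rendering under those assumptions; the class name, layer sizes, and the `build_input` helper are illustrative guesses, not the authors' code (none is released).

```python
# Minimal sketch of a DEC-style objective with reconstruction, matching
# L_total = L_KL + lambda * L_rec (Eq. 1). Architecture details are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SyntDECSketch(nn.Module):
    def __init__(self, input_dim, latent_dim, n_clusters, nu=1.0):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 512), nn.ReLU(),
                                     nn.Linear(512, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                     nn.Linear(512, input_dim))
        # Cluster centroids; in DEC these are initialized with K-Means
        # on the pretrained latent space.
        self.centroids = nn.Parameter(torch.randn(n_clusters, latent_dim))
        self.nu = nu  # Student's t degrees of freedom; the paper sets nu = 1

    def soft_assign(self, z):
        # q_ij: Student's t similarity between latent z_i and centroid mu_j.
        dist_sq = torch.cdist(z, self.centroids).pow(2)
        q = (1.0 + dist_sq / self.nu).pow(-(self.nu + 1.0) / 2.0)
        return q / q.sum(dim=1, keepdim=True)


def target_distribution(q):
    # Sharpened target p_ij used in the KL term (standard DEC construction).
    weight = q.pow(2) / q.sum(dim=0)
    return weight / weight.sum(dim=1, keepdim=True)


def build_input(bert_vec, trigram_vec):
    # SyntDEC_Morph input: contextualized (mBERT) embedding concatenated with
    # the fastText embedding of the word's trailing character trigram.
    return torch.cat([bert_vec, trigram_vec], dim=-1)


def total_loss(model, x, lam=1.0):
    z = model.encoder(x)
    q = model.soft_assign(z)
    p = target_distribution(q).detach()
    l_kl = F.kl_div(q.log(), p, reduction="batchmean")  # L_KL
    l_rec = F.mse_loss(model.decoder(z), x)             # L_rec
    return l_kl + lam * l_rec                           # Eq. 1
```

As in DEC, one would pretrain the autoencoder and initialize the centroids with K-Means over the latent vectors before optimizing the joint loss; this is consistent with the paper's listed components (autoencoders, K-Means, deep clustering), though the exact training schedule is not specified.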