Universalizing Weak Supervision

Authors: Changho Shin, Winfred Li, Harit Vishwakarma, Nicholas Carl Roberts, Frederic Sala

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimentally, we validate our framework and show improvements over baselines in diverse settings, including real-world learning-to-rank and regression problems along with learning on hyperbolic manifolds. Experimentally, we demonstrate our approach on five choices of problems never before tackled in WS. Learning rankings: on two real-world ranking tasks, our approach with as few as five sources performs better than supervised learning with a smaller number of true labels; in contrast, an adaptation of the Snorkel (Ratner et al., 2018) framework cannot reach this performance with as many as 18 sources. Regression: on two real-world regression datasets, when using 6 or more labeling functions, the performance of our approach is comparable to fully-supervised models. Learning in hyperbolic spaces: on a geodesic regression task in hyperbolic space, we consistently outperform fully-supervised learning, even when using only 3 labeling functions (LFs). Estimation in generic metric spaces: in a synthetic setting of metric spaces induced by random graphs, we demonstrate that our method handles LF heterogeneity better than the majority-vote baseline. Learning parse trees: in semantic dependency parsing, we outperform strong baseline models.
Researcher Affiliation | Academia | Changho Shin, Winfred Li, Harit Vishwakarma, Nicholas Roberts, Frederic Sala, Department of Computer Sciences, University of Wisconsin-Madison {cshin23, wli525, hvishwakarma, ncroberts2, fsala}@wisc.edu
Pseudocode | Yes | Algorithm 1: Universal Label Model Learning; Algorithm 2: CONTINUOUSTRIPLETS; Algorithm 3: Isotropic Gaussian Label Model Learning; Algorithm 4: QUADRATICTRIPLETS
Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | For our movies dataset, we combined IMDb, TMDb, Rotten Tomatoes, and MovieLens movie review data to obtain features and weak labels. We used real-world datasets compatible with multiple label types, including a movies dataset and the Board Game Geek (BGG) dataset (2017), along with synthetic data. We used datasets on Czech and English taken from the Universal Dependencies (Nivre et al., 2020) repository. MSLR-WEB10K: https://www.microsoft.com/en-us/research/project/mslr/. IMDb movie dataset: https://www.imdb.com/interfaces/. TMDb 5K movie dataset version 2: https://www.kaggle.com/tmdb/tmdb-movie-metadata. Board Game Geek Reviews version 2: https://www.kaggle.com/jvanelteren/boardgamegeek-reviews, 2017.
Dataset Splits | No | The paper specifies a 'training set' and 'test set' split (e.g., '75% for training set, and 25% for the test set' or '5000 sets of movies as the training set, and 1000 sets of movies as the test set'), but does not explicitly mention a separate validation set or its proportion/size.
Hardware Specification | Yes | All experiments were conducted on a machine with an Intel Broadwell 2.7GHz CPU and an NVIDIA GK210 GPU.
Software Dependencies | No | The paper mentions software components like the 'SGD optimizer', 'ListMLE loss', and 'gradient boosting regression implemented in sklearn' but does not provide specific version numbers for these software libraries or frameworks.
Experiment Setup | Yes | In the ranking setup, we used a 4-layer MLP with ReLU activations. Each hidden layer had 30 units, and batch normalization (Ioffe & Szegedy, 2015) was applied to all hidden layers. We used the SGD optimizer with the ListMLE loss (Xia et al., 2008); the learning rate was 0.01. In the regression experiments, we used gradient boosting regression as implemented in sklearn with n_estimators=250. Other than n_estimators, we used the default hyperparameters of sklearn's implementation.
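The reported regression setup (a 75/25 train/test split and sklearn gradient boosting with n_estimators=250, defaults otherwise) can be sketched as below. This is a reconstruction from the quoted details, not the authors' code; the synthetic features and targets are placeholders standing in for the paper's datasets.

```python
# Sketch of the regression experiment setup described above (assumed
# reconstruction, not the authors' released code).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))            # placeholder features
y = X[:, 0] + 0.1 * rng.normal(size=400)  # placeholder targets

# 75% training / 25% test, matching the split quoted under Dataset Splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, random_state=0
)

# n_estimators=250 as reported; all other hyperparameters left at defaults
model = GradientBoostingRegressor(n_estimators=250)
model.fit(X_train, y_train)
preds = model.predict(X_test)  # one prediction per test example
```

Note that the paper gives no random seed or sklearn version, so exact numbers will not be reproducible from this sketch alone.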