Learning Hyper Label Model for Programmatic Weak Supervision
Authors: Renzhi Wu, Shen-En Chen, Jieyu Zhang, Xu Chu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On 14 real-world datasets, our hyper label model outperforms the best existing methods in both accuracy (by 1.4 points on average) and efficiency (by six times on average). |
| Researcher Affiliation | Academia | Renzhi Wu1, Shen-En Chen1, Jieyu Zhang2, Xu Chu1 1Georgia Tech 2University of Washington |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. Figure 1 is a diagram illustrating the network architecture. |
| Open Source Code | Yes | Our code is available at https://github.com/wurenzhi/hyper_label_model |
| Open Datasets | Yes | Datasets. We use all 14 classification datasets in a recent weak supervision benchmark (Zhang et al., 2021) that are from diverse domains (e.g. income/sentiment/spam/relation/question/topic classification tasks). We highlight these datasets are only used for evaluation after our model is trained on synthetically generated data, and we never used these datasets during training. Table 1 shows the statistics of all datasets. We also use the metrics provided by the benchmark (Zhang et al., 2021) for each dataset (as different datasets need different metrics depending on their application background). All LFs are from the original authors of each dataset and are hosted in the benchmark project (Zhang, 2022a). |
| Dataset Splits | Yes | We use a synthetic validation set D to select the best run out of ten runs. The validation set is generated with a different method from a prior work (Zhang et al., 2021)... We synthetically generate the validation set D with size \|D\| = 100 according to the generation method proposed in (Zhang et al., 2021); |
| Hardware Specification | Yes | Hardware. All of our experiments were performed on a machine with a 2.20GHz Intel Xeon(R) Gold 5120 CPU, a K80 GPU and with 96GB 2666MHz RAM. |
| Software Dependencies | No | The paper mentions software like Pytorch (Paszke et al., 2019) and scikit-learn (rfs, 2022), but it does not provide specific version numbers for these or other key software components, which is required for reproducibility. |
| Experiment Setup | Yes | We use the Adam optimizer (Kingma & Ba, 2014). We set amsgrad to be true for better convergence (Reddi et al., 2019) and keep all other parameters as the default values provided by Pytorch (e.g. learning rate lr = 0.001). We use a batch size of 50... Table 7: Hyper-parameters and search space for the end models. (Provides batch size, lr, weight decay, ffn num layer, ffn hidden size for MLP and BERT) |
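The optimizer configuration quoted above (Adam with `amsgrad=True` and PyTorch defaults otherwise) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the model here is a placeholder linear layer, while the paper trains its hyper label model architecture.

```python
# Sketch of the reported optimizer setup: Adam with amsgrad=True for better
# convergence (Reddi et al., 2019), all other hyper-parameters left at
# PyTorch defaults (e.g. lr = 0.001), batch size 50.
import torch

model = torch.nn.Linear(10, 2)  # placeholder network, not the paper's architecture
optimizer = torch.optim.Adam(model.parameters(), amsgrad=True)  # lr defaults to 1e-3

batch_size = 50  # as stated in the experiment setup
```

Because only `amsgrad` is overridden, the remaining defaults (`betas=(0.9, 0.999)`, `eps=1e-8`, `weight_decay=0`) apply as documented by PyTorch.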