Neighborhood-Regularized Self-Training for Learning with Few Labels
Authors: Ran Xu, Yue Yu, Hejie Cui, Xuan Kan, Yanqiao Zhu, Joyce Ho, Chao Zhang, Carl Yang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on eight tasks show that our proposed method outperforms the strongest self-training baseline with 1.83% and 2.51% performance gain for text and graph datasets on average. Our further analysis demonstrates that our proposed data selection strategy reduces the noise of pseudo labels by 36.8% and saves 57.3% of the time when compared with the best baseline. |
| Researcher Affiliation | Academia | Emory University; Georgia Institute of Technology; University of California, Los Angeles |
| Pseudocode | Yes | Algorithm 1: Procedures of Self-training. (A generic self-training sketch is given after the table.) |
| Open Source Code | Yes | Our code and appendices will be uploaded to https://github.com/ritaranx/NeST. |
| Open Datasets | Yes | We conduct experiments for semi-supervised learning on eight datasets to demonstrate the efficacy of NeST. Four of them are text-related tasks... The other four are graph-based tasks, where we choose molecular property prediction as the main task and use pre-trained Grover-base (Rong et al. 2020) as the backbone. ...We conduct experiments on four widely used datasets from MoleculeNet (Wu et al. 2018), including BBBP (Martins et al. 2012), BACE (Subramanian et al. 2016), Esol (Delaney 2004) and Lipophilicity (Gaulton et al. 2012). |
| Dataset Splits | Yes | For each dataset, we train our method and baselines with different numbers of labeled data from {30, 50, 100} per class. The remaining data in the training set is treated as unlabeled. As suggested by Bragg et al. (2021), we keep the size of the validation set the same as the number of labeled data to simulate the realistic setting. (A split sketch is given after the table.) |
| Hardware Specification | No | This research was partially supported by the internal funds and GPU servers provided by the Computer Science Department of Emory University. No specific GPU model names, CPU models, or other detailed hardware specifications are provided beyond 'GPU servers'. |
| Software Dependencies | No | We employ the pre-trained BERT from the Hugging Face (Wolf et al. 2019) codebase for the implementation. The other four are graph-based tasks, where we choose molecular property prediction as the main task and use pre-trained Grover-base (Rong et al. 2020) as the backbone. We use BioBERT (Lee et al. 2020) as the backbone for Chemprot... and use RoBERTa-base for other datasets. We use Adam (Kingma and Ba 2014) as the optimizer... efficiently supported via FAISS (Johnson, Douze, and Jégou 2021). Specific version numbers for these software components (e.g., PyTorch, TensorFlow, Hugging Face Transformers library, CUDA) are not provided. (A FAISS neighbor-lookup sketch is given after the table.) |
| Experiment Setup | Yes | Parameter Settings. We use Adam (Kingma and Ba 2014) as the optimizer and tune the learning rate in {1e-5, 2e-5, 5e-5}. The batch size is selected from {8, 16, 32}. Other hyperparameters in NeST include T, T1, γ for self-training, β, b, k for sample selection in Eq. 7, and λ in Eq. 5. We set β = 0.1, γ = 0.9, λ = 0.5, m = 0.6, T = 5, T1 = 1000 for all datasets, and tune b = c|Xl| with c ∈ {3, 5, 10, 20} for text datasets and c ∈ {1, 3, 5} for graph datasets. (The reported values are collected into a single configuration sketch after the table.) |
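The paper's Algorithm 1 describes a self-training procedure. As a point of reference only, below is a minimal, generic self-training loop; the scikit-learn classifier, the confidence threshold, and the selection rule are stand-ins, not NeST's neighborhood-regularized selection.

```python
# Minimal generic self-training sketch (NOT the paper's exact Algorithm 1):
# train on labeled data, pseudo-label the unlabeled pool, keep confident
# predictions, and retrain. NeST additionally regularizes the selection
# step with neighborhood information, which is omitted here.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, rounds=5, gamma=0.9):
    model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
    for _ in range(rounds):
        probs = model.predict_proba(X_unlab)           # pseudo-label scores
        conf, pseudo = probs.max(axis=1), probs.argmax(axis=1)
        keep = conf >= gamma                           # confidence threshold
        if not keep.any():
            break
        X_aug = np.vstack([X_lab, X_unlab[keep]])
        y_aug = np.concatenate([y_lab, pseudo[keep]])
        model = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
    return model
```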
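The split protocol quoted above (30/50/100 labels per class, a validation set of the same total size, and the remaining training data treated as unlabeled) can be reproduced along the following lines; the function and variable names are illustrative and not taken from the released code.

```python
# Illustrative few-label split: k labeled examples per class, a validation
# set of the same total size, and the remaining training data as unlabeled.
import numpy as np

def few_label_split(y_train, k=30, seed=0):
    rng = np.random.default_rng(seed)
    labeled = []
    for c in np.unique(y_train):
        idx = np.flatnonzero(y_train == c)
        labeled.extend(rng.choice(idx, size=k, replace=False))
    labeled = np.array(labeled)
    rest = np.setdiff1d(np.arange(len(y_train)), labeled)
    rng.shuffle(rest)
    valid = rest[: len(labeled)]        # validation set matches labeled size
    unlabeled = rest[len(labeled):]     # remainder is treated as unlabeled
    return labeled, valid, unlabeled
```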
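Since the paper notes that neighbor retrieval is "efficiently supported via FAISS", here is a minimal example of the kind of k-nearest-neighbor lookup involved. The embedding dimension, k, and the random embeddings are placeholders, not the paper's settings.

```python
# Minimal FAISS k-nearest-neighbor lookup over sample embeddings.
# Dimension, k, and the random data are placeholders.
import numpy as np
import faiss

d, k = 768, 10                                     # e.g. BERT-sized embeddings
emb = np.random.rand(10000, d).astype("float32")   # stand-in for real embeddings
index = faiss.IndexFlatL2(d)                       # exact L2 index
index.add(emb)
distances, neighbors = index.search(emb[:32], k)   # k neighbors for 32 queries
```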
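For convenience, the hyperparameters reported in the Experiment Setup row can be transcribed into one configuration. The key names below are my own; the learning rate, batch size, and c values are the reported search spaces, tuned per dataset.

```python
# Hyperparameters as reported in the paper, transcribed for reference.
# Key names are illustrative; list-valued entries are tuned search spaces.
config = {
    "optimizer": "Adam",
    "learning_rate": [1e-5, 2e-5, 5e-5],   # tuned
    "batch_size": [8, 16, 32],             # tuned
    "beta": 0.1,
    "gamma": 0.9,        # self-training hyperparameter (as reported)
    "lambda": 0.5,
    "m": 0.6,
    "T": 5,              # self-training hyperparameter (as reported)
    "T1": 1000,          # self-training hyperparameter (as reported)
    "c_text": [3, 5, 10, 20],   # b = c * |Xl| for text datasets
    "c_graph": [1, 3, 5],       # b = c * |Xl| for graph datasets
}
```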