Graph-Based Semi-Supervised Learning with Non-ignorable Non-response
Authors: Fan Zhou, Tengfei Li, Haibo Zhou, Hongtu Zhu, Jieping Ye
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, simulations and one real data analysis are conducted to evaluate the empirical performance of our proposed methods and a baseline model, which ignores the non-response (SM). Our GNM model reduces to SM when it only contains the outcome model $Y \mid x$ given in (3) (or (5)) with the weights in loss (10) set to 1 for all samples. In the real data part, GNM is also compared with a model with a misspecified ignorable missing mechanism and with some other state-of-the-art de-biasing methods. In the simulation part, we simulate the node response y based on (3) and generate the labelled set by the exponential tilting model (1). For the real data analysis, we evaluate all the compared models on semi-supervised document classification with the citation network Cora under non-ignorable non-response. (A hedged simulation sketch of such an exponential-tilting labelling mechanism appears below the table.) |
| Researcher Affiliation | Collaboration | Fan Zhou (1), Tengfei Li (2), Haibo Zhou (2), Jieping Ye (3), Hongtu Zhu (3,2). (1) Shanghai University of Finance and Economics, zhoufan@mail.shufe.edu.cn; (2) University of North Carolina at Chapel Hill, tengfei_li@med.unc.edu, zhou@bios.unc.edu; (3) AI Labs, Didi Chuxing, {yejieping,zhuhongtu}@didiglobal.com |
| Pseudocode | Yes | The details of the algorithm are described in five steps as follows: 1. Determine the initial value of the response probability $\pi_i^{(0)}$ (or $\theta_r^{(0)}$). For example, we can let $\pi_i^{(0)} = 1$ for all the labelled vertexes ($r_i = 1$). 2. Let $e = 1$, where $e$ is the epoch index. We update $\theta_y$ based on $\pi_i^{(e-1)}$ obtained from the previous epoch by minimizing the loss function in (10) using GD. At the $i$-th iteration within the $e$-th epoch, we update $\theta_y$ as follows: $\theta_y^{(e,i+1)} \leftarrow \theta_y^{(e,i)} - \gamma_0 \nabla_{\theta_y} L_1(\theta_y \mid \theta_r^{(e-1)})$ (15), where $\gamma_0$ is the learning rate and $L_1(\theta_y \mid \theta_r^{(e-1)})$ represents the loss function based on $\pi_i^{(e-1)} = \pi_i(y_i, h(x_i); \theta_r^{(e-1)})$. We denote the updated $\theta_y$ as $\theta_y^{(e)}$ after $M^{(e)}$ iterations. 3. Impute $y_i$ for all the unlabelled nodes ($r_i = 0$) using $y_i^{(e)} = \beta_0^{(e)} + G_A(x; \theta_g^{(e)})_i^{T} \beta_1^{(e)}$ in the continuous case, and by sampling $y_i^{(e)}$ from the distribution $P(y_i \mid G_A(x)_i; \theta_y^{(e)})$ otherwise. 4. We use GD to update $\theta_r$. Specifically, at the $j$-th iteration, we have $\theta_r^{(e,j+1)} \leftarrow \theta_r^{(e,j)} - \gamma_1 \nabla_{\theta_r} \tilde{L}_2(\theta_r \mid \theta_y^{(e)})$ (16), with the initial value $\theta_r^{(e,0)}$ equal to $\theta_r^{(e-1)}$, where $\gamma_1$ is the learning rate. After convergence, we obtain the estimate of $\theta_r$, denoted $\theta_r^{(e)}$, at the end of this training epoch. We then update the sampling weights $\pi_i^{(e)}$ based on $P(r_i = 1 \mid y_i, h(x_i); \theta_r^{(e)})$ for all labelled vertexes. 5. Stop once convergence has been achieved; otherwise let $e = e + 1$ and return to step 2. The convergence criterion is whether the imputed unlabelled vertexes at epoch $e$ differ only slightly from those at epoch $(e-1)$; in other words, the iteration is stopped if $\sum_{r_i = 0} \lvert y_i^{(e)} - y_i^{(e-1)} \rvert / \sum_i \mathbb{1}(r_i = 0) \le \varepsilon$. We let $M_0$ and $M_1$ be the maximal numbers of allowed internal iterations at each epoch for updating $\theta_y$ and $\theta_r$, respectively. For more details, refer to Algorithm 1 in the supplements. (A hedged Python sketch of this alternating-update loop appears below the table.) |
| Open Source Code | Yes | Our implementation of GNM can be found at: https://github.com/BIG-S2/keras-gnm |
| Open Datasets | Yes | Both A and x are obtained from the Cora dataset. We modify Cora into a binary-class dataset by merging the six non-Neural-Network classes together. (A hedged relabelling sketch appears below the table.) |
| Dataset Splits | No | The paper states that "The labelled subset are randomly split into training and validation sets, while the remaining non-response nodes build the testing set." However, it does not provide specific percentages or absolute counts for these splits, nor does it refer to a standard split with a citation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or specific machine configurations. |
| Software Dependencies | No | The paper mentions methods and optimizers like "Adam [9]", "GCN [10]", and "GAT [22]" and refers to "keras-gnm" in the GitHub link, but it does not specify version numbers for any software dependencies, such as programming languages or libraries. |
| Experiment Setup | Yes | We train all the compared models for a maximum of 200 epochs (E = 200) using Adam [9] with a learning rate of 0.05 and make predictions $\hat{y}_i$ for each testing vertex. Training is stopped when the validation loss does not decrease in 15 consecutive iterations. We keep all other model settings used by [10] and fix the unit size of the first hidden layer to 16. (A hedged Keras sketch of this training configuration appears below the table.) |
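
As a companion to the Research Type row, the following is a minimal sketch of how a non-ignorable labelling mechanism of the exponential-tilting kind could be simulated. The logistic form, the coefficients `alpha`, `gamma`, and `phi`, and the linear summary `h_x` are illustrative assumptions, not the exact specification of the paper's equation (1).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical node features and continuous responses (stand-ins for x and y in the paper).
n, p = 1000, 5
x = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = x @ beta + rng.normal(scale=0.5, size=n)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed exponential-tilting style response model: the probability of a node being
# labelled depends on a feature summary h(x) AND on the outcome y itself, which is
# what makes the non-response non-ignorable.
alpha, phi = -0.5, 1.0            # hypothetical intercept and tilting coefficient
gamma = rng.normal(size=p) * 0.3  # hypothetical feature coefficients
h_x = x @ gamma                   # h(x): a simple linear summary, an assumption here
pi = sigmoid(alpha + h_x + phi * y)

# r_i = 1 means node i is labelled (y_i observed); r_i = 0 means non-response.
r = rng.binomial(1, pi)
print(f"labelled fraction: {r.mean():.3f}")
```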
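The pseudocode row alternates between updating the outcome parameters $\theta_y$ under response-probability weights and updating the response parameters $\theta_r$ on imputed labels. Below is a minimal NumPy sketch of that alternating loop for a continuous outcome; the linear outcome model, the logistic response model, the inverse-probability-style weights, and all step sizes are simplified placeholders, not the paper's GNM architecture or its exact losses (10) and (16).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 4
x = rng.normal(size=(n, p))
y_true = x @ rng.normal(size=p) + rng.normal(scale=0.3, size=n)
r = rng.binomial(1, 0.5, size=n)          # 1 = labelled, 0 = unlabelled
y_obs = np.where(r == 1, y_true, 0.0)     # unlabelled outcomes are never used directly

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

theta_y = np.zeros(p)          # stand-in for the outcome-model parameters
theta_r = np.zeros(p + 1)      # stand-in for the response-model parameters (y term last)
pi = np.ones(n)                # step 1: initial response probabilities pi_i^(0) = 1
gamma0, gamma1, eps = 0.05, 0.05, 1e-4
y_imp_prev = np.zeros(n)

for epoch in range(50):
    # Step 2: weighted GD on the outcome loss, weighting labelled nodes by 1/pi.
    w = r / np.clip(pi, 1e-3, None)
    for _ in range(100):                                  # M0 internal iterations
        resid = x @ theta_y - y_obs
        theta_y -= gamma0 * (x.T @ (w * resid)) / max(w.sum(), 1.0)

    # Step 3: impute y for the unlabelled nodes from the current outcome model.
    y_imp = np.where(r == 1, y_obs, x @ theta_y)

    # Step 4: GD on the response loss, treating imputed y as an extra covariate.
    z = np.column_stack([x, y_imp])
    for _ in range(100):                                  # M1 internal iterations
        p_hat = sigmoid(z @ theta_r)
        theta_r -= gamma1 * (z.T @ (p_hat - r)) / n
    pi = sigmoid(z @ theta_r)                             # updated sampling weights

    # Step 5: stop when imputed values change little between consecutive epochs.
    denom = max((r == 0).sum(), 1)
    if epoch > 0 and np.abs(y_imp - y_imp_prev)[r == 0].sum() / denom <= eps:
        break
    y_imp_prev = y_imp
```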
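For the Open Datasets row, the binarisation of Cora described there amounts to a one-line relabelling. The array `labels` and the class name "Neural_Networks" follow the standard Cora class naming; loading the raw files is omitted since it depends on the distribution used.

```python
import numpy as np

# `labels` is assumed to hold one Cora class name per node,
# e.g. parsed from the cora.content file of the standard distribution.
labels = np.array(["Neural_Networks", "Rule_Learning", "Theory", "Neural_Networks"])

# Merge the six non-Neural-Network classes into a single negative class,
# giving a binary outcome (1 = Neural Networks, 0 = any other class).
y_binary = (labels == "Neural_Networks").astype(int)
print(y_binary)  # [1 0 0 1]
```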
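Finally, for the Experiment Setup row, the stated optimisation choices (Adam, learning rate 0.05, at most 200 epochs, early stopping after 15 epochs without validation-loss improvement, a 16-unit first hidden layer) map onto a standard Keras configuration. The sketch below shows only those settings on a placeholder dense model; the loss choice and `restore_best_weights` flag are assumptions, and the actual graph layers from the keras-gnm repository are not reproduced here.

```python
from tensorflow import keras

def build_placeholder_model(n_features: int) -> keras.Model:
    # Placeholder architecture: only the 16-unit first hidden layer comes from the
    # paper's setup; a real run would use the graph convolutional layers of GNM.
    return keras.Sequential([
        keras.layers.Input(shape=(n_features,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])

model = build_placeholder_model(n_features=1433)  # 1433 = Cora feature dimension
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.05),  # Adam with lr 0.05 as stated
    loss="binary_crossentropy",                           # assumed loss for the binary task
    metrics=["accuracy"],
)

# Early stopping after 15 epochs without validation-loss improvement, as in the row above.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=15, restore_best_weights=True
)

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=200, callbacks=[early_stop])
```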