Graph-Based Semi-Supervised Learning with Non-ignorable Non-response
Authors: Fan Zhou, Tengfei Li, Haibo Zhou, Hongtu Zhu, Jieping Ye
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, simulations and one real data analysis are conducted to evaluate the empirical performance of our proposed methods and a baseline model, which ignores the non-response (SM). Our GNM model reduces to SM when it only contains the outcome model $Y \mid x$ given in (3) (or (5)) with the weights in loss (10) set to 1 for all samples. In the real data part, GNM is also compared with a model with a misspecified ignorable missing mechanism and with some other state-of-the-art de-biasing methods. In the simulation part, we simulate the node response y based on (3) and generate the labelled set by the exponential tilting model (1). For the real data analysis, we evaluate all the compared models on semi-supervised document classification with the citation network Cora under non-ignorable non-response. (A hedged simulation sketch of such an exponential-tilting labelling mechanism appears below the table.) |
| Researcher Affiliation | Collaboration | Fan Zhou (1), Tengfei Li (2), Haibo Zhou (2), Jieping Ye (3), Hongtu Zhu (3,2). (1) Shanghai University of Finance and Economics, zhoufan@mail.shufe.edu.cn; (2) University of North Carolina at Chapel Hill, tengfei_li@med.unc.edu, zhou@bios.unc.edu; (3) AI Labs, Didi Chuxing, {yejieping,zhuhongtu}@didiglobal.com |
| Pseudocode | Yes | The details of the algorithm are described in five steps as follows: 1. Determine the initial value of the response probability $\pi_i^{(0)}$ (or $\theta_r^{(0)}$). For example, we can let $\pi_i^{(0)} = 1$ for all the labelled vertexes ($r_i = 1$). 2. Let $e = 1$, where $e$ is the epoch index. We update $\theta_y$ based on $\pi_i^{(e-1)}$ obtained from the previous epoch by minimizing the loss function in (10) using GD. At the $i$-th iteration within the $e$-th epoch, we update $\theta_y$ as follows: $\theta_y^{(e,i+1)} \leftarrow \theta_y^{(e,i)} - \gamma_0 \nabla_{\theta_y} L_1(\theta_y \mid \theta_r^{(e-1)})$ (15), where $\gamma_0$ is the learning rate and $L_1(\theta_y \mid \theta_r^{(e-1)})$ represents the loss function based on $\pi_i^{(e-1)} = \pi_i(y_i, h(x_i); \theta_r^{(e-1)})$. We denote the updated $\theta_y$ as $\theta_y^{(e)}$ after $M^{(e)}$ iterations. 3. Impute $y_i$ for all the unlabelled nodes ($r_i = 0$) using $y_i^{(e)} = \beta_0^{(e)} + G_A(x; \theta_g^{(e)})_i^{T} \beta_1^{(e)}$ in the continuous case, and by sampling $y_i^{(e)}$ from the distribution $P(y_i \mid G_A(x)_i; \theta_y^{(e)})$ otherwise. 4. We use GD to update $\theta_r$. Specifically, at the $j$-th iteration, we have $\theta_r^{(e,j+1)} \leftarrow \theta_r^{(e,j)} - \gamma_1 \nabla_{\theta_r} \tilde{L}_2(\theta_r \mid \theta_y^{(e)})$ (16), with the initial value $\theta_r^{(e,0)}$ equal to $\theta_r^{(e-1)}$, where $\gamma_1$ is the learning rate. After convergence, we obtain the estimate of $\theta_r$, denoted $\theta_r^{(e)}$, at the end of this training epoch. We then update the sampling weights $\pi_i^{(e)}$ based on $P(r_i = 1 \mid y_i, h(x_i); \theta_r^{(e)})$ for all labelled vertexes. 5. Stop once convergence has been achieved; otherwise let $e = e + 1$ and return to step 2. The convergence criterion is whether the imputed unlabelled vertexes at epoch $e$ differ only slightly from those at epoch $(e-1)$; in other words, the iteration is stopped if $\sum_{r_i = 0} \lvert y_i^{(e)} - y_i^{(e-1)} \rvert / \sum_i \mathbb{1}(r_i = 0) \le \varepsilon$. We let $M_0$ and $M_1$ be the maximal numbers of allowed internal iterations at each epoch for updating $\theta_y$ and $\theta_r$, respectively. For more details, refer to Algorithm 1 in the supplements. (A hedged Python sketch of this alternating-update loop appears below the table.) |
| Open Source Code | Yes | Our implementation of GNM can be found at: https://github.com/BIG-S2/keras-gnm |
| Open Datasets | Yes | Both A and x are obtained from the Cora dataset. We modify Cora into a binary-class dataset by merging the six non-Neural-Network classes together. (A hedged relabelling sketch appears below the table.) |
| Dataset Splits | No | The paper states that "The labelled subset are randomly split into training and validation sets, while the remaining non-response nodes build the testing set." However, it does not provide specific percentages or absolute counts for these splits, nor does it refer to a standard split with a citation. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU/CPU models or specific machine configurations. |
| Software Dependencies | No | The paper mentions methods and optimizers like "Adam [9]", "GCN [10]", and "GAT [22]" and refers to "keras-gnm" in the GitHub link, but it does not specify version numbers for any software dependencies, such as programming languages or libraries. |
| Experiment Setup | Yes | We train all the compared models for a maximum of 200 epochs (E = 200) using Adam [9] with a learning rate of 0.05 and make predictions $\hat{y}_i$ for each testing vertex. Training is stopped when the validation loss does not decrease in 15 consecutive iterations. We keep all other model settings used by [10] and fix the unit size of the first hidden layer to 16. (A hedged Keras sketch of this training configuration appears below the table.) |
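
As a companion to the Research Type row, the following is a minimal sketch of how a non-ignorable labelling mechanism of the exponential-tilting kind could be simulated. The logistic form, the coefficients `alpha`, `gamma`, and `phi`, and the linear summary `h_x` are illustrative assumptions, not the exact specification of the paper's equation (1).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical node features and continuous responses (stand-ins for x and y in the paper).
n, p = 1000, 5
x = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = x @ beta + rng.normal(scale=0.5, size=n)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed exponential-tilting style response model: the probability of a node being
# labelled depends on a feature summary h(x) AND on the outcome y itself, which is
# what makes the non-response non-ignorable.
alpha, phi = -0.5, 1.0            # hypothetical intercept and tilting coefficient
gamma = rng.normal(size=p) * 0.3  # hypothetical feature coefficients
h_x = x @ gamma                   # h(x): a simple linear summary, an assumption here
pi = sigmoid(alpha + h_x + phi * y)

# r_i = 1 means node i is labelled (y_i observed); r_i = 0 means non-response.
r = rng.binomial(1, pi)
print(f"labelled fraction: {r.mean():.3f}")
```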
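The pseudocode row alternates between updating the outcome parameters $\theta_y$ under response-probability weights and updating the response parameters $\theta_r$ on imputed labels. Below is a minimal NumPy sketch of that alternating loop for a continuous outcome; the linear outcome model, the logistic response model, the inverse-probability-style weights, and all step sizes are simplified placeholders, not the paper's GNM architecture or its exact losses (10) and (16).

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 4
x = rng.normal(size=(n, p))
y_true = x @ rng.normal(size=p) + rng.normal(scale=0.3, size=n)
r = rng.binomial(1, 0.5, size=n)          # 1 = labelled, 0 = unlabelled
y_obs = np.where(r == 1, y_true, 0.0)     # unlabelled outcomes are never used directly

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

theta_y = np.zeros(p)          # stand-in for the outcome-model parameters
theta_r = np.zeros(p + 1)      # stand-in for the response-model parameters (y term last)
pi = np.ones(n)                # step 1: initial response probabilities pi_i^(0) = 1
gamma0, gamma1, eps = 0.05, 0.05, 1e-4
y_imp_prev = np.zeros(n)

for epoch in range(50):
    # Step 2: weighted GD on the outcome loss, weighting labelled nodes by 1/pi.
    w = r / np.clip(pi, 1e-3, None)
    for _ in range(100):                                  # M0 internal iterations
        resid = x @ theta_y - y_obs
        theta_y -= gamma0 * (x.T @ (w * resid)) / max(w.sum(), 1.0)

    # Step 3: impute y for the unlabelled nodes from the current outcome model.
    y_imp = np.where(r == 1, y_obs, x @ theta_y)

    # Step 4: GD on the response loss, treating imputed y as an extra covariate.
    z = np.column_stack([x, y_imp])
    for _ in range(100):                                  # M1 internal iterations
        p_hat = sigmoid(z @ theta_r)
        theta_r -= gamma1 * (z.T @ (p_hat - r)) / n
    pi = sigmoid(z @ theta_r)                             # updated sampling weights

    # Step 5: stop when imputed values change little between consecutive epochs.
    denom = max((r == 0).sum(), 1)
    if epoch > 0 and np.abs(y_imp - y_imp_prev)[r == 0].sum() / denom <= eps:
        break
    y_imp_prev = y_imp
```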
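For the Open Datasets row, the binarisation of Cora described there amounts to a one-line relabelling. The array `labels` and the class name "Neural_Networks" follow the standard Cora class naming; loading the raw files is omitted since it depends on the distribution used.

```python
import numpy as np

# `labels` is assumed to hold one Cora class name per node,
# e.g. parsed from the cora.content file of the standard distribution.
labels = np.array(["Neural_Networks", "Rule_Learning", "Theory", "Neural_Networks"])

# Merge the six non-Neural-Network classes into a single negative class,
# giving a binary outcome (1 = Neural Networks, 0 = any other class).
y_binary = (labels == "Neural_Networks").astype(int)
print(y_binary)  # [1 0 0 1]
```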
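Finally, for the Experiment Setup row, the stated optimisation choices (Adam, learning rate 0.05, at most 200 epochs, early stopping after 15 epochs without validation-loss improvement, a 16-unit first hidden layer) map onto a standard Keras configuration. The sketch below shows only those settings on a placeholder dense model; the loss choice and `restore_best_weights` flag are assumptions, and the actual graph layers from the keras-gnm repository are not reproduced here.

```python
from tensorflow import keras

def build_placeholder_model(n_features: int) -> keras.Model:
    # Placeholder architecture: only the 16-unit first hidden layer comes from the
    # paper's setup; a real run would use the graph convolutional layers of GNM.
    return keras.Sequential([
        keras.layers.Input(shape=(n_features,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])

model = build_placeholder_model(n_features=1433)  # 1433 = Cora feature dimension
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.05),  # Adam with lr 0.05 as stated
    loss="binary_crossentropy",                           # assumed loss for the binary task
    metrics=["accuracy"],
)

# Early stopping after 15 epochs without validation-loss improvement, as in the row above.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=15, restore_best_weights=True
)

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=200, callbacks=[early_stop])
```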