CoLink: An Unsupervised Framework for User Identity Linkage

Authors: Zexuan Zhong, Yong Cao, Mu Guo, Zaiqing Nie

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply CoLink to a UIL task of mapping the employees in an enterprise network to their LinkedIn profiles. The experiment results show that CoLink generally outperforms the state-of-the-art unsupervised approaches by an F1 increase of over 20%.
Researcher Affiliation | Collaboration | Zexuan Zhong (University of Illinois at Urbana-Champaign, USA), Yong Cao and Mu Guo (Microsoft Research, China), Zaiqing Nie (Alibaba AI Labs, China). Emails: zexuan2@illinois.edu, {yongc, muguo}@microsoft.com, zaiqing.nzq@alibaba-inc.com
Pseudocode | Yes | Algorithm 1: The co-training algorithm in CoLink. (A hedged sketch of this loop appears after the table.)
Open Source Code | No | The paper does not include any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | We choose a real-world dataset to evaluate CoLink, in which one social network is LinkedIn, while the other network is an internal enterprise user network. We crawl over 2.4 million public LinkedIn profiles from the web. The LinkedIn profile webpages are parsed to get attributes like name, organization, job title, location, etc. The enterprise user network contains over 220K users. The relationships between enterprise users are obtained from the enterprise's Active Directory.
Dataset Splits | No | The paper mentions "training data" and "seed sets" but does not provide specific percentages or counts for training, validation, or test splits, nor does it refer to a standard split.
Hardware Specification | Yes | On average, it takes about 30 minutes to get the model trained on 100K attribute pairs with a Tesla K40 GPU.
Software Dependencies | No | The paper mentions specific algorithms and functions such as "deep LSTM", "sequence-to-sequence", "SVM", "Jaccard similarity", and "softmax function" but does not name any software packages or version numbers. (A plain Jaccard helper is sketched after the table.)
Experiment Setup | Yes | The encoder deep LSTM and decoder deep LSTM both have 2 LSTM layers stacked, because we find that an encoder or decoder with more than 2 layers won't bring any further performance increase for the UIL task. In each LSTM, the recurrent unit has a size of 512. Every word is first turned into a 512-dimensional embedding vector before being fed into the encoder and decoder. In our task, we choose a threshold of 95%. (A configuration sketch based on these numbers appears after the table.)
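
The Pseudocode row refers to the paper's Algorithm 1, a co-training loop between an attribute-based model and a relationship-based model. The following is a minimal sketch of such a loop, not the authors' exact algorithm: `attr_model` and `rel_model` are hypothetical objects assumed to expose `fit()` and `score()`, and the 0.95 threshold mirrors the 95% confidence threshold quoted in the Experiment Setup row.

```python
def co_train(seed_pairs, candidate_pairs, attr_model, rel_model,
             threshold=0.95, max_iters=10):
    """Alternately retrain two models, each expanding the shared set of
    linked pairs with its most confident predictions (co-training sketch)."""
    linked = set(seed_pairs)
    for _ in range(max_iters):
        grew = False
        # Each model is retrained on the current linked set; its confident
        # predictions become training data for the other model's next round.
        for model in (attr_model, rel_model):
            model.fit(linked)
            confident = {pair for pair in candidate_pairs - linked
                         if model.score(pair) >= threshold}
            if confident:
                linked |= confident
                grew = True
        if not grew:
            break  # converged: neither model produced new confident pairs
    return linked
```

In the paper, the attribute-based model is a sequence-to-sequence network and the relationship-based model is an SVM, but any pair of classifiers with this interface would slot into the loop above.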
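The Software Dependencies row mentions Jaccard similarity without pinning down its exact role in the pipeline. For reference, a plain set-based implementation (the function name and arguments are illustrative):

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections,
    e.g. feature sets of two users compared as a link candidate."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```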
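Finally, the Experiment Setup row pins down the attribute model's dimensions, so they can be written out as a hedged PyTorch sketch. The class name, vocabulary size, and teacher-forcing interface below are assumptions; only the layer count and the 512 sizes come from the paper.

```python
import torch.nn as nn

VOCAB_SIZE = 50_000  # assumed; the paper does not report a vocabulary size
EMBED_DIM = 512      # "every word is first turned into a 512 embedding vector"
HIDDEN_DIM = 512     # "the recurrent unit has a size of 512"
NUM_LAYERS = 2       # 2 stacked LSTM layers in both the encoder and the decoder

class AttrSeq2Seq(nn.Module):
    """Sequence-to-sequence attribute model with the reported dimensions."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.encoder = nn.LSTM(EMBED_DIM, HIDDEN_DIM, NUM_LAYERS, batch_first=True)
        self.decoder = nn.LSTM(EMBED_DIM, HIDDEN_DIM, NUM_LAYERS, batch_first=True)
        self.out = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)  # feeds a softmax over words

    def forward(self, src_ids, tgt_ids):
        # Encode one attribute value and decode the other (teacher forcing),
        # reusing the encoder's final state to initialize the decoder.
        _, state = self.encoder(self.embed(src_ids))
        dec_out, _ = self.decoder(self.embed(tgt_ids), state)
        return self.out(dec_out)  # logits; apply softmax at inference time
```

Under this reading, the model scores how well one user's attribute value predicts the other user's corresponding value, which is consistent with, though not confirmed to be identical to, the paper's setup.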