CoLink: An Unsupervised Framework for User Identity Linkage

Authors: Zexuan Zhong, Yong Cao, Mu Guo, Zaiqing Nie

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply CoLink to a UIL task of mapping the employees in an enterprise network to their LinkedIn profiles. The experiment results show that CoLink generally outperforms the state-of-the-art unsupervised approaches by an F1 increase of over 20%.
Researcher Affiliation | Collaboration | Zexuan Zhong (University of Illinois at Urbana-Champaign, USA), Yong Cao and Mu Guo (Microsoft Research, China), Zaiqing Nie (Alibaba AI Labs, China). Emails: zexuan2@illinois.edu, {yongc, muguo}@microsoft.com, zaiqing.nzq@alibaba-inc.com
Pseudocode | Yes | Algorithm 1: The co-training algorithm in CoLink. (A hedged sketch of this loop appears after the table.)
Open Source Code | No | The paper does not include any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | No | We choose a real-world dataset to evaluate CoLink, in which one social network is LinkedIn, while the other network is an internal enterprise user network. We crawl over 2.4 million public LinkedIn profiles from the web. The LinkedIn profile webpages are parsed to get attributes like name, organization, job title, location, etc. The enterprise user network contains over 220K users. The relationships between enterprise users are obtained from the enterprise's Active Directory.
Dataset Splits | No | The paper mentions "training data" and "seed sets" but does not provide specific percentages or counts for training, validation, or test splits, nor does it refer to a standard split.
Hardware Specification | Yes | On average, it takes about 30 minutes to get the model trained on 100K attribute pairs with a Tesla K40 GPU.
Software Dependencies | No | The paper mentions specific algorithms and functions such as "deep LSTM", "sequence-to-sequence", "SVM", "Jaccard similarity", and "softmax function" but does not name any software packages or version numbers. (A plain Jaccard helper is sketched after the table.)
Experiment Setup | Yes | The encoder deep LSTM and decoder deep LSTM both have 2 LSTM layers stacked, because we find that an encoder or decoder with more than 2 layers won't bring any further performance increase for the UIL task. In each LSTM, the recurrent unit has a size of 512. Every word is first turned into a 512-dimensional embedding vector before being fed into the encoder and decoder. In our task, we choose a threshold of 95%. (A configuration sketch based on these numbers appears after the table.)
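
The Pseudocode row refers to the paper's Algorithm 1, a co-training loop between an attribute-based model and a relationship-based model. The following is a minimal sketch of such a loop, not the authors' exact algorithm: `attr_model` and `rel_model` are hypothetical objects assumed to expose `fit()` and `score()`, and the 0.95 threshold mirrors the 95% confidence threshold quoted in the Experiment Setup row.

```python
def co_train(seed_pairs, candidate_pairs, attr_model, rel_model,
             threshold=0.95, max_iters=10):
    """Alternately retrain two models, each expanding the shared set of
    linked pairs with its most confident predictions (co-training sketch)."""
    linked = set(seed_pairs)
    for _ in range(max_iters):
        grew = False
        # Each model is retrained on the current linked set; its confident
        # predictions become training data for the other model's next round.
        for model in (attr_model, rel_model):
            model.fit(linked)
            confident = {pair for pair in candidate_pairs - linked
                         if model.score(pair) >= threshold}
            if confident:
                linked |= confident
                grew = True
        if not grew:
            break  # converged: neither model produced new confident pairs
    return linked
```

In the paper, the attribute-based model is a sequence-to-sequence network and the relationship-based model is an SVM, but any pair of classifiers with this interface would slot into the loop above.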
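The Software Dependencies row mentions Jaccard similarity without pinning down its exact role in the pipeline. For reference, a plain set-based implementation (the function name and arguments are illustrative):

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections,
    e.g. feature sets of two users compared as a link candidate."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```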
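Finally, the Experiment Setup row pins down the attribute model's dimensions, so they can be written out as a hedged PyTorch sketch. The class name, vocabulary size, and teacher-forcing interface below are assumptions; only the layer count and the 512 sizes come from the paper.

```python
import torch.nn as nn

VOCAB_SIZE = 50_000  # assumed; the paper does not report a vocabulary size
EMBED_DIM = 512      # "every word is first turned into a 512 embedding vector"
HIDDEN_DIM = 512     # "the recurrent unit has a size of 512"
NUM_LAYERS = 2       # 2 stacked LSTM layers in both the encoder and the decoder

class AttrSeq2Seq(nn.Module):
    """Sequence-to-sequence attribute model with the reported dimensions."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.encoder = nn.LSTM(EMBED_DIM, HIDDEN_DIM, NUM_LAYERS, batch_first=True)
        self.decoder = nn.LSTM(EMBED_DIM, HIDDEN_DIM, NUM_LAYERS, batch_first=True)
        self.out = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)  # feeds a softmax over words

    def forward(self, src_ids, tgt_ids):
        # Encode one attribute value and decode the other (teacher forcing),
        # reusing the encoder's final state to initialize the decoder.
        _, state = self.encoder(self.embed(src_ids))
        dec_out, _ = self.decoder(self.embed(tgt_ids), state)
        return self.out(dec_out)  # logits; apply softmax at inference time
```

Under this reading, the model scores how well one user's attribute value predicts the other user's corresponding value, which is consistent with, though not confirmed to be identical to, the paper's setup.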