CoLink: An Unsupervised Framework for User Identity Linkage
Authors: Zexuan Zhong, Yong Cao, Mu Guo, Zaiqing Nie
AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply CoLink to a UIL task of mapping the employees in an enterprise network to their LinkedIn profiles. The experiment results show that CoLink generally outperforms the state-of-the-art unsupervised approaches by an F1 increase of over 20%. |
| Researcher Affiliation | Collaboration | Zexuan Zhong, Yong Cao, Mu Guo, Zaiqing Nie; Microsoft Research, China; University of Illinois at Urbana-Champaign, USA; Alibaba AI Labs, China. zexuan2@illinois.edu, {yongc, muguo}@microsoft.com, zaiqing.nzq@alibaba-inc.com |
| Pseudocode | Yes | Algorithm 1: The co-training algorithm in CoLink. (A hedged sketch of such a co-training loop appears after this table.) |
| Open Source Code | No | The paper does not include any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | We choose a real-world data set to evaluate CoLink, in which one social network is LinkedIn, while the other network is an internal enterprise user network. We crawl over 2.4 million public LinkedIn profiles from the web. The LinkedIn profile webpages are parsed to get attributes like name, organization, job title, location, etc. The enterprise user network contains over 220K users. The relationships between enterprise users are obtained from the enterprise's Active Directory. |
| Dataset Splits | No | The paper mentions "training data" and "seed sets" but does not provide specific percentages or counts for training, validation, or test splits, nor does it refer to a standard split. |
| Hardware Specification | Yes | On average, it takes about 30 minutes to get the model trained on 100K attribute pairs with a Tesla K40 GPU. |
| Software Dependencies | No | The paper mentions specific algorithms and functions like "deep LSTM", "sequence-to-sequence", "SVM", "Jaccard similarity", and "softmax function" but does not provide any specific software names with version numbers. |
| Experiment Setup | Yes | The encoder deep LSTM and decoder deep LSTM both have 2 LSTM layers stacked, because we find that an encoder or decoder with more than 2 layers won't bring any further performance increase for the UIL task. In each LSTM, the recurrent unit has a size of 512. Every word is first turned into a 512-dimensional embedding vector before being fed into the encoder and decoder. In our task, we choose a threshold of 95%. (A hedged model sketch using these sizes appears after this table.) |
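
The "Pseudocode" row above cites Algorithm 1, the co-training algorithm in which the attribute-based and relationship-based models reinforce each other iteratively. Since no source code is released, the Python sketch below is only an illustrative reading of that loop: the model interfaces (`fit`, `predict_proba`), the seed-pair input, the round limit, and applying the same 0.95 confidence cutoff to both models are assumptions, not the authors' implementation.

```python
# Minimal sketch of a CoLink-style co-training loop (no official code is public).
# Model interfaces, the seed-pair input, and the shared 0.95 confidence cutoff
# are assumptions made for illustration.

def co_train(candidate_pairs, seed_pairs, attr_model, rel_model,
             confidence=0.95, max_rounds=10):
    """Grow a set of linked user pairs by letting the attribute-based and
    relationship-based models label high-confidence pairs for each other."""
    linked = set(seed_pairs)
    for _ in range(max_rounds):
        added_this_round = 0
        for model in (attr_model, rel_model):
            model.fit(linked)                          # retrain on the current links
            for pair in set(candidate_pairs) - linked:
                if model.predict_proba(pair) >= confidence:
                    linked.add(pair)                   # accept a high-confidence link
                    added_this_round += 1
        if added_this_round == 0:                      # converged: no new links found
            break
    return linked
```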
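The "Experiment Setup" row reports a sequence-to-sequence attribute model with 2-layer encoder and decoder LSTMs, 512-unit recurrent cells, and 512-dimensional word embeddings. The PyTorch module below is a minimal sketch that only mirrors those reported sizes; the vocabulary size, batching, and output projection are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class AttrSeq2Seq(nn.Module):
    """Minimal 2-layer LSTM encoder-decoder matching the reported sizes:
    512-dim word embeddings and 512-unit recurrent cells per layer.
    The vocabulary size and output projection are illustrative assumptions."""

    def __init__(self, vocab_size=50000, emb_dim=512, hidden_dim=512, num_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
        self.output = nn.Linear(hidden_dim, vocab_size)  # softmax applied by the loss

    def forward(self, src_ids, tgt_ids):
        # Encode the source attribute (e.g. an enterprise job title) ...
        _, state = self.encoder(self.embedding(src_ids))
        # ... then decode the target attribute (e.g. the LinkedIn title),
        # conditioning the decoder on the encoder's final state.
        dec_out, _ = self.decoder(self.embedding(tgt_ids), state)
        return self.output(dec_out)  # per-token vocabulary logits

# Example: score a batch of 4 padded attribute pairs of length 12.
model = AttrSeq2Seq()
src = torch.randint(0, 50000, (4, 12))
tgt = torch.randint(0, 50000, (4, 12))
logits = model(src, tgt)  # shape (4, 12, 50000)
```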