Weighted Graph Clustering with Non-Uniform Uncertainties

Authors: Yudong Chen, Shiau Hong Lim, Huan Xu

ICML 2014

Reproducibility Variable Result LLM Response
Research Type Experimental We provide simulation results that validate our theoretical findings. The results demonstrate that the weighted approach and the optimal weights outperform other methods, and that non-uniform resource allocation leads to performance gains. To empirically evaluate our theoretical findings in Section 3, we follow Chen et al. (2012) to adapt the Augmented Lagrangian Multiplier algorithm by Lin et al. (2009) to solve our weighted programs (2) and (5). In our simulations, we repeat each test 100 times and report the success rate, which is the fraction of attempts where the true clustering is recovered. Error bars show 95% confidence intervals. In all our experiments, we choose to report results from parameter regions where the problem is neither too hard nor too easy to solve. We note that qualitatively the results are similar across a wide range of distributions.
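The reported error bars can be reproduced with a standard normal-approximation binomial interval. A minimal sketch, assuming that method; the review quotes "95% confidence interval" but does not state which interval the authors actually computed:

```python
import math

def success_rate_ci(successes, trials=100, z=1.96):
    """Success rate over repeated trials with a normal-approximation
    95% CI (z = 1.96), clipped to [0, 1]. A sketch of the evaluation
    protocol quoted above, not the authors' exact code."""
    p = successes / trials
    half = z * math.sqrt(p * (1 - p) / trials)
    return p, max(0.0, p - half), min(1.0, p + half)

# e.g. 87 successful recoveries out of 100 repetitions
rate, lo, hi = success_rate_ci(87)
```

For 100 trials this gives a half-width of at most about 0.1 (worst case at p = 0.5), which is consistent with reporting success rates with visible but modest error bars.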
Researcher Affiliation Academia Yudong Chen (YUDONG.CHEN@EECS.BERKELEY.EDU), University of California, Berkeley, Berkeley, CA 94720, USA; Shiau Hong Lim (MPELSH@NUS.EDU.SG), National University of Singapore, Singapore 117575; Huan Xu (MPEXUH@NUS.EDU.SG), National University of Singapore, Singapore 117575
Pseudocode No The paper describes adapting an existing algorithm (the Augmented Lagrangian Multiplier algorithm of Lin et al. (2009)) to solve its weighted program, but it does not present pseudocode for the procedure.
Open Source Code No The paper does not provide any explicit statements about releasing source code or links to a code repository.
Open Datasets No The paper uses a stochastic model to generate graph data for simulations, as opposed to using a publicly available dataset with concrete access information. 'The ground truth consists of n nodes divided into 4 equal-size clusters. We tested with n = 200 and n = 1000. The following stochastic model is used: for each pair of nodes, with probability q it is unobserved (i.e., Pij = 0.5), otherwise the uncertainties vary uniformly at random between 0.26 − ε and 0.26 + ε. The graph is then generated according to the model in Section 2.1 with error probability Pij.'
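The quoted stochastic model can be sketched as follows. This is an illustration, not the authors' code: `eps` stands in for the half-width parameter that the extracted text garbles, and the convention that an edge "disagrees with the ground truth with probability Pij" is an assumption about the model in the paper's Section 2.1:

```python
import numpy as np

def generate_graph(n=200, q=0.6, eps=0.1, k=4, seed=0):
    """Synthetic model quoted above: n nodes in k equal-size clusters;
    each pair is unobserved with probability q (then P_ij = 0.5),
    otherwise P_ij ~ Uniform(0.26 - eps, 0.26 + eps). The observed
    entry A_ij disagrees with the true cluster matrix w.p. P_ij."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(k), n // k)        # ground-truth clusters
    same = labels[:, None] == labels[None, :]       # true cluster matrix
    P = rng.uniform(0.26 - eps, 0.26 + eps, (n, n))
    P[rng.random((n, n)) < q] = 0.5                 # unobserved pairs
    flip = rng.random((n, n)) < P                   # error w.p. P_ij
    A = np.where(flip, ~same, same).astype(int)
    iu = np.triu_indices(n, 1)                      # symmetrize A and P
    A[(iu[1], iu[0])] = A[iu]
    P[(iu[1], iu[0])] = P[iu]
    return A, P, labels
```

With n = 200 and q = 0.6 this reproduces the smaller of the two quoted settings; the paper uses q = 0.7 for n = 1000.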
Dataset Splits No The paper describes generating synthetic data and evaluating success rates, but it does not define or mention specific training, validation, or test dataset splits in the conventional machine learning sense (e.g., percentages or counts for distinct sets).
Hardware Specification No The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, memory).
Software Dependencies No The paper mentions adapting an algorithm by Lin et al. (2009) but does not provide specific software dependencies with version numbers (e.g., libraries, frameworks, or solvers with their versions).
Experiment Setup Yes In our simulations, we repeat each test 100 times and report the success rate, which is the fraction of attempts where the true clustering is recovered. The ground truth consists of n nodes divided into 4 equal-size clusters. We tested with n = 200 and n = 1000. For n = 200 we use q = 0.6, while for n = 1000 we use q = 0.7 since the problem is easier for larger n. For step weights, we split the range (0.26 − ε, 0.26 + ε) into 3 equal intervals (0.26 − ε, 0.26 − ε/3), (0.26 − ε/3, 0.26 + ε/3) and (0.26 + ε/3, 0.26 + ε). All Pij in the same interval receive the same weight, which is based on the MLE weight of the largest Pij in the interval. The distribution of Pij is such that 20% of the pairs are unobserved, and among the rest, Pij is either 0.1 or 0.4, each with probability 0.5. We test our theory by comparing three different weighting schemes for graph clustering with non-uniform uncertainties. In particular, we compare the weighted formulation with the MLE weights against the unweighted formulation. The third candidate is a step weighting scheme where the different observation uncertainties are quantized into 3 levels, each with a different weight, with higher weights on the more certain observations. In particular, we may replace the constraint ‖Y‖∗ ≤ n by a regularization term in the objective function and solve the following program: min_{Y ∈ ℝ^{n×n}} ‖Y‖∗ + λ Σ_{i,j} Bij |Aij − Yij| s.t. 0 ≤ Yij ≤ 1 for all i, j, where λ is chosen to be small enough (usually around 1/√n) such that the solution satisfies ‖Y‖∗ ≤ n.
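The step weighting scheme quoted above can be sketched as follows. The review names the paper's "MLE weight" but not its formula, so the log-likelihood-ratio form log((1 − p)/p) used here is an assumption, and `lo`/`hi` stand in for the endpoints 0.26 − ε and 0.26 + ε:

```python
import numpy as np

def mle_weight(p):
    """Hypothetical MLE-style weight log((1 - p)/p): larger for more
    certain observations (p far below 0.5), zero at p = 0.5."""
    return np.log((1 - p) / p)

def step_weights(P, lo=0.16, hi=0.36):
    """Quantize each P_ij in (lo, hi) into 3 equal intervals and give
    every P_ij in an interval the MLE weight of that interval's
    largest P_ij, per the step scheme described above."""
    edges = np.linspace(lo, hi, 4)                  # 3 equal intervals
    idx = np.clip(np.digitize(P, edges[1:-1]), 0, 2)
    return mle_weight(edges[1:][idx])               # weight of interval's max P_ij
```

Because mle_weight is decreasing in p, this assigns higher weights to the intervals of smaller (more certain) error probabilities, matching the description that more certain observations receive higher weights.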