k-CoRating: Filling Up Data to Obtain Privacy and Utility
Authors: Feng Zhang, Victor Lee, Ruoming Jin
AAAI 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that such a model could greatly reduce the risk of being subject to Narayanan's attacks. Though k-coRating may seem similar to k-anonymity (Samarati 2001), it is designed with three major differences from k-anonymity. The first is that k-coRating handles datasets (typically user-item ratings) that have a large number of attributes, any subset of which behaves as a quasi-identifier in the sense arising from k-anonymity. The second is that k-coRated privacy rests on each record, together with at least k-1 others, having non-null values for exactly the same subset of attributes; the records need not take identical values under each attribute, as k-anonymity requires. The third is that the privacy of k-coRating is achieved by filling necessary NULL cells with significant values, rather than by generalization and suppression techniques. All claims are verified by experimental results. |
| Researcher Affiliation | Academia | Feng Zhang1, Victor E. Lee2, and Ruoming Jin3 1School of Computer Science, China University of Geosciences, Wuhan, Hubei, China 2Department of Mathematics and Computer Science, John Carroll University, University Heights, OH, USA 3Department of Computer Science, Kent State University, Kent, OH, USA |
| Pseudocode | Yes | Algorithm 1: sub-GeCom: k-coRating an Already Sorted Matrix M2; Algorithm 2: GeCom: Generate k-coRated Matrix M; Algorithm 3: PaGeCom: A Parallel Algorithm to Generate k-coRated Matrix M |
| Open Source Code | No | The paper does not contain any explicit statement about releasing the source code for its methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Experiments have been done using four popular benchmark datasets: two MovieLens datasets (http://www.grouplens.org/node/73), one Epinions dataset (http://www.epinions.com/), and one Netflix prize dataset (http://www.netflixprize.com). |
| Dataset Splits | Yes | For the MovieLens 100K dataset, we used the prepared 80%/20% splits of the dataset, i.e., u1.base and u1.test through u5.base and u5.test, to do the 5-fold cross-validation experiments; for the other datasets, we used 10-fold cross-validation to evaluate the prediction accuracy. |
| Hardware Specification | Yes | The implementation was run on a laptop with an Intel Core i7-2640M CPU at 2.80GHz and 8GB RAM, running an Ubuntu 12.04 virtual machine on a Windows 8 64-bit host operating system. For the Netflix prize dataset, the laptop's computing resources were insufficient, so we implemented and ran a parallel version of the algorithm GeCom (Algorithm 3, PaGeCom) at the Ohio Supercomputer Center. |
| Software Dependencies | No | All the algorithms were implemented in C/C++. The implementation was run on a laptop with an Intel Core i7-2640M CPU at 2.80GHz and 8GB RAM, running an Ubuntu 12.04 virtual machine on a Windows 8 64-bit host operating system. |
| Experiment Setup | Yes | For the MovieLens 100K dataset, we used the prepared 80%/20% splits of the dataset, i.e., u1.base and u1.test through u5.base and u5.test, to do the 5-fold cross-validation experiments; for the other datasets, we used 10-fold cross-validation to evaluate the prediction accuracy. For the trust derivation, we computed the propagation at most two times. |
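The k-coRated property quoted above (each record shares its exact set of non-NULL attributes with at least k-1 other records, without requiring identical values) can be checked mechanically. The sketch below is not the authors' code; it is a minimal Python illustration of the property under the assumption that NULL cells are modelled as `None`:

```python
# Minimal sketch (not the paper's implementation) of the k-coRated check:
# a rating matrix is k-coRated if every row's set of rated (non-NULL)
# columns is shared by at least k rows in total (the row itself plus
# at least k-1 others). Values themselves need not match.
from collections import Counter

def is_k_corated(matrix, k):
    """Return True if every row's non-NULL column set occurs >= k times."""
    signatures = [frozenset(j for j, v in enumerate(row) if v is not None)
                  for row in matrix]
    counts = Counter(signatures)
    return all(counts[sig] >= k for sig in signatures)

# Toy example: rows 0 and 1 rate items {0, 2}; rows 2 and 3 rate {1}.
ratings = [
    [5, None, 3],
    [4, None, 1],
    [None, 2, None],
    [None, 5, None],
]
print(is_k_corated(ratings, 2))  # True: each signature occurs twice
print(is_k_corated(ratings, 3))  # False: no signature occurs three times
```

The paper's GeCom algorithms achieve this property by filling selected NULL cells with significant (predicted) values until the signature groups reach size k; the check above only verifies the end condition.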