Optimal Sample Complexity of M-wise Data for Top-K Ranking
Authors: Minje Jang, Sunghyun Kim, Changho Suh, Sewoong Oh
NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We run numerical experiments using synthetic data and confirm that the optimal sample size decreases at the rate of 1/M. Moreover, running our algorithm on real-world data, we find that its applicability extends to settings that may not fit the PL model. We conduct numerical experiments to support our result. |
| Researcher Affiliation | Academia | Minje Jang School of Electrical Engineering KAIST jmj427@kaist.ac.kr Sunghyun Kim Electronics and Telecommunications Research Institute Daejeon, Korea koishkim@etri.re.kr Changho Suh School of Electrical Engineering KAIST chsuh@kaist.ac.kr Sewoong Oh Industrial and Enterprise Systems Engineering Department UIUC swoh@illinois.edu |
| Pseudocode | Yes | Algorithm 1 Rank Centrality [37] Input the collection of statistics s = s I : I E(M) . Convert the M-wise sample for each hyper-edge I into M pairwise samples: 1. Choose a circular permutation of the items in I uniformly at random, 2. Break it into the M pairs of adjacent items, and denote the set of pairs by φ(I), 3. Use the (pairwise) data of the pairs in φ(I). |
| Open Source Code | No | The paper does not include any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | No | The paper uses synthetic data and real-world data collected from League of Legends, but does not provide concrete access information (e.g., specific link, DOI, repository name, formal citation with authors/year) for a publicly available dataset. |
| Dataset Splits | No | The paper discusses experimental parameters like 'L: number of repeated comparisons' and 'p: probability of edge existence', but does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | We set constant c1 = 2, and set pdense = 0.25 and psparse = 0.025, to make each be in its proper range. We use n = 500, K = 10, and K = 0.1. We set the other parameters as n = 100, L = 20, K = 5 and K = 0.3. We use n = 100, M = 4, p = 0.0025 (M 1) q log n/ n 1 M 1 , K = 5 and K = 0.3. |