Semisupervised Clustering, AND-Queries and Locally Encodable Source Coding

Authors: Arya Mazumdar, Soumyabrata Pal

NeurIPS 2017

Reproducibility variables, each with its result and the supporting LLM response:

Research Type: Experimental
LLM Response: "Although the focus of this paper is primarily theoretical, we also perform a real crowdsourcing experiment to validate our algorithm." ... "Though our main contribution is theoretical we have verified our work by using our algorithm on a real dataset created by local crowdsourcing." ... "All these results have been compiled in Figure 5 and we can observe that the distortion is decreasing with the number of queries and the gap between the theoretical result and the experimental results is small for T = 2. These results validate our theoretical results and our algorithm to a large extent."

Researcher Affiliation: Academia
LLM Response: "Arya Mazumdar, College of Information & Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003, arya@cs.umass.edu; Soumyabrata Pal, College of Information & Computer Sciences, University of Massachusetts Amherst, Amherst, MA 01003, soumyabratap@umass.edu"

Pseudocode: Yes
LLM Response: "Algorithm 1 Noisy query approximate recovery with nd/2"

Open Source Code: No
LLM Response: The paper does not provide a direct link to open-source code for the described methodology, nor does it explicitly state that the code is publicly available in supplementary materials or elsewhere.

Open Datasets: Yes
LLM Response: "We first picked a list of 100 action movies and 100 romantic movies from IMDB (http://www.imdb.com/list/ls076503982/ and http://www.imdb.com/list/ls058479560/)."

Dataset Splits: No
LLM Response: The paper describes using a dataset for experiments but does not specify a division into training, validation, and test sets with explicit percentages or sample counts. The experiment involves collecting query answers from volunteers and then reconstructing labels, not training a machine learning model with distinct data splits.

Hardware Specification: No
LLM Response: The paper describes a crowdsourcing experiment but does not specify any hardware details (e.g., CPU or GPU models, memory) used for running simulations, data processing, or algorithm execution. It only mentions using "10 volunteers".

Software Dependencies: No
LLM Response: The paper mentions using the "Survey Monkey platform" for creating surveys, but it does not specify a version number for this or any other software used in the experiments. It lacks details on programming languages, libraries, or solvers with version numbers.

Experiment Setup: Yes
LLM Response: "To create the graph we put all the movies on a circle and took a random permutation of them in a circle. Then for each node we connected d/2 edges on either side to its closest neighbors in the permuted circular list. ... Using d = 10, we have nd/2 = 1000 queries, with each query being the following question: Are both the movies action movies? Now we divided these 1000 queries into 10 surveys (using the Survey Monkey platform), with each survey carrying 100 queries for the user to answer. We used 10 volunteers to fill up the surveys. We instructed them not to check any resources and to answer the questions spontaneously, and also gave them a time limit of a maximum of 10 minutes. ... For each movie we evaluate the d query answers it is part of, and use different thresholds T for prediction."
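The experiment setup quoted above can be sketched in code: place the n items on a randomly permuted circle, connect each item to its d/2 nearest neighbors on either side (giving nd/2 AND-queries), and decode each item's label by thresholding its d incident query answers. This is a minimal illustrative sketch, not the paper's Algorithm 1: the helper names and the symmetric answer-flip noise model are assumptions.

```python
import random


def build_circulant_queries(n, d, seed=0):
    """Randomly permute n items on a circle and connect each item to its
    d/2 closest neighbors on either side, yielding n*d/2 edges. Each edge
    (u, v) is one AND-query: "are both u and v in class 1?"."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    edges = []
    for i in range(n):
        for k in range(1, d // 2 + 1):
            edges.append((order[i], order[(i + k) % n]))
    return edges


def answer_queries(edges, labels, flip_prob=0.0, seed=1):
    """AND-query answers, each flipped with probability flip_prob to model
    noisy crowd responses (an assumed noise model, not from the paper)."""
    rng = random.Random(seed)
    answers = []
    for u, v in edges:
        a = labels[u] & labels[v]
        if rng.random() < flip_prob:
            a ^= 1
        answers.append(a)
    return answers


def threshold_decode(n, edges, answers, threshold):
    """Predict label 1 for an item if at least `threshold` of its incident
    query answers are positive (the "different thresholds T" step)."""
    votes = [0] * n
    for (u, v), a in zip(edges, answers):
        if a:
            votes[u] += 1
            votes[v] += 1
    return [1 if votes[i] >= threshold else 0 for i in range(n)]
```

With n = 200 movies (100 action, labeled 1, and 100 romantic, labeled 0) and d = 10, `build_circulant_queries` produces exactly the 1000 queries described, each movie appearing in d = 10 of them; varying `threshold` reproduces the T sweep used for prediction.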