Catch Me if I Can: Detecting Strategic Behaviour in Peer Assessment
Authors: Ivan Stelmakh, Nihar B. Shah, Aarti Singh (pp. 4794–4802)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For this, we design and conduct an experiment that elicits strategic behaviour from subjects and release a dataset of patterns of strategic behaviour that may be of independent interest. We use this data to run a series of real and semi-synthetic evaluations that reveal a strong detection power of our test. |
| Researcher Affiliation | Academia | Ivan Stelmakh, Nihar B. Shah and Aarti Singh School of Computer Science Carnegie Mellon University {stiv,nihars,aarti}@cs.cmu.edu |
| Pseudocode | Yes | Test 1: Test for strategic behaviour. Input: reviewers' rankings {π_i, i ∈ R}; assignment M of works to reviewers; conflict and authorship matrices (C, A); significance level α; aggregation rule Λ. Optional argument: impartial rankings {π̃_i, i ∈ R}. 1. Compute the test statistic τ as ... 2. Compute a multiset P(M) as follows. ... 3. For each matrix A′ ∈ P(M), define φ(A′) to be the value of the test statistic (1) if we substitute A with A′; that is, φ(A′) is the value of the test statistic if the authorship relationship were represented by A′ instead of A. ... 4. Reject the null if τ is strictly smaller than the (⌊α\|Φ\|⌋ + 1)-th order statistic of Φ. |
| Open Source Code | No | The paper states that a dataset is released in supplementary materials via the first author's website, but it does not explicitly state that the source code for the methodology or any part of the paper's contribution is open-source or provided. |
| Open Datasets | Yes | This experiment yields a novel dataset of patterns of strategic behaviour that can be useful for other researchers (the dataset is attached in supplementary materials).¹ ¹Supplementary materials and appendices are on the first author's website. |
| Dataset Splits | No | The paper describes experiments to evaluate the proposed statistical test, but it does not specify traditional training/validation/test dataset splits typically used for machine learning model training or reproduction. The data collected from the experiment is used to evaluate the test itself, not to train a model. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers. |
| Experiment Setup | Yes | We design a game for m = 20 players and n = 20 hypothetical submissions. ... Each submission is associated to a unique value v ∈ {1, 2, . . . , 20}... We then communicate values of some µ = 4 other contestants to each player, subject to the constraint that the value of each player becomes known to λ = 4 counterparts. ... For the experiment, we create 5 rounds of the game... Each of the N = 55 subjects then participates in all 5 rounds... For each of the 1,000 iterations... setting the significance level at α = 0.05 and sampling k = 100 authorship matrices in Step 3 of the test. |
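The resampling procedure quoted in the Pseudocode row can be sketched in Python. Everything below is illustrative: the paper defines the actual test statistic τ, the multiset P(M) of alternative authorship matrices, and the aggregation rule Λ, whereas the `phi` and `sample` callables here are hypothetical stand-ins used only to show the rejection rule (Steps 2–4).

```python
import numpy as np

def strategic_test(tau, phi, sample_alternative, k=100, alpha=0.05, rng=None):
    # Sketch of Steps 2-4: draw k alternative authorship matrices A'
    # (in the paper, from the multiset P(M)), evaluate the test
    # statistic phi on each, and reject the null iff tau is strictly
    # smaller than the (floor(alpha * |Phi|) + 1)-th order statistic
    # of the resampled values Phi.
    rng = rng if rng is not None else np.random.default_rng()
    Phi = np.sort([phi(sample_alternative(rng)) for _ in range(k)])
    return bool(tau < Phi[int(np.floor(alpha * len(Phi)))])

# Toy usage (hypothetical statistic): score an authorship matrix
# against a fixed random weight matrix; alternatives are row
# permutations of the original matrix A.
rng = np.random.default_rng(0)
weights = rng.random((20, 20))
phi = lambda A: float((A * weights).sum())
A = np.eye(20)                       # toy authorship matrix
tau = phi(A)
sample = lambda r: r.permutation(A)  # permute rows of A
print(strategic_test(tau, phi, sample, k=100, alpha=0.05, rng=rng))
```

With α = 0.05 and k = 100 samples, the null is rejected only when τ falls below the 6th smallest resampled value, matching the paper's stated settings for the semi-synthetic evaluations.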