Catch Me if I Can: Detecting Strategic Behaviour in Peer Assessment
Authors: Ivan Stelmakh, Nihar B. Shah, Aarti Singh (pp. 4794–4802)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For this, we design and conduct an experiment that elicits strategic behaviour from subjects and release a dataset of patterns of strategic behaviour that may be of independent interest. We use this data to run a series of real and semi-synthetic evaluations that reveal a strong detection power of our test. |
| Researcher Affiliation | Academia | Ivan Stelmakh, Nihar B. Shah and Aarti Singh School of Computer Science Carnegie Mellon University {stiv,nihars,aarti}@cs.cmu.edu |
| Pseudocode | Yes | Test 1: Test for strategic behaviour. Input: reviewers' rankings {π_i, i ∈ R}; assignment M of works to reviewers; conflict and authorship matrices (C, A); significance level α; aggregation rule Λ. Optional argument: impartial rankings {π̃_i, i ∈ R}. 1. Compute the test statistic τ as ... 2. Compute a multiset P(M) as follows. ... 3. For each matrix A′ ∈ P(M), define φ(A′) to be the value of the test statistic (1) if we substitute A with A′; that is, φ(A′) is the value of the test statistic if the authorship relationship were represented by A′ instead of A. ... 4. Reject the null if τ is strictly smaller than the (⌊α\|Φ\|⌋ + 1)-th order statistic of Φ. |
| Open Source Code | No | The paper states that a dataset is released in supplementary materials via the first author's website, but it does not explicitly state that the source code for the methodology or any part of the paper's contribution is open-source or provided. |
| Open Datasets | Yes | This experiment yields a novel dataset of patterns of strategic behaviour that can be useful for other researchers (the dataset is attached in supplementary materials).¹ ¹Supplementary materials and appendices are on the first author's website. |
| Dataset Splits | No | The paper describes experiments to evaluate the proposed statistical test, but it does not specify traditional training/validation/test dataset splits typically used for machine learning model training or reproduction. The data collected from the experiment is used to evaluate the test itself, not to train a model. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers. |
| Experiment Setup | Yes | We design a game for m = 20 players and n = 20 hypothetical submissions. ... Each submission is associated to a unique value v ∈ {1, 2, . . . , 20}... We then communicate values of some µ = 4 other contestants to each player, subject to the constraint that the value of each player becomes known to λ = 4 counterparts. ... For the experiment, we create 5 rounds of the game... Each of the N = 55 subjects then participates in all 5 rounds... For each of the 1,000 iterations... setting the significance level at α = 0.05 and sampling k = 100 authorship matrices in Step 3 of the test. |
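The resampling procedure quoted in the Pseudocode row can be sketched in Python. Everything below is illustrative: the paper defines the actual test statistic τ, the multiset P(M) of alternative authorship matrices, and the aggregation rule Λ, whereas the `phi` and `sample` callables here are hypothetical stand-ins used only to show the rejection rule (Steps 2–4).

```python
import numpy as np

def strategic_test(tau, phi, sample_alternative, k=100, alpha=0.05, rng=None):
    # Sketch of Steps 2-4: draw k alternative authorship matrices A'
    # (in the paper, from the multiset P(M)), evaluate the test
    # statistic phi on each, and reject the null iff tau is strictly
    # smaller than the (floor(alpha * |Phi|) + 1)-th order statistic
    # of the resampled values Phi.
    rng = rng if rng is not None else np.random.default_rng()
    Phi = np.sort([phi(sample_alternative(rng)) for _ in range(k)])
    return bool(tau < Phi[int(np.floor(alpha * len(Phi)))])

# Toy usage (hypothetical statistic): score an authorship matrix
# against a fixed random weight matrix; alternatives are row
# permutations of the original matrix A.
rng = np.random.default_rng(0)
weights = rng.random((20, 20))
phi = lambda A: float((A * weights).sum())
A = np.eye(20)                       # toy authorship matrix
tau = phi(A)
sample = lambda r: r.permutation(A)  # permute rows of A
print(strategic_test(tau, phi, sample, k=100, alpha=0.05, rng=rng))
```

With α = 0.05 and k = 100 samples, the null is rejected only when τ falls below the 6th smallest resampled value, matching the paper's stated settings for the semi-synthetic evaluations.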