Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Themis: A Fair Evaluation Platform for Computer Vision Competitions

Authors: Zinuo Cai, Jianyong Yuan, Yang Hua, Tao Song, Hao Wang, Zhengui Xue, Ningxin Hu, Jonathan Ding, Ruhui Ma, Mohammad Reza Haghighat, Haibing Guan

IJCAI 2021 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the validity of THEMIS with a wide spectrum of real-world models and datasets. Our experimental results show that THEMIS effectively enforces competition fairness by precluding manual labeling of test sets and preserving the performance ranking of participants' models.
Researcher Affiliation | Collaboration | 1 Shanghai Jiao Tong University, 2 Queen's University Belfast, 3 Louisiana State University, 4 Intel
Pseudocode | Yes | Algorithm 1: Training the noise generator
Open Source Code | Yes | THEMIS is open-sourced at https://github.com/AISIGSJTU/Themis.
Open Datasets | Yes | We select three datasets to evaluate our framework: the UTKFace dataset, and the CIFAR-10 and CIFAR-100 datasets.
Dataset Splits | Yes | In all experiments, we split them into three parts (training, validation, and test sets) with the ratio 4:1:1.
Hardware Specification | Yes | We implement the code in PyTorch and run the experiment on an NVIDIA virtual machine with 4 Tesla K80 GPU cores.
Software Dependencies | No | The paper mentions "PyTorch" but does not specify a version number or other software dependencies with version information.
Experiment Setup | No | The paper describes the general simulation of the training process but does not provide specific experimental setup details such as concrete hyperparameter values (e.g., learning rate, batch size, number of epochs) or optimizer settings.
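The 4:1:1 train/validation/test split quoted in the Dataset Splits row can be sketched as follows. This is a minimal, standard-library-only illustration: the dataset size (600) and the `split_indices` helper are hypothetical, not taken from the paper or the THEMIS codebase.

```python
import random

def split_indices(n, ratio=(4, 1, 1), seed=0):
    """Shuffle n sample indices and split them by the given ratio (train:val:test)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    total = sum(ratio)
    n_train = n * ratio[0] // total
    n_val = n * ratio[1] // total
    # Remaining indices form the test split, so all n samples are used exactly once.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train, val, test = split_indices(600)  # 4:1:1 -> 400 / 100 / 100 samples
print(len(train), len(val), len(test))
```

In practice the same ratio can be applied with framework utilities (e.g., a random split over a loaded UTKFace or CIFAR dataset), but the index-based version above makes the 4:1:1 arithmetic explicit.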