BRExIt: On Opponent Modelling in Expert Iteration

Authors: Daniel Hernandez, Hendrik Baier, Michael Kaisers

IJCAI 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In an empirical ablation of BRExIt's algorithmic variants against a set of fixed test agents, we provide statistical evidence that BRExIt learns better-performing policies than ExIt. |
| Researcher Affiliation | Collaboration | (1) Sony AI; (2) University of York, UK; (3) Eindhoven University of Technology, The Netherlands; (4) Centrum Wiskunde & Informatica, The Netherlands |
| Pseudocode | Yes | Algorithms 1, 2, and 3 depict BRExIt's data collection, model update logic, and overarching training loop, respectively, for a sequential environment. |
| Open Source Code | Yes | Code available at: https://github.com/Danielhp95/on-opponent-modelling-in-expert-iteration-code |
| Open Datasets | No | We conducted our experiments in the fully observable, sequential two-player game of Connect4, which is computationally amenable and possesses a high degree of skill transitivity [Czarnecki et al., 2020]. The paper uses an environment (Connect4) and generates agents, but does not provide details or links to a specific public dataset used for training or testing, other than the environment itself. |
| Dataset Splits | No | The paper describes continuous training of agents within the Connect4 environment and evaluating win rates, but does not specify traditional training/validation/test dataset splits with percentages or sample counts, as it operates on an interactive environment rather than a static dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions various algorithms and models (e.g., PPO, IMPALA, DQN, DPIQN) but does not specify the version numbers of any software dependencies, libraries, or frameworks used for implementation. |
| Experiment Setup | No | The paper mentions training duration ('48 wall-clock hours each') and evaluation frequency ('every 800 episodes'), and describes how test agents were generated, but it does not provide specific hyperparameter values (e.g., learning rate, batch size, optimizer details) or detailed model configuration settings for the experimental setup. |
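The Pseudocode row notes that the paper's Algorithms 1–3 split training into data collection, a model update, and an overarching loop. The following toy sketch mirrors only that three-part structure; every function, the stand-in policies, and all numeric values are illustrative assumptions, not the paper's actual BRExIt/ExIt implementation:

```python
def collect_data(policy, opponent, n_episodes=4):
    """Hypothetical data collection: play fixed-length toy episodes
    against a fixed opponent and record (state, action) pairs."""
    dataset = []
    for _ in range(n_episodes):
        state = 0
        for ply in range(6):  # players alternate moves
            action = policy(state) if ply % 2 == 0 else opponent(state)
            dataset.append((state, action))
            state += action
    return dataset

def update_model(params, dataset, lr=0.1):
    """Hypothetical apprentice update: nudge a scalar parameter
    toward the mean action target in the collected data."""
    target = sum(action for _, action in dataset) / len(dataset)
    return params + lr * (target - params)

def training_loop(iterations=3):
    """Overarching loop: alternate data collection and model updates."""
    params = 0.0
    policy = lambda s: 1    # stand-in apprentice policy
    opponent = lambda s: 0  # stand-in fixed test opponent
    for _ in range(iterations):
        data = collect_data(policy, opponent)
        params = update_model(params, data)
    return params
```

This is only a scaffold for reading the paper's pseudocode: in the real algorithms the data collection is search-guided (MCTS), the model update trains a neural apprentice (and, in BRExIt, opponent models), and the loop runs for the 48 wall-clock hours cited above.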