BRExIt: On Opponent Modelling in Expert Iteration

Authors: Daniel Hernandez, Hendrik Baier, Michael Kaisers

IJCAI 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In an empirical ablation of BRExIt's algorithmic variants against a set of fixed test agents, we provide statistical evidence that BRExIt learns better-performing policies than ExIt. |
| Researcher Affiliation | Collaboration | (1) Sony AI; (2) University of York, UK; (3) Eindhoven University of Technology, The Netherlands; (4) Centrum Wiskunde & Informatica, The Netherlands |
| Pseudocode | Yes | Algorithms 1, 2, and 3 depict BRExIt's data collection, model update logic, and overarching training loop, respectively, for a sequential environment. |
| Open Source Code | Yes | Code available at: https://github.com/Danielhp95/on-opponent-modelling-in-expert-iteration-code |
| Open Datasets | No | We conducted our experiments in the fully observable, sequential two-player game of Connect4, which is computationally amenable and possesses a high degree of skill transitivity [Czarnecki et al., 2020]. The paper uses an environment (Connect4) and generates agents, but does not provide details or links to a specific public dataset used for training or testing, other than the environment itself. |
| Dataset Splits | No | The paper describes continuous training of agents within the Connect4 environment and evaluating win rates, but does not specify traditional training/validation/test dataset splits with percentages or sample counts, as it operates on an interactive environment rather than a static dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions various algorithms and models (e.g., PPO, IMPALA, DQN, DPIQN) but does not specify the version numbers of any software dependencies, libraries, or frameworks used for implementation. |
| Experiment Setup | No | The paper mentions training duration ('48 wall-clock hours each') and evaluation frequency ('every 800 episodes'), and describes how test agents were generated, but it does not provide specific hyperparameter values (e.g., learning rate, batch size, optimizer details) or detailed model configuration settings for the experimental setup. |
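The Pseudocode row notes that the paper's Algorithms 1–3 split training into data collection, a model update, and an overarching loop. The following toy sketch mirrors only that three-part structure; every function, the stand-in policies, and all numeric values are illustrative assumptions, not the paper's actual BRExIt/ExIt implementation:

```python
def collect_data(policy, opponent, n_episodes=4):
    """Hypothetical data collection: play fixed-length toy episodes
    against a fixed opponent and record (state, action) pairs."""
    dataset = []
    for _ in range(n_episodes):
        state = 0
        for ply in range(6):  # players alternate moves
            action = policy(state) if ply % 2 == 0 else opponent(state)
            dataset.append((state, action))
            state += action
    return dataset

def update_model(params, dataset, lr=0.1):
    """Hypothetical apprentice update: nudge a scalar parameter
    toward the mean action target in the collected data."""
    target = sum(action for _, action in dataset) / len(dataset)
    return params + lr * (target - params)

def training_loop(iterations=3):
    """Overarching loop: alternate data collection and model updates."""
    params = 0.0
    policy = lambda s: 1    # stand-in apprentice policy
    opponent = lambda s: 0  # stand-in fixed test opponent
    for _ in range(iterations):
        data = collect_data(policy, opponent)
        params = update_model(params, data)
    return params
```

This is only a scaffold for reading the paper's pseudocode: in the real algorithms the data collection is search-guided (MCTS), the model update trains a neural apprentice (and, in BRExIt, opponent models), and the loop runs for the 48 wall-clock hours cited above.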