BRExIt: On Opponent Modelling in Expert Iteration
Authors: Daniel Hernandez, Hendrik Baier, Michael Kaisers
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In an empirical ablation on BRExIt's algorithmic variants against a set of fixed test agents, we provide statistical evidence that BRExIt learns better performing policies than ExIt. |
| Researcher Affiliation | Collaboration | Sony AI; University of York, UK; Eindhoven University of Technology, The Netherlands; Centrum Wiskunde & Informatica, The Netherlands |
| Pseudocode | Yes | Algorithms 1, 2, and 3 depict BRExIt's data collection, model update logic, and overarching training loop, respectively, for a sequential environment; a hedged Python sketch of such a loop follows this table. |
| Open Source Code | Yes | Code available at: https://github.com/Danielhp95/on-opponent-modelling-in-expert-iteration-code. |
| Open Datasets | No | We conducted our experiments in the fully observable, sequential two-player game of Connect4, which is computationally amenable and possesses a high degree of skill transitivity [Czarnecki et al., 2020]. The paper uses an environment (Connect4) and generates agents, but does not provide details or links to a specific public dataset used for training or testing, other than the environment itself. |
| Dataset Splits | No | The paper describes continuous training of agents within the Connect4 environment and evaluating winrates, but does not specify traditional training/validation/test dataset splits with percentages or sample counts, as it operates on an interactive environment rather than a static dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions various algorithms and models (e.g., PPO, IMPALA, DQN, DPIQN) but does not specify the version numbers of any software dependencies, libraries, or frameworks used for implementation. |
| Experiment Setup | No | The paper mentions training duration ('48 wall-clock hours each') and evaluation frequency ('every 800 episodes'), and describes how the test agents were generated, but it does not provide specific hyperparameter values (e.g., learning rate, batch size, optimizer details) or detailed model configuration settings for the experimental setup; a sketch of this evaluation protocol also follows the table. |
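For orientation, here is a minimal sketch of an ExIt-style apprentice update extended with an auxiliary opponent-model head, in the spirit of the data collection, model update, and training loop that the Pseudocode row describes. It assumes PyTorch; all names (`ApprenticeNet`, `brexit_update`, `opponent_loss_weight`) and architecture choices are hypothetical illustrations, not the authors' implementation.

```python
# A minimal sketch, not the authors' code: an ExIt-style apprentice update
# with an auxiliary opponent-model head, assuming PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_ACTIONS = 7   # Connect4: one action per column
OBS_DIM = 6 * 7   # flattened 6x7 board (assumed encoding)

class ApprenticeNet(nn.Module):
    """Shared trunk with policy, value, and opponent-policy heads."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(OBS_DIM, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, NUM_ACTIONS)
        self.value_head = nn.Linear(hidden, 1)
        self.opponent_head = nn.Linear(hidden, NUM_ACTIONS)  # auxiliary head

    def forward(self, obs):
        h = self.trunk(obs)
        return (self.policy_head(h),
                torch.tanh(self.value_head(h)),
                self.opponent_head(h))

def brexit_update(net, optimizer, batch, opponent_loss_weight=0.5):
    """One apprentice update: imitate MCTS visit distributions (policy),
    regress the game outcome (value), and additionally predict the
    opponent's move distribution (auxiliary opponent-model loss)."""
    obs, mcts_policy, outcome, opponent_policy = batch
    logits, value, opp_logits = net(obs)
    policy_loss = F.cross_entropy(logits, mcts_policy)  # soft targets
    value_loss = F.mse_loss(value.squeeze(-1), outcome)
    opponent_loss = F.cross_entropy(opp_logits, opponent_policy)
    loss = policy_loss + value_loss + opponent_loss_weight * opponent_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Smoke test on random data standing in for MCTS-generated targets.
net = ApprenticeNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
obs = torch.randn(32, OBS_DIM)
mcts_pi = torch.softmax(torch.randn(32, NUM_ACTIONS), dim=-1)
outcome = torch.empty(32).uniform_(-1.0, 1.0)
opp_pi = torch.softmax(torch.randn(32, NUM_ACTIONS), dim=-1)
print(brexit_update(net, opt, (obs, mcts_pi, outcome, opp_pi)))
```

The auxiliary opponent-prediction loss is the opponent-modelling ingredient: its gradients shape the shared trunk's features alongside the usual policy and value targets, which is the kind of mechanism the paper's ablation compares against plain ExIt.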
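Similarly, a sketch of the evaluation protocol as the Experiment Setup row reports it: winrates against fixed test agents measured every 800 training episodes within a 48 wall-clock-hour budget. The interfaces (`train_step`, `play_episode`) and the number of evaluation games per opponent are assumptions, not details from the paper.

```python
# A sketch of the reported evaluation protocol under assumed interfaces.
import random
import time

EVAL_EVERY = 800                 # episodes between evaluations (from the paper)
WALL_CLOCK_BUDGET_S = 48 * 3600  # 48 wall-clock hours (from the paper)
EVAL_GAMES = 100                 # games per test agent; an assumed value

def play_episode(agent, opponent) -> int:
    """Hypothetical stand-in for a full Connect4 game; 1 if agent wins."""
    return random.randint(0, 1)

def train_and_evaluate(train_step, agent, test_agents):
    """Alternate training with periodic winrate checkpoints until the
    wall-clock budget expires; returns the winrate history."""
    start, episode, history = time.time(), 0, []
    while time.time() - start < WALL_CLOCK_BUDGET_S:
        train_step(agent)  # one training episode + update (assumed)
        episode += 1
        if episode % EVAL_EVERY == 0:
            winrates = {name: sum(play_episode(agent, opp)
                                  for _ in range(EVAL_GAMES)) / EVAL_GAMES
                        for name, opp in test_agents.items()}
            history.append((episode, winrates))
    return history
```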