Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
BRExIt: On Opponent Modelling in Expert Iteration
Authors: Daniel Hernandez, Hendrik Baier, Michael Kaisers
IJCAI 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In an empirical ablation of BRExIt's algorithmic variants against a set of fixed test agents, we provide statistical evidence that BRExIt learns better-performing policies than ExIt. |
| Researcher Affiliation | Collaboration | 1Sony AI 2University of York, UK 3Eindhoven University of Technology, The Netherlands 4Centrum Wiskunde & Informatica, The Netherlands |
| Pseudocode | Yes | Algorithms 1, 2 and 3 depict BRExIt's data collection, model update logic, and overarching training loop, respectively, for a sequential environment. |
| Open Source Code | Yes | Code available at: https://github.com/Danielhp95/on-opponent-modelling-in-expert-iteration-code. |
| Open Datasets | No | We conducted our experiments in the fully observable, sequential two-player game of Connect4, which is computationally amenable and possesses a high degree of skill transitivity [Czarnecki et al., 2020]. The paper uses an environment (Connect4) and generates agents, but does not provide details or links to a specific public dataset used for training or testing, other than the environment itself. |
| Dataset Splits | No | The paper describes continuous training of agents within the Connect4 environment and evaluating winrates, but does not specify traditional training/validation/test dataset splits with percentages or sample counts, as it operates on an interactive environment rather than a static dataset. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions various algorithms and models (e.g., PPO, IMPALA, DQN, DPIQN) but does not specify the version numbers of any software dependencies, libraries, or frameworks used for implementation. |
| Experiment Setup | No | The paper mentions training duration ('48 wall-clock hours each') and evaluation frequency ('every 800 episodes'), and describes how test agents were generated, but it does not provide specific hyperparameter values (e.g., learning rate, batch size, optimizer details) or detailed model configuration settings for the experimental setup. |
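The pseudocode row above notes that the paper's Algorithms 1, 2 and 3 cover data collection, model updates, and the overarching training loop of Expert Iteration. As a rough illustration of how those three pieces fit together, the toy sketch below mimics that structure in Python. It is not the paper's implementation: the environment is a stub, the "expert" is a hypothetical fixed target function standing in for an MCTS planner, and the "apprentice" is a tabular policy updated by interpolation rather than a neural network trained by gradient descent.

```python
def collect_data(expert_policy, episodes=4, episode_length=6):
    """Data collection (cf. Algorithm 1): the expert plays episodes in a
    stub environment and records (state, expert_target) pairs."""
    dataset = []
    for _ in range(episodes):
        state = 0  # toy state: a move counter in a stand-in environment
        for _ in range(episode_length):
            dataset.append((state, expert_policy(state)))
            state += 1
    return dataset


def update_apprentice(apprentice, dataset, lr=0.1):
    """Model update (cf. Algorithm 2): nudge the apprentice's tabular
    policy toward the expert targets, a stand-in for a gradient step
    on a distillation / cross-entropy loss."""
    for state, target in dataset:
        current = apprentice.setdefault(state, 0.5)
        apprentice[state] = current + lr * (target - current)
    return apprentice


def expert_iteration(generations=3):
    """Overarching loop (cf. Algorithm 3): alternate data collection
    and apprentice updates for a number of generations."""
    apprentice = {}
    # Hypothetical expert: a fixed per-state target, standing in for
    # an MCTS search guided by the current apprentice.
    expert = lambda s: 1.0 if s % 2 == 0 else 0.0
    for _ in range(generations):
        data = collect_data(expert)
        apprentice = update_apprentice(apprentice, data)
    return apprentice
```

After a few generations, the apprentice's tabular values drift toward the expert's targets, which is the core improvement loop that BRExIt extends with opponent-model heads and opponent-aware search.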