Evolutionary Dynamics of Q-Learning over the Sequence Form

Authors: Fabio Panozzo, Nicola Gatti, Marcello Restelli

AAAI 2014

Reproducibility Variable Result LLM Response
Research Type Experimental Building on previous works on evolutionary game theory models for multi-agent learning, the paper produces an experimental evaluation to show the accuracy of the model.
Researcher Affiliation Academia Fabio Panozzo, Nicola Gatti, and Marcello Restelli, Department of Electronics, Information and Bioengineering, Politecnico di Milano, Piazza Leonardo da Vinci 32, I-20133 Milan, Italy ({panozzo.fabio,nicola.gatti,marcello.restelli}@polimi.it).
Pseudocode Yes Algorithm 2 (SFpure strategy(x_i)) and Algorithm 3 (SFQ-learning) are present in the paper.
Open Source Code No The paper does not provide concrete access to source code for the methodology described.
Open Datasets No The paper discusses extensive form games and uses an example game (Figure 1), but does not provide concrete access information (link, DOI, repository, or formal citation) for a publicly available or open dataset used for training.
Dataset Splits No The paper does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning for training, validation, or testing.
Hardware Specification No The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies No The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup Yes In our experimental setting, τ is a linearly increasing function of time starting from 0.0001 and ending at 0.5, and α is exponentially decreasing, starting from 1 and ending at 0.2. The algorithm stops when the difference in expected utility between iterations n and n-1 of both agents is smaller than 0.001 for 1000 consecutive iterations.
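The reported schedules and stopping rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: the horizon t_max over which the schedules interpolate is an assumption (the paper does not state it in this excerpt), and the utility history is whatever per-agent expected utilities the learner records each iteration.

```python
def tau(t, t_max):
    # Temperature: linearly increasing from 0.0001 to 0.5 over t_max steps.
    return 0.0001 + (0.5 - 0.0001) * t / t_max

def alpha(t, t_max):
    # Learning rate: exponentially decreasing from 1 to 0.2 over t_max steps.
    # Solve 1 * r**t_max = 0.2 for the per-step ratio r, giving alpha(t) = 0.2**(t/t_max).
    return 0.2 ** (t / t_max)

def converged(utility_history, eps=0.001, window=1000):
    """Stopping rule from the paper: the change in expected utility between
    consecutive iterations stays below eps for `window` consecutive
    iterations, for every agent.

    utility_history: list of per-iteration tuples, one expected utility
    per agent (hypothetical bookkeeping, not specified in the paper)."""
    if len(utility_history) < window + 1:
        return False
    recent_prev = utility_history[-window - 1:-1]
    recent_curr = utility_history[-window:]
    for prev, curr in zip(recent_prev, recent_curr):
        if any(abs(c - p) >= eps for c, p in zip(curr, prev)):
            return False
    return True
```

For example, a run whose per-agent utilities have been constant for the last 1000 iterations satisfies `converged`, while any window containing a jump of 0.001 or more for either agent does not.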