Conjugated Discrete Distributions for Distributional Reinforcement Learning

Authors: Björn Lindenberg, Jonas Nordqvist, Karl-Olof Lindahl

AAAI 2022, pp. 7516-7524

Reproducibility assessment (each entry gives the variable, the result, and the supporting LLM response):
Research Type: Experimental. "To evaluate its performance in a stochastic setting we train agents on a suite of 55 Atari 2600 games using sticky-actions and obtain state-of-the-art performance compared to other well-known algorithms in the Dopamine framework."
Researcher Affiliation: Academia. "Department of Mathematics, Linnaeus University, Växjö, Sweden, bjorn.lindenberg@lnu.se"
Pseudocode: Yes. "Algorithm 1: Squared Cramér distance ℓ₂²(µ, ν) for discrete distributions" (a NumPy sketch of this distance follows the list).
Open Source Code Yes *Appendix: https://github.com/bjliaa/c2d
Open Datasets: Yes. "The algorithm was evaluated on a suite of 55 Atari 2600 games where the use of sticky actions (Machado et al. 2018) induced nondeterministic MDPs. For an apples-to-apples comparison with other algorithms we used the Dopamine framework protocol (Castro et al. 2018), where all involved agents were trained using a joint set of hyperparameters, including the sticky action probability." (A sketch of the sticky-actions environment setup follows the list.)
Dataset Splits: No. The paper does not provide the dataset split information (exact percentages, sample counts, citations to predefined splits, or a detailed splitting methodology) needed to reproduce a train/validation/test partition; it instead describes training agents in an environment and evaluating their performance over time.
Hardware Specification: No. The paper mentions the 'LNU-DISA High Performance Computing Platform' but does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, or memory amounts) used for running its experiments.
Software Dependencies: No. The paper mentions 'ADAM as the network optimizer' but does not provide specific ancillary software details (e.g., library or solver names with version numbers, such as Python 3.8 or CPLEX 12.4) needed to replicate the experiment.
Experiment Setup: Yes. Table 1 lists common hyperparameters: 'Min. training ε 0.01', 'ε-decay schedule (1.0 → min. ε) 1M frames', 'Min. history to start learning 80k frames', 'Target network update frequency 32k frames', 'Sticky actions 0.25'. The paper also specifies for C2D: 'N = 32', learning rate 0.5 × 10⁻⁴ and epsilon value 3.125 × 10⁻⁴, 'β = 1.99', and 'α = 50 and c = 5'. (These settings are collected in the configuration sketch below.)
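
The squared Cramér distance referenced under Pseudocode integrates the squared difference between the two distributions' CDFs; on a discrete support that integral reduces to a gap-weighted sum. The following is a minimal NumPy sketch of that computation, not the authors' Algorithm 1; the function name and signature are illustrative.

    import numpy as np

    def squared_cramer(p, q, z):
        # p, q: probability vectors over the shared, sorted support z.
        # The difference of the two CDFs is piecewise constant between
        # atoms, so the integral of its square becomes a sum weighted
        # by the gaps between consecutive support points.
        P, Q = np.cumsum(p), np.cumsum(q)
        gaps = np.diff(z)  # widths z_{i+1} - z_i
        return float(np.sum((P[:-1] - Q[:-1]) ** 2 * gaps))

For example, squared_cramer([0.5, 0.5], [1.0, 0.0], [0.0, 1.0]) returns 0.25: the CDFs differ by 0.5 across a single unit-width gap.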
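
The sticky-actions protocol quoted under Open Datasets makes the emulator repeat the agent's previous action with probability 0.25, which is what renders the MDPs nondeterministic. A minimal setup sketch, assuming the Dopamine framework the paper reports using (the game choice here is illustrative):

    from dopamine.discrete_domains import atari_lib

    # sticky_actions=True selects the ALE mode in which the previous action
    # is repeated with probability 0.25, per the Dopamine protocol above.
    env = atari_lib.create_atari_environment(game_name='Asterix',
                                             sticky_actions=True)
    observation = env.reset()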
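
Finally, the quoted experiment settings can be gathered into one place for reference. This is a hedged sketch: the dictionary keys are illustrative names, and the optimizer call assumes the TensorFlow stack that Dopamine ordinarily uses.

    import tensorflow as tf

    # Values quoted from Table 1 and the C2D-specific settings above.
    C2D_HYPERPARAMETERS = {
        'num_atoms_N': 32,                  # N = 32 atoms per distribution
        'beta': 1.99,
        'alpha': 50,
        'c': 5,
        'min_training_epsilon': 0.01,       # ε-greedy floor
        'epsilon_decay_frames': 1_000_000,  # linear decay from 1.0 to min ε
        'min_history_frames': 80_000,       # replay warm-up before learning
        'target_update_frames': 32_000,     # target network refresh period
        'sticky_action_probability': 0.25,
    }

    # Adam with the stated learning rate and epsilon value.
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.5e-4,
                                         epsilon=3.125e-4)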