Sequence Compression Speeds Up Credit Assignment in Reinforcement Learning
Authors: Aditya Ramesh, Kenny John Young, Louis Kirsch, Jürgen Schmidhuber
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that Chunked-TD can assign credit much faster than conventional TD(λ) in the considered Markov decision processes with tabular value functions. |
| Researcher Affiliation | Academia | 1The Swiss AI Lab IDSIA, USI & SUPSI 2University of Alberta and the Alberta Machine Intelligence Institute 3AI Initiative, King Abdullah University of Science and Technology. |
| Pseudocode | Yes | Algorithm 1 Chunked-TD state value evaluation; Algorithm 2 Chunked SARSA; Algorithm 3 Chunked Expected-SARSA; Algorithm 4 Chunked Expected-SARSA with decomposed rewards |
| Open Source Code | Yes | Code is available at https://github.com/Aditya-Ramesh-10/chunktd. |
| Open Datasets | No | The paper describes custom-designed environments (Chain-and-Split, Accumulated-Charge, Key-to-Door) for its experiments, where data is generated through interaction with these environments rather than using pre-existing publicly available datasets. |
| Dataset Splits | No | The paper describes experimental setups within custom environments but does not provide explicit training, validation, or test dataset splits in terms of percentages or counts, as data is generated dynamically during interaction. |
| Hardware Specification | No | Computational resources for this work were provided by the Swiss National Supercomputing Centre (CSCS project s1205). |
| Software Dependencies | No | The paper mentions using the Adam optimizer and states that other hyperparameters match PyTorch defaults, but it does not specify version numbers for PyTorch or other software dependencies. |
| Experiment Setup | Yes | We consider multiple values of the learning rate α for SARSA(λ) with λ ∈ {0, 1.0} and for Chunked SARSA. We consider α ∈ 0.1 · {2^-2, 2^-3, 2^-4, 2^-5, 2^-6, 2^-7, 2^-8, 2^-9, 2^-10, 2^-11, 2^-12, 2^-13}. Table 3 (NN model hyperparameters, accumulated-charge experiment): model trained every k_model = 4 steps; batch size 128; model learning rate η = 0.0001; weight decay for optimizer 1e-6; replay buffer size 100000. Table 4 (NN model hyperparameters, key-to-door): model trained every k_model = 1 step; batch size 64; model learning rate η = 0.0002; replay buffer size 10000. |
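The learning-rate sweep in the setup row can be enumerated programmatically. A minimal sketch, assuming the grid is the base rate 0.1 scaled by powers of two from 2^-2 down to 2^-13 (the variable names here are illustrative, not from the paper's code):

```python
# Hypothetical enumeration of the SARSA(lambda) sweep described above.
# Assumption: alpha takes values 0.1 * 2^-k for k = 2..13.
lambdas = [0.0, 1.0]
alphas = [0.1 * 2 ** -k for k in range(2, 14)]  # 12 learning rates

# Full grid of (lambda, alpha) configurations to run.
sweep = [(lam, alpha) for lam in lambdas for alpha in alphas]
print(len(sweep))  # 24 configurations
```

Chunked SARSA would reuse the same `alphas` list without the λ dimension, giving 12 additional runs per environment.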