Sequence Compression Speeds Up Credit Assignment in Reinforcement Learning
Authors: Aditya Ramesh, Kenny John Young, Louis Kirsch, Jürgen Schmidhuber
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that Chunked-TD can assign credit much faster than conventional TD(λ) in the considered Markov decision processes with tabular value functions. |
| Researcher Affiliation | Academia | 1The Swiss AI Lab IDSIA, USI & SUPSI 2University of Alberta and the Alberta Machine Intelligence Institute 3AI Initiative, King Abdullah University of Science and Technology. |
| Pseudocode | Yes | Algorithm 1 Chunked-TD state value evaluation; Algorithm 2 Chunked SARSA; Algorithm 3 Chunked Expected-SARSA; Algorithm 4 Chunked Expected-SARSA with decomposed rewards |
| Open Source Code | Yes | Code is available at https://github.com/Aditya-Ramesh-10/chunktd. |
| Open Datasets | No | The paper describes custom-designed environments (Chain-and-Split, Accumulated-Charge, Key-to-Door) for its experiments, where data is generated through interaction with these environments rather than using pre-existing publicly available datasets. |
| Dataset Splits | No | The paper describes experimental setups within custom environments but does not provide explicit training, validation, or test dataset splits in terms of percentages or counts, as data is generated dynamically during interaction. |
| Hardware Specification | No | Computational resources for this work were provided by the Swiss National Supercomputing Centre (CSCS project s1205). |
| Software Dependencies | No | The paper mentions using the Adam optimizer and states that other hyperparameters match PyTorch defaults, but it does not specify version numbers for PyTorch or other software dependencies. |
| Experiment Setup | Yes | We consider multiple values of the learning rate α for SARSA(λ) with λ ∈ {0, 1.0} and for Chunked SARSA. We consider α ∈ 0.1 · {2^-2, 2^-3, 2^-4, 2^-5, 2^-6, 2^-7, 2^-8, 2^-9, 2^-10, 2^-11, 2^-12, 2^-13}. Table 3 (NN model hyperparameters, accumulated-charge experiment): model trained every k_model = 4 steps; batch size 128; model learning rate η = 0.0001; weight decay for optimizer 1e-6; replay buffer size 100000. Table 4 (NN model hyperparameters, key-to-door): model trained every k_model = 1 step; batch size 64; model learning rate η = 0.0002; replay buffer size 10000. |
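The learning-rate sweep in the setup row can be enumerated programmatically. A minimal sketch, assuming the grid is the base rate 0.1 scaled by powers of two from 2^-2 down to 2^-13 (the variable names here are illustrative, not from the paper's code):

```python
# Hypothetical enumeration of the SARSA(lambda) sweep described above.
# Assumption: alpha takes values 0.1 * 2^-k for k = 2..13.
lambdas = [0.0, 1.0]
alphas = [0.1 * 2 ** -k for k in range(2, 14)]  # 12 learning rates

# Full grid of (lambda, alpha) configurations to run.
sweep = [(lam, alpha) for lam in lambdas for alpha in alphas]
print(len(sweep))  # 24 configurations
```

Chunked SARSA would reuse the same `alphas` list without the λ dimension, giving 12 additional runs per environment.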