Learning the Target Network in Function Space
Authors: Kavosh Asadi, Yao Liu, Shoham Sabach, Ming Yin, Rasool Fakoor
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also present empirical results demonstrating that LR-based target network updates significantly improve deep RL on the Atari benchmark. |
| Researcher Affiliation | Collaboration | ¹Amazon, ²Technion, ³Princeton University. |
| Pseudocode | Yes | Algorithm 1 Lookahead-Replicate (LR) |
| Open Source Code | No | The paper states, 'We used the implementation of Rainbow from the Dopamine (Castro et al., 2018)', indicating the use of a third-party framework, but does not provide a statement or link for their own source code. |
| Open Datasets | Yes | To evaluate LR in a large-scale setting, we now test it on the Atari benchmark (Bellemare et al., 2013). |
| Dataset Splits | No | The paper mentions tuning hyperparameters and using random seeds, which implies a validation process, but does not provide specific details on validation dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not specify any hardware used for running the experiments, such as GPU or CPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions using 'the implementation of Rainbow from the Dopamine' and 'the Adam optimizer' but does not provide specific version numbers for any software components. |
| Experiment Setup | Yes | We used the default hyperparameters from Dopamine. We reasonably tuned the hyper-parameters associated with each update...We used the same learning rate as we did for optimizing the online network in Rainbow, namely 6.25 × 10⁻⁵...We also chose the Adam optimizer to update the target network...This allowed us to only focus on tuning the KR parameter. |
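
The quoted setup indicates that the target network is updated with its own Adam optimizer at the online network's learning rate of 6.25 × 10⁻⁵, rather than being copied periodically. Below is a minimal, hypothetical PyTorch sketch of such an optimizer-based target-network update; the network shapes, the MSE "replicate" loss, and the function names are illustrative assumptions, not the paper's exact Lookahead-Replicate update (which builds on Dopamine's Rainbow implementation).

```python
import torch
import torch.nn as nn

# Illustrative networks (sizes are assumptions, not from the paper).
online_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

# Separate Adam optimizer for the target network, reusing the online
# network's learning rate of 6.25e-5 as described in the quoted setup.
target_optimizer = torch.optim.Adam(target_net.parameters(), lr=6.25e-5)

def replicate_step(states: torch.Tensor) -> float:
    """One gradient step moving the target network toward the online
    network's outputs on a batch of states (a function-space regression).
    This loss is a sketch of an optimizer-based target update, not the
    authors' exact LR procedure."""
    with torch.no_grad():
        online_values = online_net(states)
    loss = nn.functional.mse_loss(target_net(states), online_values)
    target_optimizer.zero_grad()
    loss.backward()
    target_optimizer.step()
    return loss.item()

# Example usage with a random batch of states.
states = torch.randn(32, 4)
print(replicate_step(states))
```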