Dynamics Generalisation in Reinforcement Learning via Adaptive Context-Aware Policies
Authors: Michael Beukman, Devon Jarvis, Richard Klein, Steven James, Benjamin Rosman
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that the Decision Adapter is a useful generalisation of a previously proposed architecture and empirically demonstrate that it results in superior generalisation performance compared to previous approaches in several environments. |
| Researcher Affiliation | Academia | 1University of the Witwatersrand 2University of Oxford 3University College London |
| Pseudocode | Yes | Algorithm 1 Decision Adapter (changes to the standard forward pass in blue).<br>1: **procedure** ADAPTERFORWARD(s ∈ S, c ∈ C)<br>2: x₁ = s<br>3: **for** i ∈ {1, 2, …, n} **do**<br>4: **if** Aᵢ ≠ null **then**<br>5: θᵢᴬ = Hᵢ(c) // Generate weights<br>6: x′ᵢ = Aᵢ(xᵢ \| θᵢᴬ) // Forward pass<br>7: xᵢ = xᵢ + x′ᵢ // Skip connection<br>8: **end if**<br>9: xᵢ₊₁ = Lᵢ(xᵢ)<br>10: **end for**<br>11: **return** xₙ₊₁<br>12: **end procedure** |
| Open Source Code | Yes | We publicly release code at https://github.com/Michael-Beukman/DecisionAdapter. |
| Open Datasets | No | The paper describes the environments (ODE, Cart Pole, Mujoco Ant) and their parameters, but does not provide specific links, DOIs, repository names, or formal citations for publicly available datasets used for training or evaluation, nor does it explicitly state the custom ODE environment's data is made public. |
| Dataset Splits | No | The paper discusses training and evaluation context sets, including interpolation and extrapolation ranges for evaluation, but does not specify a distinct validation dataset split for hyperparameter tuning or early stopping. |
| Hardware Specification | Yes | For compute, we used an internal cluster consisting of nodes with NVIDIA RTX 3090 GPUs. |
| Software Dependencies | No | We use the high-performing and standard implementation of SAC from the CleanRL library [98] with neural networks being written in PyTorch [99]. |
| Experiment Setup | Yes | We use the default hyperparameters, which are listed in Table 2. Table 2: The default hyperparameters we use in the CleanRL implementation. Buffer Size: 1 000 000; γ: 0.99; τ: 0.005; Batch Size: 256; Exploration Noise: 0.1; First Learning Timestep: 5000; Policy Learning Rate: 0.0003; Critic Learning Rate: 0.001; Policy Update Frequency: 2; Target Network Update Frequency: 1; Noise Clip: 0.5; Automatically Tune Entropy: Yes |
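The Algorithm 1 pseudocode quoted in the table can be sketched as a minimal NumPy forward pass. This is a sketch under assumptions, not the paper's implementation: the layer sizes, the choice of a single adapter A₁ attached before the first layer, and a linear hypernetwork H₁ are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, HIDDEN, CTX_DIM = 4, 8, 2  # hypothetical sizes

# Main-network layers L_i (tanh activations, random weights for the sketch).
layers = [
    (rng.standard_normal((HIDDEN, STATE_DIM)) * 0.3, np.zeros(HIDDEN)),
    (rng.standard_normal((HIDDEN, HIDDEN)) * 0.3, np.zeros(HIDDEN)),
]

# Hypernetwork H_1: a linear map from the context c to the flat weights
# of a one-layer adapter A_1 that acts on the state.
n_adapter_params = STATE_DIM * STATE_DIM + STATE_DIM
H1 = rng.standard_normal((n_adapter_params, CTX_DIM)) * 0.1

def adapter_forward(s, c):
    """Algorithm 1: generate A_1's weights from c, apply it with a skip
    connection, then run the main network as usual."""
    x = s
    for i, (W, b) in enumerate(layers):
        if i == 0:  # A_1 is the only non-null adapter in this sketch
            theta = H1 @ c                                 # generate weights
            A_W = theta[: STATE_DIM * STATE_DIM].reshape(STATE_DIM, STATE_DIM)
            A_b = theta[STATE_DIM * STATE_DIM :]
            x = x + (A_W @ x + A_b)                        # adapter + skip
        x = np.tanh(W @ x + b)                             # x_{i+1} = L_i(x_i)
    return x

out_a = adapter_forward(np.ones(STATE_DIM), np.array([0.0, 0.0]))
out_b = adapter_forward(np.ones(STATE_DIM), np.array([1.0, -1.0]))
print(out_a.shape)  # (8,)
```

With a zero context the generated adapter weights are zero, so the skip connection leaves the main network's computation unchanged; a non-zero context modulates the forward pass, which is the mechanism the algorithm describes.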
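The flattened Table 2 quoted in the Experiment Setup row can also be written as a config dict. The key names below are an assumption loosely following CleanRL's flag conventions (some, like the noise settings, come from CleanRL's other off-policy scripts); only the values are taken from the quote.

```python
# Hypothetical CleanRL-style names for the Table 2 defaults (names assumed).
sac_hparams = {
    "buffer_size": 1_000_000,
    "gamma": 0.99,
    "tau": 0.005,
    "batch_size": 256,
    "exploration_noise": 0.1,
    "learning_starts": 5_000,   # first learning timestep
    "policy_lr": 3e-4,
    "q_lr": 1e-3,               # critic learning rate
    "policy_frequency": 2,
    "target_network_frequency": 1,
    "noise_clip": 0.5,
    "autotune": True,           # automatically tune entropy coefficient
}
print(len(sac_hparams))  # 12 entries, matching Table 2
```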