Dynamics Generalisation in Reinforcement Learning via Adaptive Context-Aware Policies

Authors: Michael Beukman, Devon Jarvis, Richard Klein, Steven James, Benjamin Rosman

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that the Decision Adapter is a useful generalisation of a previously proposed architecture and empirically demonstrate that it results in superior generalisation performance compared to previous approaches in several environments.
Researcher Affiliation | Academia | ¹University of the Witwatersrand, ²University of Oxford, ³University College London
Pseudocode | Yes | Algorithm 1: Decision Adapter (changes to the standard forward pass in blue).

    procedure AdapterForward(s ∈ S, c ∈ C)
        x1 ← s
        for i ∈ {1, 2, ..., n} do
            if Ai ≠ null then
                θAi ← Hi(c)          // Generate weights
                x′i ← Ai(xi | θAi)   // Forward pass
                xi ← xi + x′i        // Skip connection
            end if
            xi+1 ← Li(xi)
        end for
        return xn+1
    end procedure
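The adapter forward pass described above can be sketched in PyTorch. This is a minimal illustrative sketch, not the paper's implementation: the two-layer adapter shape, hidden size, and class/parameter names (`AdapterLayer`, `hidden`, `use_adapter`) are assumptions; the authors' actual architecture is in the linked repository.

```python
import torch
import torch.nn as nn

class AdapterLayer(nn.Module):
    """One main layer L_i with an optional context-conditioned adapter A_i,
    whose weights are generated by a hypernetwork H_i from the context c."""

    def __init__(self, dim, ctx_dim, hidden=32, use_adapter=True):
        super().__init__()
        self.layer = nn.Linear(dim, dim)  # L_i, the standard layer
        self.use_adapter = use_adapter
        if use_adapter:
            # H_i: maps context to all weights/biases of a two-layer adapter
            n_params = dim * hidden + hidden + hidden * dim + dim
            self.hyper = nn.Linear(ctx_dim, n_params)
            self.dim, self.hidden = dim, hidden

    def forward(self, x, c):
        if self.use_adapter:
            theta = self.hyper(c)  # generate adapter weights from context
            d, h = self.dim, self.hidden
            W1, theta = theta[:, : d * h].view(-1, h, d), theta[:, d * h :]
            b1, theta = theta[:, :h], theta[:, h:]
            W2, b2 = theta[:, : h * d].view(-1, d, h), theta[:, h * d :]
            # A_i(x | theta): forward pass through the generated adapter
            z = torch.relu(torch.bmm(W1, x.unsqueeze(-1)).squeeze(-1) + b1)
            x = x + torch.bmm(W2, z.unsqueeze(-1)).squeeze(-1) + b2  # skip connection
        return self.layer(x)  # standard forward pass through L_i
```

Stacking `n` such layers and feeding the state through them, with the same context `c` at each layer, reproduces the loop structure of the algorithm.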
Open Source Code | Yes | We publicly release code at https://github.com/Michael-Beukman/DecisionAdapter.
Open Datasets | No | The paper describes the environments (ODE, CartPole, MuJoCo Ant) and their parameters, but provides no links, DOIs, repository names, or formal citations for publicly available datasets used for training or evaluation, and does not state that the custom ODE environment's data is made public.
Dataset Splits | No | The paper discusses training and evaluation context sets, including interpolation and extrapolation ranges for evaluation, but does not specify a distinct validation split for hyperparameter tuning or early stopping.
Hardware Specification | Yes | For compute, we used an internal cluster consisting of nodes with NVIDIA RTX 3090 GPUs.
Software Dependencies | No | We use the high-performing and standard implementation of SAC from the CleanRL library [98], with neural networks written in PyTorch [99].
Experiment Setup | Yes | We use the default hyperparameters, which are listed in Table 2 ("The default hyperparameters we use in the CleanRL implementation"):

    Buffer Size                      1 000 000
    γ                                0.99
    τ                                0.005
    Batch Size                       256
    Exploration Noise                0.1
    First Learning Timestep          5000
    Policy Learning Rate             0.0003
    Critic Learning Rate             0.001
    Policy Update Frequency          2
    Target Network Update Frequency  1
    Noise Clip                       0.5
    Automatically Tune Entropy       Yes
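The reported defaults map naturally onto a configuration dictionary. This is a sketch for readability only: the key names below are illustrative and are not necessarily CleanRL's actual argument names.

```python
# Default SAC hyperparameters reported in Table 2 (illustrative key names).
sac_config = {
    "buffer_size": 1_000_000,
    "gamma": 0.99,                  # discount factor γ
    "tau": 0.005,                   # target-network soft-update coefficient τ
    "batch_size": 256,
    "exploration_noise": 0.1,
    "learning_starts": 5_000,       # first learning timestep
    "policy_lr": 3e-4,              # policy learning rate
    "critic_lr": 1e-3,              # critic learning rate
    "policy_frequency": 2,          # policy update frequency
    "target_network_frequency": 1,
    "noise_clip": 0.5,
    "autotune_entropy": True,       # automatically tune entropy coefficient
}
```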