Functional Regularization for Reinforcement Learning via Learned Fourier Features

Authors: Alexander Li, Deepak Pathak

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on standard state-based and image-based RL benchmarks show clear benefits of our architecture over the baselines.
Researcher Affiliation | Academia | Alexander C. Li, Carnegie Mellon University (alexanderli@cmu.edu); Deepak Pathak, Carnegie Mellon University (dpathak@cs.cmu.edu)
Pseudocode | Yes | Algorithm 1: LFF PyTorch-like pseudocode.

    class LFF():
        def __init__(self, input_size, output_size, n_hidden=1,
                     hidden_dim=256, sigma=1.0, f_dim=256):
            # create B
            b_shape = (input_size, f_dim // 2)
            self.B = Parameter(normal(zeros(*b_shape), sigma * ones(*b_shape)))
            # create rest of network
            self.mlp = MLP(in_dims=f_dim + input_size, out_dims=output_size,
                           n_hidden=n_hidden, hidden_dim=hidden_dim)

        def forward(self, x):
            proj = (2 * np.pi) * matmul(x, self.B)
            ff = cat([sin(proj), cos(proj), x], dim=-1)
            return self.mlp.forward(ff)
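To make the feature mapping in Algorithm 1 concrete, here is a minimal NumPy sketch of just the learned-Fourier-feature transform (the trailing MLP head is omitted, and the dimensions below are illustrative, not the paper's settings):

```python
import numpy as np

def lff_features(x, B):
    """Fourier feature mapping from Algorithm 1: x -> [sin(2*pi*xB), cos(2*pi*xB), x]."""
    proj = 2 * np.pi * (x @ B)                 # (batch, f_dim // 2)
    return np.concatenate([np.sin(proj), np.cos(proj), x], axis=-1)

rng = np.random.default_rng(0)
input_size, f_dim, sigma = 4, 16, 1.0          # illustrative sizes
# B would be a trainable Parameter in the real model; here it is fixed
B = rng.normal(0.0, sigma, size=(input_size, f_dim // 2))
x = rng.normal(size=(3, input_size))
ff = lff_features(x, B)
print(ff.shape)                                # (3, 20): f_dim + input_size columns
```

The output width f_dim + input_size matches the `in_dims` passed to the MLP in the pseudocode, since sin and cos each contribute f_dim // 2 features and the raw input x is concatenated alongside them.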
Open Source Code | Yes | Code available at https://github.com/alexlioralexli/learned-fourier-features
Open Datasets | Yes | We use soft actor-critic (SAC), an entropy-regularized off-policy RL algorithm [14], to learn 8 environments from the DeepMind Control Suite [43].
Dataset Splits | No | The paper evaluates performance on reinforcement learning environments (DeepMind Control Suite) but does not specify fixed training, validation, and test splits in the traditional supervised-learning sense. Data is generated through interaction with the environment, and performance is evaluated over episodes during training.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory, or cloud instances) used to run the experiments.
Software Dependencies | No | The paper mentions "PyTorch-like pseudocode" in Algorithm 1, implying the use of PyTorch, but it does not specify version numbers for PyTorch or any other software dependencies used in the experiments.
Experiment Setup | Yes | Our LFF architecture uses our learnable Fourier feature input layer, followed by 2 hidden layers of 1024 units. We use Fourier dimension d_fourier of size 1024. We initialize the entries of our trainable Fourier basis with B_ij ~ N(0, σ²), with σ = 0.01 for all environments except Cheetah, Walker, and Hopper, where we use σ = 0.001. [...] The 1x1 conv weights are initialized from N(0, σ²) with σ = 0.1 for Hopper and Cheetah and σ = 0.01 for Finger and Quadruped.
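The initialization scheme quoted above can be sketched as follows. This is a hedged illustration of B_ij ~ N(0, σ²) with the paper's reported d_fourier = 1024 and σ = 0.01; the input size of 24 is an assumed placeholder for a DeepMind Control state dimension, not a value from the paper:

```python
import numpy as np

# B_ij ~ N(0, sigma^2); sigma = 0.01 for most environments
# (0.001 for Cheetah, Walker, and Hopper per the paper's setup).
rng = np.random.default_rng(0)
input_size = 24                      # assumed example state dimension
d_fourier = 1024                     # Fourier dimension reported in the paper
sigma = 0.01
# Half of d_fourier, since sin and cos of the projection are concatenated.
B = rng.normal(0.0, sigma, size=(input_size, d_fourier // 2))
print(B.shape, round(float(B.std()), 4))
```

With such a small σ, the initial projection x @ B stays near zero, so the sin/cos features start out nearly linear in x; training then adapts the basis B along with the rest of the network.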