Functional Regularization for Reinforcement Learning via Learned Fourier Features
Authors: Alexander Li, Deepak Pathak
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on standard state-based and image-based RL benchmarks show clear benefits of our architecture over the baselines. |
| Researcher Affiliation | Academia | Alexander C. Li Carnegie Mellon University alexanderli@cmu.edu Deepak Pathak Carnegie Mellon University dpathak@cs.cmu.edu |
| Pseudocode | Yes | Algorithm 1 LFF PyTorch-like pseudocode. class LFF(): def __init__(self, input_size, output_size, n_hidden=1, hidden_dim=256, sigma=1.0, f_dim=256): # create B b_shape = (input_size, f_dim // 2) self.B = Parameter(normal(zeros(*b_shape), sigma * ones(*b_shape))) # create rest of network self.mlp = MLP(in_dims=f_dim + input_size, out_dims=output_size, n_hidden=n_hidden, hidden_dim=hidden_dim) def forward(self, x): proj = (2 * np.pi) * matmul(x, self.B) ff = cat([sin(proj), cos(proj), x], dim=-1) return self.mlp.forward(ff) |
| Open Source Code | Yes | Code available at https://github.com/alexlioralexli/learned-fourier-features |
| Open Datasets | Yes | We use soft actor-critic (SAC), an entropy-regularized off-policy RL algorithm [14], to learn 8 environments from the DeepMind Control Suite [43]. |
| Dataset Splits | No | The paper evaluates performance on reinforcement learning environments (DeepMind Control Suite) but does not specify fixed training, validation, and test dataset splits in the traditional supervised learning sense. Data is generated through interaction with the environment, and performance is typically evaluated over episodes during training. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory, or cloud instances) used to run the experiments. |
| Software Dependencies | No | The paper mentions 'PyTorch-like pseudocode' in Algorithm 1, implying the use of PyTorch, but it does not specify any version numbers for PyTorch or other software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | Our LFF architecture uses our learnable Fourier feature input layer, followed by 2 hidden layers of 1024 units. We use Fourier dimension d_fourier of size 1024. We initialize the entries of our trainable Fourier basis with B_ij ∼ N(0, σ²), with σ = 0.01 for all environments except Cheetah, Walker, and Hopper, where we use σ = 0.001. [...] The 1x1 conv weights are initialized from N(0, σ²) with σ = 0.1 for Hopper and Cheetah and σ = 0.01 for Finger and Quadruped. |
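The table above reports the paper's Algorithm 1 as flattened PyTorch-like pseudocode. A minimal runnable reconstruction is sketched below; the MLP layout (ReLU hidden layers built with `nn.Sequential`) is an assumption, since the paper's `MLP` helper is not specified beyond its constructor arguments.

```python
import math

import torch
import torch.nn as nn


class LFF(nn.Module):
    """Learned Fourier feature input layer followed by an MLP,
    reconstructed from the paper's Algorithm 1 pseudocode."""

    def __init__(self, input_size, output_size, n_hidden=1,
                 hidden_dim=256, sigma=1.0, f_dim=256):
        super().__init__()
        # Trainable Fourier basis B with entries drawn from N(0, sigma^2)
        self.B = nn.Parameter(sigma * torch.randn(input_size, f_dim // 2))
        # MLP over the concatenated features [sin(proj), cos(proj), x];
        # the exact hidden-layer structure here is an assumption
        layers = [nn.Linear(f_dim + input_size, hidden_dim), nn.ReLU()]
        for _ in range(n_hidden - 1):
            layers += [nn.Linear(hidden_dim, hidden_dim), nn.ReLU()]
        layers.append(nn.Linear(hidden_dim, output_size))
        self.mlp = nn.Sequential(*layers)

    def forward(self, x):
        # Project the input through the learned basis, scaled by 2*pi
        proj = (2 * math.pi) * torch.matmul(x, self.B)
        # Fourier features plus a skip connection to the raw input
        ff = torch.cat([torch.sin(proj), torch.cos(proj), x], dim=-1)
        return self.mlp(ff)
```

For the state-based experiments described in the setup row, this would be instantiated with `hidden_dim=1024`, `f_dim=1024`, `n_hidden=2`, and `sigma` set to 0.01 or 0.001 depending on the environment.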