Overcoming The Spectral Bias of Neural Value Approximation

Authors: Ge Yang, Anurag Ajay, Pulkit Agrawal

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | With just a single line change, our approach, the Fourier feature networks (FFN), produces state-of-the-art performance on challenging continuous control domains with only a fraction of the compute. ... We scale the use of FFN to high-dimensional continuous control tasks from the DeepMind Control Suite (Tassa et al., 2020) using soft actor-critic (SAC, Haarnoja et al. 2018) as the base algorithm. ... We provide extensive empirical analysis on eight common DMC domains and additional results with DDPG in Appendix A.9.
Researcher Affiliation | Academia | NSF AI Institute for Artificial Intelligence and Fundamental Interactions (IAIFI); Computer Science and Artificial Intelligence Laboratory (CSAIL); Improbable AI Lab; Massachusetts Institute of Technology
Pseudocode | Yes | Algorithm: Learned Fourier Features (LFF), reconstructed from the paper's listing:

    import numpy as np
    import torch
    from torch import nn
    from torch.nn import init

    class LFF(nn.Linear):
        """Learned Fourier features: a trainable linear map followed by a sine."""

        def __init__(self, in_features, out_features, b_scale):
            super().__init__(in_features, out_features)
            # Weights drawn from N(0, (b_scale / in_features)^2), bias from U(-1, 1).
            init.normal_(self.weight, std=b_scale / in_features)
            init.uniform_(self.bias, -1.0, 1.0)

        def forward(self, x):
            # Computes sin(pi * (Wx + b)); the bias acts as a phase in [-pi, pi].
            x = np.pi * super().forward(x)
            return torch.sin(x)
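As a usage note, a minimal sketch of the advertised "single line change": the input layer of an otherwise standard MLP critic is swapped for LFF. The 40*d input width follows the Experiment Setup row below; the `make_ffn_critic` name, the `b_scale` value, and the output head are illustrative assumptions, not the authors' exact code.

    def make_ffn_critic(in_dim, b_scale=1.0):
        # b_scale is a placeholder value; the paper studies its effect separately.
        return nn.Sequential(
            LFF(in_dim, 40 * in_dim, b_scale),   # <- the single line change
            nn.Linear(40 * in_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, 1),
        )

    q = make_ffn_critic(in_dim=24)               # e.g. a concatenated state-action input
    print(q(torch.randn(8, 24)).shape)           # torch.Size([8, 1])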
Open Source Code | Yes | Code and analysis available at https://geyang.github.io/ffn.
Open Datasets | Yes | We scale the use of FFN to high-dimensional continuous control tasks from the DeepMind Control Suite (Tassa et al., 2020) using soft actor-critic (SAC, Haarnoja et al. 2018) as the base algorithm. ... We use the implementation from the OpenAI Gym (Brockman et al., 2016), and discretize the state space into 150 bins.
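The 150-bin discretization is stated without further detail; below is a minimal sketch of one plausible reading, uniform binning of a bounded scalar state. Only the bin count comes from the quote; the `to_bin` helper and the bounds are assumptions.

    import numpy as np

    def to_bin(x, low, high, n_bins=150):
        # Map a continuous state x in [low, high] to a bin index in 0..n_bins-1.
        frac = (x - low) / (high - low)
        return int(np.clip(frac * n_bins, 0, n_bins - 1))

    print(to_bin(-0.3, low=-1.2, high=0.6))  # -> 75, with assumed bounds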
Dataset Splits | No | The paper does not give explicit training, validation, and test splits (e.g., percentages or sample counts). It describes data generation for a toy MDP and reinforcement-learning experiments on the DeepMind Control Suite and OpenAI Gym, where data is collected through environment interaction rather than drawn from static splits.
Hardware Specification | No | The paper thanks the "MIT SuperCloud and Lincoln Laboratory Supercomputing Center for providing high performance computing resources" but does not name specific hardware (GPU or CPU models, or memory specifications).
Software Dependencies | No | The paper mentions building on the PyTorch codebase from Yarats & Kostrikov (2020) and on DrQ-v2 (Yarats et al., 2021), but it gives no version numbers for PyTorch or any other library, which a reproducible description would require.
Experiment Setup | Yes | Optimization details: We use a 4-layer MLP with ReLU activation, with 400 latent neurons. We use Adam optimization with a learning rate of 1e-4, and optimize for 400 epochs. We use gradient descent with a batch size of 200. ... If d is the input dimension, both MLP and FFN have [40d, 1024, 1024] as hidden dimensions for each of their layers.
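For concreteness, a hedged sketch of how the quoted optimization settings combine in a supervised fit. The model reflects one reading of "4-layer MLP with ReLU activation, 400 latent neurons"; the input width and the regression data are placeholders, not the paper's toy-MDP targets.

    import torch
    from torch import nn

    model = nn.Sequential(                                # 4-layer ReLU MLP, width 400
        nn.Linear(2, 400), nn.ReLU(),
        nn.Linear(400, 400), nn.ReLU(),
        nn.Linear(400, 400), nn.ReLU(),
        nn.Linear(400, 1),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)   # Adam, learning rate 1e-4
    xs, ys = torch.randn(1000, 2), torch.randn(1000, 1)   # placeholder data

    for epoch in range(400):                              # 400 epochs
        for i in range(0, len(xs), 200):                  # batch size 200
            loss = nn.functional.mse_loss(model(xs[i:i+200]), ys[i:i+200])
            opt.zero_grad()
            loss.backward()
            opt.step()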