Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fourier Features in Reinforcement Learning with Neural Networks
Authors: David Brellmann, David Filliat, Goran Frehse
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we present experiments on Multilayer Perceptrons (MLP) that indicate that even in Deep RL, Fourier features can lead to significant performance gains in both rewards and sample efficiency. Our experiments cover both shallow/deep, discrete/continuous, and on/off-policy RL settings. |
| Researcher Affiliation | Academia | David Brellmann EMAIL U2IS, ENSTA Paris, Institut Polytechnique de Paris |
| Pseudocode | No | The paper describes methods like Neural Fitted Q-Iteration (FQI) but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | Yes | The code for reproducing all experiments is available on GitHub at https://github.com/DavidBrellmann/Fourier_Features_in_RL_with_NN. |
| Open Datasets | Yes | We apply Fourier Features (FF-NN) and Fourier Light Features (FLF-NN) on the off-policy Deep-Q Network (DQN) algorithm (Mnih et al., 2015) for the discrete action environments and on the on-policy Proximal Policy Optimization (PPO) algorithm (Schulman et al., 2017) for continuous action environments [...] Figure 7 shows the averaged returns per episode for DQN on four discrete-action environments from OpenAI Gym (Brockman et al., 2016). Figure 8 shows the averaged returns per episode of PPO on five continuous-action control tasks from MuJoCo (Todorov et al., 2012). |
| Dataset Splits | No | The paper mentions averaging results over multiple training runs (e.g., '30 training (different seeds)', '10 training runs') and using an 'experience replay buffer' for sampling, but does not specify traditional training/validation/test splits for a fixed dataset as typically found in supervised learning tasks. For RL, the data is generated through interaction with environments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments. It mentions using 'Nvidia Isaac Gym Environments (Makoviychuk et al., 2021)', which implies GPU usage for simulations, but no explicit hardware specification for the experimental setup is given. |
| Software Dependencies | Yes | For Deep Reinforcement Learning implementations, we adopt the code from Stable-Baselines3 (Raffin et al., 2019) with version 0.10.0 based on PyTorch 1.8.0. We use the Adam optimizer (Kingma & Ba, 2014), Xavier initializer (Glorot & Bengio, 2010), and ReLU activation functions across all experiments. We tune hyperparameters with Optuna 2.4.0 (Akiba et al., 2019). For generating polynomial features we use scikit-learn (Buitinck et al., 2013). OpenAI Gym (Brockman et al., 2016) with version 0.18.0. |
| Experiment Setup | Yes | Table 6: Sampling Values for DQN lists hyperparameters such as 'Number of Hidden Layers 1', 'Batch Size {16, 32, 64, 100, 128, 256, 512}', 'Replay Buffer Size {1e4, 5e4, 1e5, 1e6}', 'Learning rate [1e-5, 1]', and 'Target Update Frequency'. The paper also specifies 'We use an MLP architecture with a single hidden layer' and 'For DQN, we run 160,000 timesteps for Acrobot-v1, CartPole-v1, Catcher-v1, LunarLander-v1 and MountainCar-v0'. |
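To illustrate the kind of input preprocessing the paper studies, below is a minimal sketch of a classic Fourier basis mapping (in the style of Konidaris et al., 2011) applied to a normalized observation before it is fed to an MLP. This is an assumption-laden illustration, not the paper's exact FF-NN/FLF-NN implementation; the function name `fourier_basis` and the `order` parameter are hypothetical.

```python
import itertools
import numpy as np

def fourier_basis(obs, order=2):
    """Map a normalized observation (entries in [0, 1]) to Fourier
    basis features cos(pi * c . obs), for every integer coefficient
    vector c with entries in {0, ..., order}.

    A d-dimensional observation yields (order + 1) ** d features.
    (Illustrative sketch only -- not the paper's implementation.)
    """
    obs = np.asarray(obs, dtype=float)
    # All coefficient vectors c in {0, ..., order}^d.
    coeffs = np.array(
        list(itertools.product(range(order + 1), repeat=obs.size))
    )
    return np.cos(np.pi * (coeffs @ obs))

# Example: a 2-D observation with order 2 yields 3**2 = 9 features.
features = fourier_basis([0.25, 0.5], order=2)
print(features.shape)  # (9,)
```

These features would replace (or augment) the raw state as the MLP's input; note the feature count grows exponentially with the observation dimension, which is one motivation for the lighter "Fourier Light Features" variant the paper proposes.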