Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fourier Features in Reinforcement Learning with Neural Networks
Authors: David Brellmann, David Filliat, Goran Frehse
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we present experiments on Multilayer Perceptrons (MLP) that indicate that even in Deep RL, Fourier features can lead to significant performance gains in both rewards and sample efficiency. Our experiments cover both shallow/deep, discrete/continuous, and on/off-policy RL settings. |
| Researcher Affiliation | Academia | David Brellmann EMAIL U2IS, ENSTA Paris, Institut Polytechnique de Paris |
| Pseudocode | No | The paper describes methods like Neural Fitted Q-Iteration (FQI) but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | Yes | The code for reproducing all experiments is available on GitHub at https://github.com/DavidBrellmann/Fourier_Features_in_RL_with_NN. |
| Open Datasets | Yes | We apply Fourier Features (FF-NN) and Fourier Light Features (FLF-NN) on the off-policy Deep-Q Network (DQN) algorithm (Mnih et al., 2015) for the discrete action environments and on the on-policy Proximal Policy Optimization (PPO) algorithm (Schulman et al., 2017) for continuous action environments [...] Figure 7 shows the averaged returns per episode for DQN on four discrete-action environments from OpenAI Gym (Brockman et al., 2016). Figure 8 shows the averaged returns per episode of PPO on five continuous-action control tasks from MuJoCo (Todorov et al., 2012). |
| Dataset Splits | No | The paper mentions averaging results over multiple training runs (e.g., '30 training (different seeds)', '10 training runs') and using an 'experience replay buffer' for sampling, but does not specify traditional training/validation/test splits for a fixed dataset as typically found in supervised learning tasks. For RL, the data is generated through interaction with environments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments. It mentions using 'Nvidia Isaac Gym Environments (Makoviychuk et al., 2021)', which implies GPU usage for simulations, but no explicit hardware specification for the experimental setup is given. |
| Software Dependencies | Yes | For Deep Reinforcement Learning implementations, we adopt the code from Stable-Baselines3 (Raffin et al., 2019) with version 0.10.0 based on PyTorch 1.8.0. We use the Adam optimizer (Kingma & Ba, 2014), Xavier initializer (Glorot & Bengio, 2010), and ReLU activation functions across all experiments. We tune hyperparameters with Optuna 2.4.0 (Akiba et al., 2019). For generating polynomial features we use scikit-learn (Buitinck et al., 2013). OpenAI Gym (Brockman et al., 2016) with version 0.18.0. |
| Experiment Setup | Yes | Table 6: Sampling Values for DQN lists hyperparameters such as 'Number of Hidden Layers 1', 'Batch Size {16, 32, 64, 100, 128, 256, 512}', 'Replay Buffer Size {1e4, 5e4, 1e5, 1e6}', 'Learning rate [1e-5, 1]', and 'Target Update Frequency'. The paper also specifies 'We use an MLP architecture with a single hidden layer' and 'For DQN, we run 160,000 timesteps for Acrobot-v1, CartPole-v1, Catcher-v1, LunarLander-v1 and MountainCar-v0'. |
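To illustrate the kind of input preprocessing the paper studies, below is a minimal sketch of a classic Fourier basis mapping (in the style of Konidaris et al., 2011) applied to a normalized observation before it is fed to an MLP. This is an assumption-laden illustration, not the paper's exact FF-NN/FLF-NN implementation; the function name `fourier_basis` and the `order` parameter are hypothetical.

```python
import itertools
import numpy as np

def fourier_basis(obs, order=2):
    """Map a normalized observation (entries in [0, 1]) to Fourier
    basis features cos(pi * c . obs), for every integer coefficient
    vector c with entries in {0, ..., order}.

    A d-dimensional observation yields (order + 1) ** d features.
    (Illustrative sketch only -- not the paper's implementation.)
    """
    obs = np.asarray(obs, dtype=float)
    # All coefficient vectors c in {0, ..., order}^d.
    coeffs = np.array(
        list(itertools.product(range(order + 1), repeat=obs.size))
    )
    return np.cos(np.pi * (coeffs @ obs))

# Example: a 2-D observation with order 2 yields 3**2 = 9 features.
features = fourier_basis([0.25, 0.5], order=2)
print(features.shape)  # (9,)
```

These features would replace (or augment) the raw state as the MLP's input; note the feature count grows exponentially with the observation dimension, which is one motivation for the lighter "Fourier Light Features" variant the paper proposes.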