Fuzzy Tiling Activations: A Simple Approach to Learning Sparse Representations Online

Authors: Yangchen Pan, Kirby Banman, Martha White

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We first show that FTA is robust under covariate shift in a synthetic online supervised learning problem, where we can vary the level of correlation and drift. Then we move to the deep reinforcement learning setting and investigate both value-based and policy gradient algorithms that use neural networks with FTAs, in classic discrete control and Mujoco continuous control environments. We show that algorithms equipped with FTAs are able to learn a stable policy faster without needing target networks on most domains.
Researcher Affiliation | Academia | Yangchen Pan (University of Alberta, pan6@ualberta.ca); Kirby Banman (University of Alberta, kdbanman@ualberta.ca); Martha White (University of Alberta, whitem@ualberta.ca)
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/yannickycpan/reproduceRL.git
Open Datasets | Yes | MNIST (LeCun & Cortes, 2010) and MNIST-Fashion (Xiao et al., 2017). All discrete-action domains are from OpenAI Gym (Brockman et al., 2016), version 0.14.0. On MuJoCo domains, we use default settings for the maximum episodic length.
Dataset Splits | Yes | We use 10-fold cross-validation to choose the best learning rate, and the above error rate is reported on the test set using the optimal learning rate at the end of learning (a selection sketch appears after this table).
Hardware Specification | No | The paper does not explicitly specify hardware details such as GPU models, CPU types, or memory used for the experiments.
Software Dependencies | Yes | The deep learning implementation is based on TensorFlow version 1.13.0 (Abadi et al., 2015). We use the Adam optimizer (Kingma & Ba, 2015)... All discrete-action domains are from OpenAI Gym (Brockman et al., 2016), version 0.14.0 (a version check appears after this table).
Experiment Setup | Yes | We use the Adam optimizer (Kingma & Ba, 2015), Xavier initializer (Glorot & Bengio, 2010), mini-batch size b = 64, buffer size 100k, and discount rate γ = 0.99 across all experiments. Note that we keep the same FTA setting across all experiments: we set [l, u] = [-20, 20], δ = η = 2.0, and hence c = {-20, -18, -16, ..., 18}, k = 40/2 = 20 (see the FTA sketch after this table). For DQN, the learning rate is 0.0001 and the target network is updated every 1k steps. For DDPG, the target-network moving rate is 0.001, the actor-network learning rate is 0.0001, and the critic-network learning rate is 0.001.
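
To make the quoted FTA setting concrete, below is a minimal NumPy sketch of a fuzzy-tiling-style activation with [l, u] = [-20, 20] and δ = η = 2.0, so that each scalar pre-activation expands into k = 20 bins. It is an illustrative reimplementation, not the authors' code: the linear ramp of width η used in place of the paper's fuzzy indicator, and the helper name `fta`, are assumptions.

```python
import numpy as np

# Tiling settings quoted in the Experiment Setup row:
# [l, u] = [-20, 20], delta = eta = 2.0, so c = (-20, -18, ..., 18) and k = 20.
L, U, DELTA, ETA = -20.0, 20.0, 2.0, 2.0
C = np.arange(L, U, DELTA)  # tiling vector, shape (20,)

def fta(z, c=C, delta=DELTA, eta=ETA):
    """Expand pre-activations of shape (..., d) into sparse codes of shape (..., d * k).

    The clipped ramp 1 - dist/eta is an assumed stand-in for the paper's fuzzy
    indicator; in the limit eta -> 0 it recovers a hard (binary) tiling activation.
    """
    z = np.asarray(z, dtype=np.float64)[..., None]                  # (..., d, 1)
    dist = np.maximum(c - z, 0.0) + np.maximum(z - delta - c, 0.0)  # distance to each tile
    act = np.clip(1.0 - dist / eta, 0.0, 1.0)                       # (..., d, k)
    return act.reshape(*act.shape[:-2], -1)                         # (..., d * k)

# Example: a scalar pre-activation z = 0.3 touches only the tiles around it.
codes = fta([0.3])
print(codes.round(2))     # 20 entries, mostly zeros
print((codes > 0).sum())  # 3 active bins with these settings
```

In this sketch, η controls how far a nonzero response bleeds into neighbouring tiles, which is what lets gradients flow through an otherwise piecewise-constant binning.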
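
The 10-fold learning-rate selection mentioned in the Dataset Splits row could be wired up as in the following hypothetical sketch; the callable `train_and_score`, the candidate grid, and the function name are placeholders rather than anything specified by the paper.

```python
import numpy as np
from sklearn.model_selection import KFold

def pick_learning_rate(X, y, candidate_lrs, train_and_score, n_splits=10, seed=0):
    """Return the learning rate with the lowest mean validation error over k folds.

    train_and_score(X_tr, y_tr, X_val, y_val, lr) is a placeholder that trains a
    model with learning rate lr and returns its validation error rate.
    """
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    mean_error = {}
    for lr in candidate_lrs:
        fold_errors = [
            train_and_score(X[tr], y[tr], X[val], y[val], lr)
            for tr, val in kf.split(X)
        ]
        mean_error[lr] = float(np.mean(fold_errors))
    return min(mean_error, key=mean_error.get)
```

The selected rate would then be used for a final run, with the error rate reported on the held-out test set, as the row above describes.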
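
Because the report pins exact library versions, a small sanity check against those pins may help before rerunning experiments; the pins come from the paper, while the check itself is only an illustrative suggestion.

```python
# Illustrative check that the local environment matches the quoted versions.
import tensorflow as tf
import gym

assert tf.__version__.startswith("1.13"), f"expected TensorFlow 1.13.x, got {tf.__version__}"
assert gym.__version__ == "0.14.0", f"expected gym 0.14.0, got {gym.__version__}"
print("TensorFlow", tf.__version__, "| gym", gym.__version__)
```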