Escaping the Gravitational Pull of Softmax

Authors: Jincheng Mei, Chenjun Xiao, Bo Dai, Lihong Li, Csaba Szepesvari, Dale Schuurmans

NeurIPS 2020

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | In addition to proving bounds on convergence rates to firmly establish these results, we also provide experimental evidence for the superiority of the escort transformation. ... We conduct several experiments to verify the effectiveness of the proposed escort transform in policy gradient and cross entropy minimization. |
| Researcher Affiliation | Collaboration | 1 University of Alberta, 2 DeepMind, 3 Amazon, 4 Google Research, Brain Team |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Next, we do experiments on MNIST dataset. |
| Dataset Splits | Yes | The dataset is split into training set with 55000, validation set with 5000, and testing set with 10000 data points. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU or GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions implementing models using a one hidden layer ReLU neural network and mini-batch stochastic gradient descent, but it does not specify any software names with version numbers (e.g., PyTorch, TensorFlow, Python version) that would be needed for reproduction. |
| Experiment Setup | Yes | Full gradient SPG updates with stepsize η = 0.4. ... with learning rate η_t = ‖θ_t‖_2 / √(4(3 + c_1^2)). ... We use one hidden layer neural network with 512 hidden nodes and ReLU activation to parameterize θ. ... We use mini-batch stochastic gradient descent in this experiment. |
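
For context on the method being assessed: the paper replaces the softmax policy parameterization π_θ(a) ∝ exp(θ_a) with the escort transform π_θ(a) ∝ |θ_a|^p, and its SPG experiments use full-gradient updates on a one-state bandit objective π_θ^T r. The sketch below is a minimal illustration of that setting, not the paper's code (which, per the table, is not released). The escort transform and the softmax stepsize η = 0.4 come from the paper as quoted above; the function names, the 3-armed reward vector, the iteration count, and the fixed escort stepsize are hypothetical choices standing in for the paper's adaptive η_t. The gradient formulas follow from differentiating each parameterization directly.

```python
import numpy as np

def softmax(theta):
    # Softmax policy: pi(a) = exp(theta_a) / sum_b exp(theta_b).
    z = np.exp(theta - theta.max())  # shift by max for numerical stability
    return z / z.sum()

def escort(theta, p=2):
    # Escort transform from the paper: pi(a) = |theta_a|^p / sum_b |theta_b|^p.
    z = np.abs(theta) ** p
    return z / z.sum()

def softmax_pg_step(theta, r, eta=0.4):
    # Full-gradient softmax policy gradient (SPG) step on a one-state bandit,
    # maximizing pi_theta^T r; here d(pi^T r)/d theta_a = pi_a * (r_a - pi^T r).
    pi = softmax(theta)
    return theta + eta * pi * (r - pi @ r)

def escort_pg_step(theta, r, p=2, eta=0.1):
    # Full-gradient escort policy gradient step. Differentiating
    # pi(a) = |theta_a|^p / S with S = sum_b |theta_b|^p gives
    # d(pi^T r)/d theta_a = p * sign(theta_a) * |theta_a|^(p-1) / S * (r_a - pi^T r).
    pi = escort(theta, p)
    s = (np.abs(theta) ** p).sum()
    g = p * np.sign(theta) * np.abs(theta) ** (p - 1) / s
    return theta + eta * g * (r - pi @ r)

# Hypothetical 3-armed bandit; arm 0 is optimal.
r = np.array([1.0, 0.8, 0.1])
theta_s = np.array([0.1, 0.5, 0.4])
theta_e = theta_s.copy()
for _ in range(500):
    theta_s = softmax_pg_step(theta_s, r)
    theta_e = escort_pg_step(theta_e, r)
print("softmax policy:", np.round(softmax(theta_s), 3))
print("escort policy: ", np.round(escort(theta_e), 3))
```

Running the loop typically shows the escort policy concentrating on the optimal arm faster than the softmax policy, which is the qualitative behavior the paper's experiments report; the exact numbers depend on the illustrative stepsizes chosen here.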