Escaping the Gravitational Pull of Softmax
Authors: Jincheng Mei, Chenjun Xiao, Bo Dai, Lihong Li, Csaba Szepesvari, Dale Schuurmans
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In addition to proving bounds on convergence rates to firmly establish these results, we also provide experimental evidence for the superiority of the escort transformation. ... We conduct several experiments to verify the effectiveness of the proposed escort transform in policy gradient and cross entropy minimization. |
| Researcher Affiliation | Collaboration | ¹University of Alberta, ²DeepMind, ³Amazon, ⁴Google Research, Brain Team |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Next, we do experiments on MNIST dataset. |
| Dataset Splits | Yes | The dataset is split into training set with 55000, validation set with 5000, and testing set with 10000 data points. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions implementing models using a 'one hidden layer ReLU neural network' and 'mini-batch stochastic gradient descent', but it does not specify any software names with version numbers (e.g., PyTorch, TensorFlow, Python version) that would be needed for reproduction. |
| Experiment Setup | Yes | Full gradient SPG updates with stepsize η = 0.4. ... with learning rate η_t = ‖θ_t‖_p² / (4(3 + c₁²)). ... We use one hidden layer neural network with 512 hidden nodes and ReLU activation to parameterize θ. ... We use mini-batch stochastic gradient descent in this experiment. |
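
The escort transform referenced in the research-type and experiment-setup rows replaces the softmax parameterization π_θ(a) = exp(θ(a)) / Σ_a' exp(θ(a')) with π_θ(a) = |θ(a)|^p / Σ_a' |θ(a')|^p. Since no code is released, the sketch below is only a minimal numpy illustration of that parameterization with plain true-gradient ascent on a toy bandit; the reward values, p = 2, the step count, and the reuse of the quoted stepsize η = 0.4 are assumptions made for illustration, not the authors' exact update or configuration.

```python
# Minimal numpy sketch of the escort transform and a true-gradient bandit update.
# Reward vector, p = 2, step count, and eta = 0.4 are illustrative assumptions.
import numpy as np

def escort(theta, p=2.0):
    """Escort transform: pi(a) = |theta(a)|^p / sum_a' |theta(a')|^p."""
    w = np.abs(theta) ** p
    return w / w.sum()

def escort_gradient_step(theta, r, eta=0.4, p=2.0):
    """One gradient-ascent step on the expected reward J(theta) = pi_theta^T r."""
    pi = escort(theta, p)
    Z = (np.abs(theta) ** p).sum()
    # dJ/dtheta(a) = (p / Z) * sign(theta(a)) * |theta(a)|^(p-1) * (r(a) - pi^T r)
    grad = (p / Z) * np.sign(theta) * np.abs(theta) ** (p - 1) * (r - pi @ r)
    return theta + eta * grad

rng = np.random.default_rng(0)
r = np.array([1.0, 0.8, 0.1])          # toy 3-armed bandit rewards (made up)
theta = rng.normal(size=3)
for _ in range(2000):
    theta = escort_gradient_step(theta, r)
print(escort(theta))                    # probability mass should favor the best arm
```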
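
The MNIST rows quote a 55000/5000/10000 split, a one hidden layer network with 512 ReLU units, and mini-batch SGD, but no framework, learning rate, or batch size. The following PyTorch sketch is one plausible reconstruction of a cross-entropy minimization run where the output scores are normalized by |·|^p (the escort transform) instead of softmax; the learning rate, batch size, epoch count, and p = 2 are assumptions, not values from the paper.

```python
# Hedged PyTorch reconstruction of the MNIST cross-entropy experiment described
# in the table. Framework, learning rate, batch size, epochs, and p = 2 are
# assumptions; only the split sizes and the 512-unit ReLU hidden layer are quoted.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

def escort_cross_entropy(logits, targets, p=2.0, eps=1e-12):
    """Cross entropy under the escort transform: pi = |theta|^p / sum_a |theta(a)|^p."""
    weights = logits.abs().pow(p)
    probs = weights / weights.sum(dim=1, keepdim=True).clamp_min(eps)
    picked = probs[torch.arange(len(targets)), targets]
    return -picked.clamp_min(eps).log().mean()

train_full = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
test_set = datasets.MNIST("data", train=False, download=True,
                          transform=transforms.ToTensor())
train_set, val_set = random_split(train_full, [55000, 5000])  # split quoted in the table

# One hidden layer with 512 ReLU units parameterizing theta (the class scores).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 512), nn.ReLU(), nn.Linear(512, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)        # learning rate assumed
loader = DataLoader(train_set, batch_size=128, shuffle=True)   # batch size assumed

for epoch in range(5):                                         # epoch count assumed
    for x, y in loader:
        optimizer.zero_grad()
        escort_cross_entropy(model(x), y).backward()
        optimizer.step()
```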