Optimal Activation Functions for the Random Features Regression Model

Authors: Jianxin Wang, José Bento

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Nonetheless, here we test some of our more general conclusions on real data. This appendix is referenced in the main text in Section 3.3. In this section, the data, and the fact that we do not work with infinite dimensions, are the only deviations from our theoretical setup. In particular, we work with an RFR model. We use the MNIST data (Deng, 2012) to train an RFR model that approximates a function f, our ground truth object, defined as follows. For a given digit image x with class c ∈ {0, 1, ..., 9}, we define f(x) = 5 + c/9.
Researcher Affiliation | Academia | Jianxin Wang, Department of Electrical and Computer Engineering, Rice University, jw162@rice.edu; José Bento, Department of Computer Science, Boston College, bentoayr@bc.edu
Pseudocode | No | The paper does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We include code to generate Figure 1 in the following GitHub link: https://github.com/Jeffwang87/RFR_AF. This code is also available in the supplementary zip file provided.
Open Datasets | Yes | We use the MNIST data (Deng, 2012) to train an RFR model that approximates a function f, our ground truth object, defined as follows.
Dataset Splits | No | The paper mentions using 4000 training samples and 10000 test samples, but it does not specify a separate validation split.
Hardware Specification | Yes | We ran it using a MacBook Pro with 2.6 GHz 6-Core Intel Core i7 and 32 GB 2667 MHz DDR4.
Software Dependencies | Yes | It runs using Wolfram Mathematica V12. ... It runs using Matlab 2020b.
Experiment Setup | Yes | Training is done with λ = 10^{-7}. ... For the test set we use 10000 random samples. ... in Figure 4 we plot the test error E as a function of ψ1/ψ2 = N/n when we have n = 4000 train samples and when the number of features N ranges from 1 to 14250. ... In Figure 5 we plot the test error E as a function of λ when ψ2 = 10, when we have n = ψ2 d train samples, and when the number of features is very large, namely, N = 10000.
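The setup quoted above (a random features regression model trained with a ridge penalty to approximate f(x) = 5 + c/9) can be illustrated with a minimal Python sketch. This is not the authors' code, which runs in Mathematica and Matlab. The function names, the Gaussian first-layer weights, the tanh activation, the unnormalized ridge penalty, and the synthetic inputs standing in for MNIST images are all assumptions made for illustration; the paper's own normalization of the penalty may differ.

```python
import numpy as np

def target(c):
    """Ground-truth value from the summary: f(x) = 5 + c/9 for digit class c."""
    return 5.0 + np.asarray(c, dtype=float) / 9.0

def fit_rfr(X, y, n_features, lam, activation=np.tanh, seed=0):
    """Fit a random features ridge regression (hypothetical helper).

    The first-layer weights W are sampled once and kept fixed; only the
    second-layer coefficients a are learned, in closed form.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumption: Gaussian weights scaled so each row has norm close to 1.
    W = rng.standard_normal((n_features, d)) / np.sqrt(d)
    Z = activation(X @ W.T)  # feature matrix, shape (n, n_features)
    # Ridge normal equations: (Z^T Z + lam * I) a = Z^T y.
    # Assumption: plain lam * I penalty, not the paper's scaling.
    A = Z.T @ Z + lam * np.eye(n_features)
    a = np.linalg.solve(A, Z.T @ y)
    return W, a

def predict_rfr(X, W, a, activation=np.tanh):
    """Evaluate the fitted model on the rows of X."""
    return activation(X @ W.T) @ a

# Synthetic stand-in for MNIST: random inputs with random digit classes.
demo_rng = np.random.default_rng(0)
demo_classes = demo_rng.integers(0, 10, size=300)
demo_X = demo_rng.standard_normal((300, 30))
demo_y = target(demo_classes)

# Illustrative sizes and penalty only, far smaller than the quoted experiment.
demo_W, demo_a = fit_rfr(demo_X, demo_y, n_features=60, lam=1e-3)
demo_train_mse = np.mean((predict_rfr(demo_X, demo_W, demo_a) - demo_y) ** 2)
```

To mirror the quoted experiment more closely, one would replace the synthetic inputs with flattened MNIST images, use n = 4000 training samples, and sweep `n_features` or `lam` while recording the error on a held-out test set.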