Not-So-Random Features
Authors: Brian Bullins, Cyril Zhang, Yi Zhang
ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluations on synthetic and real-world datasets demonstrate scalability and consistent improvements over related random features-based kernel methods. In this section, we highlight the most important and illustrative parts of our experimental results. |
| Researcher Affiliation | Academia | Brian Bullins, Cyril Zhang, Yi Zhang; Department of Computer Science, Princeton University, Princeton, NJ 08544, USA; {bbullins, cyril.zhang, y.zhang}@cs.princeton.edu |
| Pseudocode | Yes | Algorithm 1: Langevin dynamics for kernel alignment. Algorithm 2: No-regret learning dynamics for SVM margin maximization. |
| Open Source Code | Yes | The code can be found at github.com/yz-ignescent/Not-So-Random-Features. |
| Open Datasets | Yes | Challenging label pairs are chosen from the MNIST (Le Cun et al., 1998) and CIFAR-10 (Krizhevsky, 2009) datasets; |
| Dataset Splits | No | For synthetic data: 'ntrain = 2000 and ntest = 50000'. For MNIST/CIFAR-10: 'each task consists of 10000 training and 2000 test examples'. No explicit mention or details of a validation set split. |
| Hardware Specification | No | The paper mentions 'an efficient GPU implementation' but does not provide specific hardware details like GPU or CPU models, memory, or detailed cloud/cluster configurations used for experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | Throughout all experiments presented, we use hinge-loss SVM classifiers with C = 1. With regard to Langevin diffusion (Algorithm 1), we observe that the best samples arise from using high temperatures and Gaussian parallel initialization. For the latter, a rule-of-thumb is to initialize 500 parallel copies of Langevin dynamics... Empirically, running Langevin dynamics for 100 steps suffices to locate a reasonably good peak. The step size of online gradient ascent is set to balance between being conservative and promoting diverse samples; |
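
The quoted experiment setup is concrete enough to sketch in code. The following is a minimal illustrative sketch, not the authors' implementation (their code is at github.com/yz-ignescent/Not-So-Random-Features): the cosine feature map, the single-feature kernel-alignment surrogate and its gradient, and the toy data are assumptions made here for illustration, while the hyperparameters (500 parallel Gaussian-initialized Langevin chains, roughly 100 steps, a high temperature, and a hinge-loss SVM with C = 1) follow the setup quoted above.

```python
# Minimal sketch of the quoted experiment setup. The alignment objective, the
# cosine feature map, and all function names below are illustrative assumptions,
# not the authors' exact Algorithm 1; see the repository linked above for the
# real implementation.

import numpy as np
from sklearn.svm import LinearSVC

def feature(omega, b, X):
    """Random Fourier-style feature phi_omega(x) = cos(omega . x + b) for all rows of X."""
    return np.cos(X @ omega + b)

def alignment(omega, b, X, y):
    """Surrogate kernel-alignment score for one feature: (mean_i y_i * phi_omega(x_i))^2."""
    return np.mean(y * feature(omega, b, X)) ** 2

def grad_alignment(omega, b, X, y):
    """Gradient of the alignment surrogate with respect to omega."""
    phi = feature(omega, b, X)
    s = np.mean(y * phi)
    dphi = -np.sin(X @ omega + b)        # d/d(omega . x) of cos(omega . x + b)
    return 2.0 * s * (X.T @ (y * dphi)) / len(y)

def langevin_features(X, y, n_chains=500, n_steps=100, step=1e-2, temperature=10.0, seed=0):
    """Run n_chains parallel Langevin chains with Gaussian parallel initialization
    and a high temperature, as in the rule-of-thumb quoted above."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    omegas = rng.standard_normal((n_chains, d))      # Gaussian parallel initialization
    bs = rng.uniform(0.0, 2 * np.pi, size=n_chains)
    for _ in range(n_steps):
        for k in range(n_chains):
            g = grad_alignment(omegas[k], bs[k], X, y)
            noise = rng.standard_normal(d)
            omegas[k] += step * g + np.sqrt(2 * step * temperature) * noise
    return omegas, bs

def featurize(X, omegas, bs):
    """Map inputs to the learned random-feature representation."""
    return np.cos(X @ omegas.T + bs)

# Toy usage (synthetic data stands in for the MNIST / CIFAR-10 label pairs):
rng = np.random.default_rng(1)
Xtr = rng.standard_normal((200, 5))
ytr = rng.choice([-1, 1], size=200)
omegas, bs = langevin_features(Xtr, ytr, n_chains=50, n_steps=20)
clf = LinearSVC(C=1.0, loss="hinge")                 # hinge-loss SVM with C = 1
clf.fit(featurize(Xtr, omegas, bs), ytr)
```

The per-chain loop mirrors the paper's rule of thumb of many short, independently initialized chains rather than one long chain; the authors mention an efficient GPU implementation, which this NumPy sketch does not attempt to reproduce.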