Efficient Adaptive Online Learning via Frequent Directions

Authors: Yuanyu Wan, Nan Wei, Lijun Zhang

IJCAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To verify the efficiency and effectiveness of our ADA-FD, we conduct several numerical experiments on online convex optimization and training CNN. The results turn out that our ADA-FD performs comparably with ADA-FULL but is much more efficient."
Researcher Affiliation | Academia | "Yuanyu Wan, Nan Wei and Lijun Zhang, National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China, {wanyy, zhanglj}@lamda.nju.edu.cn, nwei@smail.nju.edu.cn"
Pseudocode | Yes | "Algorithm 1: Adaptive Dual Averaging via Frequent Directions"; "Algorithm 2: Adaptive Mirror Descent via Frequent Directions" (a minimal sketch of the shared Frequent Directions step appears after this table)
Open Source Code | No | No explicit statement about open-sourcing the code, and no link to a repository, is provided.
Open Datasets | Yes | "two real world datasets from LIBSVM repository [Chang and Lin, 2011]: Gisette and Epsilon"; "MNIST [LeCun et al., 1998], CIFAR10 [Krizhevsky, 2009] and SVHN datasets [Netzer et al., 2011]"
Dataset Splits | No | The paper divides each dataset into a training part and a testing part with explicit counts (e.g., Gisette 6,000/1,000; Epsilon 400,000/100,000), but it does not report a separate validation split, although parameter tuning is implied.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are mentioned for running the experiments.
Software Dependencies | No | The paper mentions the "Keras examples directory" but does not give version numbers for Keras or any other software dependency.
Experiment Setup | Yes | "Parameters η and δ are searched in {1e-4, 1e-3, ..., 100} (for online regression/classification) and {1e-8, 1e-7, ..., 1} (for CNN)"; "we set τ = 10 for methods using matrix approximation"; "we set the sketching size τ = 10 for Gisette and τ = 40 for Epsilon"; "batch size 128"; "we set the sketching size τ = 20 for all datasets" (the search grids are reconstructed in the second sketch after this table)
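
Both algorithms in the Pseudocode row are built on the same primitive: a Frequent Directions sketch B whose Gram matrix B^T B approximates the gradient Gram matrix sum_t g_t g_t^T [Liberty, 2013]. The numpy sketch below illustrates that primitive only, assuming a doubled buffer of 2τ rows and gradient dimension d ≥ 2τ; the class and variable names are ours, and this is not the authors' implementation.

    import numpy as np

    class FrequentDirections:
        """Maintain B (2*tau x d) with B.T @ B approximating sum_t g_t g_t^T.

        Minimal illustrative version; assumes d >= 2 * tau. ADA-FD would feed
        every observed gradient g_t to append() and build its adaptive
        preconditioner from the resulting low-rank sketch B.
        """

        def __init__(self, d, tau):
            self.tau = tau
            self.B = np.zeros((2 * tau, d))  # doubled buffer amortizes the SVDs
            self.row = 0                     # index of the next empty row

        def append(self, g):
            if self.row == 2 * self.tau:
                # Buffer full: shrink the spectrum to zero out the bottom rows.
                _, s, Vt = np.linalg.svd(self.B, full_matrices=False)
                delta = s[self.tau - 1] ** 2          # tau-th largest sigma^2
                s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
                self.B = s[:, None] * Vt              # rows tau-1 .. 2*tau-1 are now zero
                self.row = self.tau - 1
            self.B[self.row] = g
            self.row += 1

Because B never has more than 2τ rows, each shrink costs O(τ^2 d) and is amortized over about τ insertions, i.e., roughly O(τd) per step; this is the source of the efficiency gap with ADA-FULL, whose full-matrix update scales at least quadratically in d.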
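
The hyperparameter search in the Experiment Setup row is a plain grid over powers of ten. Below is a small sketch of how such a grid could be enumerated, assuming exhaustive search over (η, δ) pairs; evaluate is a hypothetical stand-in for one training run and is not part of the paper.

    from itertools import product

    # Search grids quoted in the setup row, reconstructed as powers of ten.
    grid_online = [10.0 ** k for k in range(-4, 3)]  # {1e-4, 1e-3, ..., 100}
    grid_cnn = [10.0 ** k for k in range(-8, 1)]     # {1e-8, 1e-7, ..., 1}

    def evaluate(eta, delta, tau=10):
        """Hypothetical stand-in: run ADA-FD with (eta, delta) and return its loss."""
        return 0.0  # replace with an actual training/evaluation run

    # Keep the (eta, delta) pair with the lowest loss on the online tasks.
    best_eta, best_delta = min(product(grid_online, repeat=2),
                               key=lambda p: evaluate(*p))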