Learning to Learn Gradient Aggregation by Gradient Descent

Authors: Jinlong Ji, Xuhui Chen, Qianlong Wang, Lixing Yu, Pan Li

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we evaluate the performance of the proposed RNN aggregator and RNN aggregator with additional loss information (ARNN aggregator)."
Researcher Affiliation | Academia | "Jinlong Ji1, Xuhui Chen1,2, Qianlong Wang1, Lixing Yu1 and Pan Li1; 1Case Western Reserve University, 2Kent State University; {jxj405, qxw204, lxy257, pxl288}@case.edu, xchen2@kent.edu"
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code (e.g., a specific repository link or an explicit code release statement).
Open Datasets | Yes | "We conduct experiments on two image classification tasks: handwritten digits classification on MNIST dataset and object recognition on CIFAR-10 dataset."
Dataset Splits | No | "We pick the best aggregator according to its performance on validation data and report its performance." The paper mentions "validation data" but does not specify the dataset split sizes or the methodology for training/validation/test splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | "In all experiments, both the RNN aggregators and ARNN aggregators use two-layer LSTMs (or GRUs) with 20 hidden units in each layer and they are trained with BPTT. The optimization process is performed by using ADAM." The paper names software components (LSTMs/GRUs, BPTT, ADAM) but does not provide library names or version numbers needed for reproducibility.
Experiment Setup | Yes | "In all experiments, both the RNN aggregators and ARNN aggregators use two-layer LSTMs (or GRUs) with 20 hidden units in each layer and they are trained with BPTT. The optimization process is performed by using ADAM. In addition, we use early stopping to avoid overfitting while training the aggregators. After some fixed number of learning steps, we freeze the aggregator's parameters and evaluate its performance. We pick the best aggregator according to its performance on validation data and report its performance. In the MNIST task, the base network in each worker is a multiple layer perceptron (MLP) with one hidden layer of 20 units using a sigmoid activation function. Meanwhile, in the CIFAR-10 task, each worker holds a model including two convolutional layers with max pooling followed by a fully-connected layer." A hedged code sketch of this setup appears after the table.
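
Since the paper releases no code, the following is a minimal PyTorch sketch of the setup quoted in the Experiment Setup row: the per-worker base networks for MNIST and CIFAR-10 and a two-layer, 20-hidden-unit LSTM aggregator. Everything the excerpts leave unspecified is an assumption here, including the use of PyTorch itself, the convolution channel counts and kernel sizes, and the aggregator's input featurization; class names such as MNISTWorker, CIFARWorker, and RNNAggregator are illustrative, not the authors'.

```python
# Minimal sketch of the quoted experiment setup, assuming PyTorch.
# Only the facts quoted above come from the paper (two-layer LSTM aggregator
# with 20 hidden units; MNIST MLP with one 20-unit sigmoid hidden layer;
# CIFAR-10 model with two conv + max-pool layers and a fully-connected
# layer). All other choices below are assumptions.
import torch
import torch.nn as nn


class MNISTWorker(nn.Module):
    """Per-worker base network for MNIST: an MLP with one hidden layer of
    20 units and a sigmoid activation (as quoted)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 20),
            nn.Sigmoid(),
            nn.Linear(20, 10),
        )

    def forward(self, x):
        return self.net(x)


class CIFARWorker(nn.Module):
    """Per-worker base network for CIFAR-10: two convolutional layers with
    max pooling followed by a fully-connected layer (as quoted). Channel
    counts and kernel sizes are not given in the paper; these are guesses."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, 10)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


class RNNAggregator(nn.Module):
    """Learned aggregator: a two-layer LSTM with 20 hidden units per layer
    (as quoted). Applied coordinate-wise: each base-model parameter is one
    element of the LSTM's batch dimension, and the per-worker gradients for
    that coordinate form the input features. This featurization is not
    specified in the excerpts and is an assumption."""

    def __init__(self, num_workers, hidden_size=20):
        super().__init__()
        self.lstm = nn.LSTM(input_size=num_workers,
                            hidden_size=hidden_size,
                            num_layers=2)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, worker_grads, state=None):
        # worker_grads: (seq_len, n_params, num_workers)
        h, state = self.lstm(worker_grads, state)
        return self.out(h), state  # (seq_len, n_params, 1) aggregated update


if __name__ == "__main__":
    # Toy usage: aggregate gradients from 5 workers for a 1000-parameter model.
    agg = RNNAggregator(num_workers=5)
    grads = torch.randn(1, 1000, 5)
    update, state = agg(grads)
    print(update.shape)  # torch.Size([1, 1000, 1])
```

The coordinate-wise treatment and the two-layer, 20-unit LSTM mirror the common learned-optimizer convention; whether the paper shares the LSTM across coordinates exactly this way, or how it incorporates the additional loss information for the ARNN variant, is not stated in the excerpts above, so those parts should be treated as placeholders.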