Learning to Learn Gradient Aggregation by Gradient Descent
Authors: Jinlong Ji, Xuhui Chen, Qianlong Wang, Lixing Yu, Pan Li
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the performance of the proposed RNN aggregator and RNN aggregator with additional loss information (ARNN aggregator). |
| Researcher Affiliation | Academia | Jinlong Ji1, Xuhui Chen1,2, Qianlong Wang1, Lixing Yu1 and Pan Li1; 1Case Western Reserve University, 2Kent State University; {jxj405, qxw204, lxy257, pxl288}@case.edu, xchen2@kent.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code (e.g., a specific repository link or an explicit code release statement). |
| Open Datasets | Yes | We conduct experiments on two image classification tasks: handwritten digits classification on MNIST dataset and object recognition on CIFAR-10 dataset. |
| Dataset Splits | No | We pick the best aggregator according to its performance on validation data and report its performance. The paper mentions "validation data" but does not specify the dataset split sizes or methodology for training/validation/test splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | In all experiments, both the RNN aggregators and ARNN aggregators use two-layer LSTMs (or GRUs) with 20 hidden units in each layer and they are trained with BPTT. The optimization process is performed by using ADAM. The paper names these software components but provides no library names or version numbers, so the software environment cannot be reproduced exactly. |
| Experiment Setup | Yes | In all experiments, both the RNN aggregators and ARNN aggregators use two-layer LSTMs (or GRUs) with 20 hidden units in each layer, and they are trained with BPTT. The optimization process is performed using ADAM. In addition, we use early stopping to avoid overfitting while training the aggregators. After some fixed number of learning steps, we freeze the aggregator's parameters and evaluate its performance. We pick the best aggregator according to its performance on validation data and report its performance. In the MNIST task, the base network in each worker is a multilayer perceptron (MLP) with one hidden layer of 20 units using a sigmoid activation function. Meanwhile, in the CIFAR-10 task, each worker holds a model consisting of two convolutional layers with max pooling followed by a fully-connected layer. A hedged code sketch of this setup is given after the table. |
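
The paper releases no code, so the sketch below is only a minimal PyTorch rendering of the setup quoted in the table. Only the hidden-layer width of 20 units, the sigmoid activation in the MNIST MLP, the two-convolutional-layers-with-max-pooling-plus-fully-connected structure for CIFAR-10, and the two-layer, 20-hidden-unit LSTM aggregator come from the quoted text; everything else (class names, channel counts, kernel sizes, the ReLU activations in the CNN, and the coordinatewise input convention for the aggregator) is an assumption made for illustration.

```python
# Hypothetical sketch only: the paper does not release code, and these
# module names, kernel sizes, and channel counts are not from the paper.
import torch
import torch.nn as nn


class MNISTBase(nn.Module):
    """MLP base network: one hidden layer of 20 units with sigmoid activation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 20),  # hidden width of 20 is stated in the paper
            nn.Sigmoid(),
            nn.Linear(20, 10),
        )

    def forward(self, x):
        return self.net(x)


class CIFARBase(nn.Module):
    """CNN base network: two conv layers with max pooling, then a fully-connected
    layer. Channel counts, kernel sizes, and ReLUs are assumptions; only the
    overall layer structure is stated in the paper."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, 10)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


class RNNAggregator(nn.Module):
    """Learned gradient aggregator: a two-layer LSTM with 20 hidden units per layer.
    Assumed convention: it is applied coordinatewise, taking the workers' gradients
    for each parameter coordinate and emitting one aggregated update per coordinate.
    For the ARNN variant the input would also carry loss information
    (e.g. input_size = num_workers + 1) -- again an assumption."""
    def __init__(self, num_workers, hidden_size=20):
        super().__init__()
        self.rnn = nn.LSTM(input_size=num_workers, hidden_size=hidden_size, num_layers=2)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, worker_grads, state=None):
        # worker_grads: (1, num_coordinates, num_workers) -- one aggregation round per time step
        h, state = self.rnn(worker_grads, state)
        return self.out(h), state


# Per the quoted setup, the aggregator itself would then be meta-trained with ADAM
# and truncated BPTT, with early stopping on validation performance, e.g.:
# meta_optimizer = torch.optim.Adam(RNNAggregator(num_workers=4).parameters())
```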