Stochastic Loss Function
Authors: Qingliang Liu, Jinmei Lai
AAAI 2020, pp. 4884-4891 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on a variety of popular datasets strongly demonstrate that SLF is capable of obtaining appropriate gradients at different stages during training, and can significantly improve the performance of various deep models on real-world tasks including classification, clustering, regression, neural machine translation, and object detection. |
| Researcher Affiliation | Academia | Qingliang Liu, Jinmei Lai State Key Lab of ASIC and System, School of Microelectronics, Fudan University, Shanghai, China {qlliu17, jmlai}@fudan.edu.cn |
| Pseudocode | Yes | Algorithm 1 Stochastic Loss Function. Require: dataset D = {(x_i, y_i)}_{i=1}^N, loss functions L = {ℓ_i}_{i=1}^n, networks f(·; w) and h(·; v), sampling times K. Ensure: trained main network f(·; w*). 1: Randomly initialize parameters w in the main network f(·; w) and v in the decision network h(·; v); 2: for number of training iterations do 3: for (x, y) ∈ D do 4: p = f(x; w) — the estimated output of the main network with parameters w; 5: for each time step k = 1, 2, ..., K do 6: p̂_k = G(h(p; v)) — selecting loss functions with the decision network and Gumbel-Softmax; 7: end for 8: p̂ = (1/K) Σ_{k=1}^{K} p̂_k — weighting and combining the loss functions according to Eq. (10); 9: H(L, w, v) = Σ_{ℓ_i ∈ L} p̂_i ℓ_i(f(x; w), y) — computing the loss of SLF according to Eq. (11); 10: Update w and v by minimizing Eq. (11) with standard back-propagation; 11: end for 12: end for. (A hedged implementation sketch of this loop is given below the table.) |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | To validate its capability, we carry out a series of image classification tasks on three frequently-used datasets, including MNIST, CIFAR-10, and CIFAR-100. For each dataset, several popular deep neural networks are employed to demonstrate the capability. |
| Dataset Splits | No | The paper mentions using a 'validation set' in the introduction for dynamic adjustment of gradients, but it does not specify the actual data splitting percentages or counts for training, validation, and testing sets needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments, only general statements about deep networks. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | The hyper-parameters in our experiments are set as follows. In each experiment, our SLF model always inherits the same settings as the compared baselines, including network architectures (e.g., activation functions, initializations, batch sizes, etc.), learning rates, and optimizers, except for the loss functions. |
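The following is a minimal PyTorch sketch of the training loop described in Algorithm 1, assuming a classification setting. The candidate loss set, network sizes, sampling count K, and Gumbel-Softmax temperature are illustrative assumptions, not the authors' exact implementation (no open-source code is available).

```python
# Hedged sketch of Algorithm 1 (Stochastic Loss Function) for classification.
# The candidate losses, architectures, K, and tau are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Candidate loss functions ℓ_i (hypothetical set).
def ce_loss(logits, y):
    return F.cross_entropy(logits, y)

def mse_loss(logits, y):
    return F.mse_loss(F.softmax(logits, dim=1),
                      F.one_hot(y, logits.size(1)).float())

def hinge_loss(logits, y):
    return F.multi_margin_loss(logits, y)

candidate_losses = [ce_loss, mse_loss, hinge_loss]

# Main network f(.; w) and decision network h(.; v) (toy architectures).
main_net = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                         nn.Linear(256, 10))
decision_net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(),
                             nn.Linear(32, len(candidate_losses)))

optimizer = torch.optim.SGD(list(main_net.parameters()) +
                            list(decision_net.parameters()), lr=0.1)
K, tau = 5, 1.0  # sampling times and Gumbel-Softmax temperature (assumed values)

def slf_step(x, y):
    p = main_net(x)                                        # line 4: p = f(x; w)
    logits = decision_net(p.mean(0, keepdim=True))         # h(p; v), batch-level
    # lines 5-7: draw K differentiable Gumbel-Softmax samples over the losses
    samples = [F.gumbel_softmax(logits, tau=tau, hard=False) for _ in range(K)]
    p_hat = torch.stack(samples).mean(0).squeeze(0)        # line 8: average weights
    # line 9: weighted combination of candidate losses (Eq. (11))
    loss = sum(w_i * l_i(p, y) for w_i, l_i in zip(p_hat, candidate_losses))
    optimizer.zero_grad()
    loss.backward()                                        # line 10: update w and v
    optimizer.step()
    return loss.item()

# Example usage with a random MNIST-sized batch.
x = torch.randn(64, 1, 28, 28)
y = torch.randint(0, 10, (64,))
print(slf_step(x, y))
```

Because the Gumbel-Softmax samples are kept soft (hard=False), the averaged weights p̂ remain differentiable with respect to the decision network's parameters v, so a single backward pass updates both w and v, as in lines 9-10 of the algorithm.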