PolyLoss: A Polynomial Expansion Perspective of Classification Loss Functions

Authors: Zhaoqi Leng, Mingxing Tan, Chenxi Liu, Ekin Dogus Cubuk, Jay Shi, Shuyang Cheng, Dragomir Anguelov

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results show that the optimal choice within the PolyLoss is indeed dependent on the task and dataset. Simply by introducing one extra hyperparameter and adding one line of code, our Poly-1 formulation outperforms the cross-entropy loss and focal loss on 2D image classification, instance segmentation, object detection, and 3D object detection tasks, sometimes by a large margin.
Researcher Affiliation | Industry | Zhaoqi Leng1, Mingxing Tan1, Chenxi Liu1, Ekin Dogus Cubuk2, Xiaojie Shi2, Shuyang Cheng1, Dragomir Anguelov1; 1Waymo LLC, 2Google LLC
Pseudocode | Yes | Example code for the cross-entropy Poly-1 loss (L_Poly-1) with softmax activation is shown below.

    import tensorflow as tf

    def poly1_cross_entropy(logits, labels, epsilon):
        # epsilon >= -1.
        # pt, CE, and Poly1 have shape [batch].
        pt = tf.reduce_sum(labels * tf.nn.softmax(logits), axis=-1)
        CE = tf.nn.softmax_cross_entropy_with_logits(labels, logits)
        Poly1 = CE + epsilon * (1 - pt)
        return Poly1
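A minimal usage sketch follows (not from the paper); the batch size, class count, label indices, and the choice of epsilon = 2.0 are illustrative assumptions, and poly1_cross_entropy is the function defined in the row above.

    import tensorflow as tf

    # Dummy batch: 4 examples, 10 classes, one-hot labels (values are arbitrary).
    logits = tf.random.normal([4, 10])
    labels = tf.one_hot([1, 3, 5, 7], depth=10)

    # Per-example Poly-1 losses, reduced to a scalar for training.
    loss = tf.reduce_mean(poly1_cross_entropy(logits, labels, epsilon=2.0))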
Open Source Code | Yes | Our experiments are based on public datasets and open source code repositories, shown in footnote 3-6. [...] Code at https://github.com/tensorflow/tpu/tree/master/models/official/ [...] Code at https://github.com/google/automl/tree/master/efficientnetv2 [...] Code at https://github.com/tensorflow/lingvo/tree/master/lingvo/tasks/car
Open Datasets | Yes | On ImageNet (Deng et al., 2009), our PolyLoss improves both pretraining and finetuning for the recent EfficientNetV2 (Tan & Le, 2021); on COCO (Lin et al., 2014), PolyLoss improves both 2D detection and segmentation AR for Mask R-CNN (He et al., 2017); on Waymo Open Dataset (WOD) (Sun et al., 2020), PolyLoss improves 3D detection AP for the widely used PointPillars (Lang et al., 2019) and the very recent Range Sparse Net (RSN) (Sun et al., 2021).
Dataset Splits | Yes | We reserve 25,000 images from the training set as minival to search the optimal ϵ1.
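A hedged sketch (assumed, not the authors' code) of carving such a 25,000-image minival out of the ImageNet training split; the tensorflow_datasets loader and the absence of shuffling before the split are assumptions.

    import tensorflow_datasets as tfds

    # Hypothetical: load the ImageNet-1k training split as a tf.data.Dataset.
    train = tfds.load("imagenet2012", split="train", shuffle_files=False)

    # Hold out 25,000 examples as minival; train on the remainder.
    minival = train.take(25_000)
    train_without_minival = train.skip(25_000)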
Hardware Specification | No | The paper does not explicitly mention specific hardware specifications such as GPU or CPU models used for the experiments.
Software Dependencies | No | The paper mentions using TensorFlow, but does not specify exact version numbers for TensorFlow or other software libraries/dependencies.
Experiment Setup | Yes | We use ResNet-50 (He et al., 2016) and its training hyperparameters without modification. [...] For the following experiments, we adopt the default training hyperparameters in the public repositories without any tuning. [...] We set ϵ1 = 2 for both. [...] In training Mask R-CNN, we use the training schedule optimized for cross-entropy loss, and replace the cross-entropy loss with L_Poly-1 = -log(P_t) + ϵ1(1 - P_t) for the classification loss L_cls, where ϵ1 ∈ {−1.0, −0.8, −0.6, −0.4, −0.2, 0, 0.5, 1.0}.
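To make the ϵ1 sweep above concrete, here is a small sketch (assumed, not the paper's code) that builds one Poly-1 classification loss per candidate ϵ1 by reusing poly1_cross_entropy from the Pseudocode row; how each loss is wired into the Mask R-CNN classification head is left out.

    import functools

    # Candidate epsilon_1 values from the search grid quoted above.
    EPSILON_GRID = [-1.0, -0.8, -0.6, -0.4, -0.2, 0.0, 0.5, 1.0]

    def make_poly1_loss(epsilon):
        # L_Poly-1 = -log(P_t) + epsilon * (1 - P_t), where CE = -log(P_t)
        # is supplied inside poly1_cross_entropy(logits, labels, epsilon).
        return functools.partial(poly1_cross_entropy, epsilon=epsilon)

    # One candidate loss per epsilon_1; each would correspond to one training run.
    candidate_losses = {eps: make_poly1_loss(eps) for eps in EPSILON_GRID}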