| PhD Seminar

Name of the Speaker: Ms. Nancy Nayak (EE17D408)
Guide: Prof. Sheetal Kalyani
Venue: ESB-244 (Seminar Hall)
Date/Time: 24th November 2023 (Friday), 11:00 AM
Title: Rotate the ReLU to Sparsify Deep Networks Implicitly


Compact and energy-efficient models have become essential in this era when deep learning-based solutions are widely used for various real-life tasks. In this paper, we propose rotating the ReLU activation to give an additional degree of freedom, in conjunction with the appropriate initialization of the rotation. This combination leads to implicit sparsification without the use of a regularizer. We show that this rotated ReLU (RReLU) activation improves the representation capability of the parameters/filters in the network and eliminates those parameters/filters that are not crucial for the task, yielding significant savings in memory and computation. While the state-of-the-art regularization-based Network Slimming method achieves a 28.65% saving in memory and a 25.1% saving in computation with ResNet-164, RReLU achieves a saving of 46.2% in memory and 35.77% in computation without any loss in accuracy. The savings in memory and computation further increase to 51.5% and 40.58%, respectively, with the introduction of L1 regularization on the RReLU slopes. We note that the slopes of the rotated ReLU activations act as coarse feature extractors and can eliminate unnecessary features before retraining. Our studies indicate that features always choose to pass through fewer filters. We demonstrate the results on popular datasets such as MNIST, CIFAR-10, CIFAR-100, SVHN, and ImageNet with different architectures, including Vision Transformers. We also briefly study the impact of adversarial attacks on RReLU-based ResNets and observe better adversarial accuracy for architectures with RReLU than with ReLU. We also demonstrate how this concept of rotation can be applied to the GELU activation function, commonly used in Transformer architectures. For the GELU-based multi-layer perceptron (MLP) part of the Transformer, we obtain a 2.6% improvement in accuracy with a 6.32% saving in both memory and computation.
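To make the pruning intuition concrete, here is a minimal illustrative sketch (not the authors' exact formulation): it assumes the rotation can be captured by a learnable per-channel slope multiplying a ReLU, so that a channel whose learned slope is driven to (near) zero contributes nothing and its filter can be removed. The functions `rrelu` and `prunable_channels` and the sample slope values are hypothetical, chosen only to illustrate the idea.

```python
# Illustrative sketch of slope-based implicit sparsification.
# Assumption: a rotated-ReLU channel behaves like slope * max(0, x),
# where `slope` is a learnable parameter (and may be negative).

def rrelu(x, slope):
    """ReLU scaled by a learnable slope (hypothetical simplification)."""
    return slope * max(0.0, x)

def prunable_channels(slopes, tol=1e-3):
    """Indices of channels whose learned slope is near zero.

    Such channels output (almost) nothing, so their filters can be
    eliminated before retraining, saving memory and computation.
    """
    return [i for i, k in enumerate(slopes) if abs(k) < tol]

# Hypothetical learned slopes for four channels after training:
slopes = [1.2, 0.0004, -0.8, 0.0]
print(rrelu(2.0, 1.2))           # 2.4
print(rrelu(-1.0, 1.2))          # 0.0
print(prunable_channels(slopes)) # [1, 3]
```

Under this reading, no explicit sparsity regularizer is needed: whenever training sends a slope toward zero, the corresponding filter becomes dead weight that can be pruned outright.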