A Reduced Order Approach for Artificial Neural Networks applied to Object Recognition
Computer vision is a thriving field, increasingly used in scientific and engineering contexts to solve complex tasks such as the recognition and detection of objects in images. One approach to image processing problems is Deep Neural Networks, in particular Convolutional Neural Networks (CNNs). Such architectures perform well on complex tasks such as object recognition, but may require a large number of layers to extract all the features of the problem at hand, which increases the number of parameters to be calibrated during the training phase. This raises several computational issues in the learning procedure, as well as in the memory and storage required by the model itself, especially when these networks have to operate in an embedded system with limited hardware. A possible solution is a dimensionality reduction technique for CNNs based on Proper Orthogonal Decomposition (POD), a method widely used in Reduced Order Modeling, or on Higher Order SVD (HOSVD), which takes into account the intrinsic tensorial structure. The reduced network is obtained by splitting the original one into two nets connected by the reduction technique: the first is obtained by retaining a certain number of layers of the original model, and the second classifies the features extracted by the first. Hence, in our work we propose several versions of reduced networks that combine these techniques to tackle two different problems: image recognition and object detection. For the first case, we provide numerical results obtained by applying the method to VGG-16, a benchmark CNN for image recognition, using CIFAR-10 and a custom dataset.
In particular, we compare the original net with its reduced version in terms of final accuracy, memory allocation, and speed of the procedure. For the object detection case, we present a possible generalization of the method proposed for Artificial Neural Networks to object detectors, in particular to SSD-300 or neural networks with a similar architecture. We thus provide the results obtained by constructing a reduced SSD, which shows a decrease in accuracy but a large reduction in memory allocation and half the training time of the full version.
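The abstract gives no implementation details, so the following is only a minimal sketch of the POD reduction step. It assumes that the features produced by the retained layers are flattened and stored as the columns of a snapshot matrix. The function name `pod_basis`, the sizes, and the synthetic data are hypothetical and stand in for real CNN features such as those of VGG-16.

```python
import numpy as np


def pod_basis(snapshots, rank):
    """Compute a truncated POD basis from a snapshot matrix.

    snapshots: array of shape (n_features, n_samples), one flattened
               feature vector per column (an assumption of this sketch).
    rank:      number of POD modes to retain.
    Returns the basis of shape (n_features, rank) and the fraction of
    the squared singular values ("energy") captured by those modes.
    """
    # Thin SVD: the left singular vectors are the POD modes.
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    energy = np.sum(s[:rank] ** 2) / np.sum(s ** 2)
    return U[:, :rank], energy


# Synthetic stand-in for features extracted by the retained layers:
# 200 samples of dimension 512 that lie in a 5-dimensional subspace.
rng = np.random.default_rng(0)
latent = rng.standard_normal((5, 200))
mixing = rng.standard_normal((512, 5))
features = mixing @ latent

basis, energy = pod_basis(features, rank=5)

# Project the features onto the POD modes. In a reduced network these
# low-dimensional coefficients would be the input of the second,
# smaller net that performs the classification.
reduced = basis.T @ features

# Reconstruct to check how much information the projection keeps.
reconstructed = basis @ reduced
relative_error = np.linalg.norm(features - reconstructed) / np.linalg.norm(features)
```

Because the synthetic data is exactly rank 5, five modes capture essentially all of the energy. With real feature maps, the rank would instead be chosen as a trade-off between accuracy and memory, which is the comparison the abstract reports.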