Abstract:
In this work, we explore the redundancy of parameters in deep neural networks
by replacing the conventional linear projection in fully-connected layers with
the circulant projection. The circulant structure substantially reduces memory
footprint and enables the use of the Fast Fourier Transform to speed up the
computation. For a fully-connected neural network layer with d input nodes and
d output nodes, this method improves the time complexity from O(d^2) to
O(d log d) and the space complexity from O(d^2) to O(d). The space savings are
particularly important for modern deep convolutional neural network
architectures, where fully-connected layers typically contain more than 90% of
the network parameters. We further show that the gradient computation and
optimization of the circulant projections can be performed very efficiently.
Our experiments on three standard datasets show that the proposed approach
achieves this significant gain in storage and efficiency with minimal increase
in error rate compared to neural networks with unstructured projections.
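The FFT speedup the abstract refers to can be illustrated with a short NumPy sketch (our own illustration under stated assumptions, not code from the paper): the layer stores only a single d-dimensional parameter vector r instead of a d x d weight matrix, and multiplying by the circulant matrix that r defines is a circular convolution, which the FFT computes in O(d log d).

import numpy as np

def circulant_project(r, x):
    # Compute R @ x, where R is the circulant matrix defined by the
    # parameter vector r (R[i, j] = r[(i - j) % d]), so storage is O(d).
    # Multiplication by a circulant matrix is a circular convolution,
    # which the FFT diagonalizes: R @ x = ifft(fft(r) * fft(x)),
    # costing O(d log d) instead of O(d^2).
    return np.real(np.fft.ifft(np.fft.fft(r) * np.fft.fft(x)))

# Sanity check against the explicit O(d^2) unstructured projection.
d = 8
rng = np.random.default_rng(0)
r = rng.standard_normal(d)
x = rng.standard_normal(d)
R = np.array([[r[(i - j) % d] for j in range(d)] for i in range(d)])
assert np.allclose(R @ x, circulant_project(r, x))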
Biography:
Yu Cheng is a Research Staff Member at the IBM T.J. Watson Research Center. Prior
to joining IBM, he obtained his PhD in 2015 from the Department of Computer
Science at Northwestern University. Before that, he received his Bachelor's
degree in 2010 from Tsinghua University. Yu's research interests are in the
areas of machine learning and its applications in data mining and computer
vision. At Watson,
he is focusing on: 1) developing machine learning algorithms for
spatio-temporal data analysis; 2) connecting healthcare, social and mobile
applications; 3) exploiting deep learning to solve real industrial problems.
Host:
Dr. Jiayu Zhou