For my Stanford Convolutional Neural Networks course, I partnered with a brilliant friend of mine to analyze images from a collection of 40,000 digitized works of art by classifying them according to artist, genre, and location. After some standard pre-processing, we employed a modified VGGNet architecture to achieve better-than-state-of-the-art results on artist and genre classification. Along the way, though, we hit a number of roadblocks, and saw "Error allocating 86118400 bytes of device memory (out of memory). Driver report 32735232 bytes free and 4294770688 bytes total. Segmentation fault (core dumped)" more times than we would like to remember. In getting our network to run properly, we encountered a number of problems, and found solutions to each.
So you’ve started studying RNNs, and you’ve heard that LSTMs and GRUs are the types of RNNs you should use, because vanilla RNNs suffer from the vanishing gradient problem. That makes sense: the hidden state is passed along at each iteration, so when back-propagating, the same Jacobian matrix is multiplied by itself over and over again. If that matrix has a principal eigenvalue less than one, then we have a vanishing gradient. Incidentally, if the matrix has a principal eigenvalue greater than one: exploding gradient.
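The repeated-multiplication effect is easy to see numerically. Below is a minimal toy sketch (not from the original post) that stands in a diagonal matrix for the Jacobian, so its principal eigenvalue is explicit, and multiplies a gradient vector by it many times, as back-propagation through time would:

```python
import numpy as np

def gradient_norm_after(eigenvalue, steps=50):
    """Norm of the gradient after `steps` back-prop iterations through
    the same toy Jacobian (a 2x2 matrix with principal eigenvalue
    `eigenvalue`)."""
    W = eigenvalue * np.eye(2)   # stand-in Jacobian
    grad = np.ones(2)            # initial gradient vector
    for _ in range(steps):
        grad = W @ grad          # same matrix applied at every timestep
    return np.linalg.norm(grad)

print(gradient_norm_after(0.9))  # shrinks toward zero: vanishing gradient
print(gradient_norm_after(1.1))  # grows without bound: exploding gradient
```

After just 50 steps, the eigenvalue-0.9 case has shrunk by a factor of roughly 200, while the eigenvalue-1.1 case has grown by a factor of over 100, which is why LSTMs and GRUs add gating mechanisms to keep gradients flowing.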
A couple of days ago, I found out that I had achieved partial Internet fame! Unfortunately, it was for a negative reason due to a simple misunderstanding. A random person took a screenshot of a Twitter exchange I had with the founder of Keras, in which I mentioned being very proud of my high score on an MNIST toy competition.
Recent advances in Deep Learning have come about for a couple of key reasons, not the least of which is the enormous growth of available data. But if overwhelming amounts of data are required to reach peak performance, how does one actually go about managing the unmanageable? In other words, how does one get past the hype of Big Data in machine learning and into the practical implementation of leveraging Big Data for meaningful results?
Recently, you’ve read numerous HN articles telling you machine learning is the future. You’ve heard that Google uses it for search, Facebook uses it to detect faces, and Netflix uses it for recommendations. Your interest is sufficiently piqued. But you might still be left wondering, “How do I get started?”