So you’ve started studying RNNs, and you’ve heard that LSTMs and GRUs are the types of RNNs you should use because vanilla RNNs suffer from the vanishing gradient problem. That makes sense: the hidden state is passed along at each time step, so when back-propagating, the same Jacobian matrix is multiplied by itself over and over again. If that matrix has a principal eigenvalue less than one, we get a vanishing gradient. Incidentally, if it has a principal eigenvalue greater than one: exploding gradient.
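Here is a minimal sketch (not from the original post) of that intuition: scale a random matrix so its principal eigenvalue has a chosen magnitude, multiply it by itself many times, and watch the norm of the product collapse or blow up. The function name and parameters are illustrative.

```python
# Sketch: repeated multiplication by the same Jacobian shrinks or grows
# the backpropagated gradient depending on the principal eigenvalue.
import numpy as np

def repeated_jacobian_norm(spectral_radius, steps=50, size=4, seed=0):
    """Scale a random matrix so its largest eigenvalue magnitude equals
    `spectral_radius`, raise it to the `steps`-th power, and return the
    norm of the product -- a stand-in for the gradient after `steps`
    time steps of backpropagation."""
    rng = np.random.default_rng(seed)
    J = rng.standard_normal((size, size))
    J *= spectral_radius / np.max(np.abs(np.linalg.eigvals(J)))
    product = np.linalg.matrix_power(J, steps)
    return np.linalg.norm(product)

print(repeated_jacobian_norm(0.9))   # tiny  -> vanishing gradient
print(repeated_jacobian_norm(1.1))   # large -> exploding gradient
```

The norm of the product is dominated by the principal eigenvalue raised to the number of steps, which is why 0.9 collapses toward zero and 1.1 blows up.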
A couple of days ago, I found out that I had achieved partial Internet fame! Unfortunately, it was for the wrong reason, owing to a simple misunderstanding. A random person took a screenshot of a Twitter exchange I had with the founder of Keras, in which I mentioned being very proud of my high score on an MNIST toy competition.
Recent advances in Deep Learning have come about for a couple of key reasons, not the least of which is the enormous growth of available data. But if overwhelming amounts of data are required to reach peak performance, how does one actually go about managing the unmanageable? In other words, how does one get past the hype of Big Data in machine learning and actually put it to work for meaningful results?
Recently, you’ve read numerous HN articles telling you machine learning is the future. You’ve heard that Google uses it for search, Facebook uses it to detect faces, and Netflix uses it for recommendations. Your interest is sufficiently piqued. But you might still be left wondering, “How do I get started?”
Many folks in the tech industry are skeptical about the hype of machine learning, since they have been promised the world before, only for reality to fall short. In fact, this has happened so many times that the machine learning community has a term for it: an AI Winter. In 2015, though, a number of fundamental factors are different, and they warrant taking a closer look this time around.