This post is the second in a series of posts about the “Fallacious Simplicity of Deep Learning”. I have seen too many comments from non-practitioners who think Machine Learning (ML) and Deep Learning (DL) are easy; that any computer programmer, after a few hours of training, should be able to tackle any problem because, after all, there are plenty of libraries nowadays… (or other such excuses). This series of posts is adapted from a presentation I will give at the Ericsson Business Area Digital Services technology day on December 5th. So, for my Ericsson fellows, if you happen to be in Kista that day, don’t hesitate to come see it!
In the last post, we saw that the first complexity lies in the size of the machine learning and deep learning community: there are not enough skilled and knowledgeable people in the field. To illustrate the other complexities, I’ll show an example of deep learning using Keras. Don’t worry if you are not used to it, and even if you have not programmed in a while, or at all, I’ll keep it simple. Below is one of the software stacks you can use in order to perform deep learning. This one can be deployed on CPUs, so your usual computer or server; but it can also be deployed on Graphics Processing Units, or GPUs. Basically, the video card in your computer.
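You can actually ask the stack which devices it sees. This little check is my own illustration, not part of the original example, and it uses the device-listing API from current TensorFlow releases (which postdate the version I used here):

```python
# List the compute devices TensorFlow can use on this machine.
import tensorflow as tf

cpus = tf.config.list_physical_devices("CPU")
gpus = tf.config.list_physical_devices("GPU")

# On a machine without an Nvidia GPU (or without CUDA/cuDNN installed),
# the GPU list will simply be empty and everything runs on the CPU.
print(f"CPU devices: {len(cpus)}, GPU devices: {len(gpus)}")
```

The same model code runs either way; the stack decides where the computation lands.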
To use the video card for the type of computation required by deep learning, one of the GPU manufacturers, Nvidia, created a software layer to program those GPUs. CUDA, the Compute Unified Device Architecture, allows you to program the GPU for any highly parallelizable task. On top of that layer, Nvidia created yet another layer, this one targeting the task of running deep neural networks: cuDNN, the CUDA Deep Neural Network library. For my example, I’ll use, on top of cuDNN, Google’s framework for graph computation, TensorFlow. Lastly, to simplify my task, since I won’t build new kinds of neurons or new kinds of layers, I’ll use Google’s Keras library, which simplifies the process of defining a deep neural network, deploying it, training it and testing it. For something simple, we already have five layers of libraries, and I haven’t even mentioned the language I’ll use and the libraries it requires in turn (note that in the latest releases of TensorFlow, Keras has been integrated). But no biggie; in software development we are used to having many layers of software piling up.
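To give a feel for that top layer, here is a minimal sketch of the define/train/test workflow in Keras. The network, the toy data and every parameter choice here are my own illustration (random data, arbitrary layer sizes), not a recommendation:

```python
import numpy as np
from tensorflow import keras

# Toy data: 1000 samples with 20 features each, random binary labels.
# Purely illustrative; a real data product starts with real data.
x_train = np.random.rand(1000, 20).astype("float32")
y_train = np.random.randint(0, 2, size=(1000,))

# Define: a small fully connected network.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])

# Compile: pick an optimizer and a loss function.
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Train, then evaluate.
model.fit(x_train, y_train, epochs=2, batch_size=32, verbose=0)
loss, acc = model.evaluate(x_train, y_train, verbose=0)
```

A dozen lines of code, but remember what sits underneath them: Keras, TensorFlow, cuDNN, CUDA and the hardware itself.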
The software stack I’m using for this example is only one of many possible ones. Just for Nvidia GPUs, more than a dozen frameworks already build on top of cuDNN. Moreover, Intel, AMD and Google are coming up with their own deep neural network hardware accelerators, and many other companies are doing the same, creating accelerated hardware for deep neural networks. All this new hardware will come with its equivalents of CUDA and cuDNN, and frameworks will proliferate for a while.
I’m not even going to talk about the next layer of frameworks (e.g. TensorFlow). Hopefully, they will adapt to the new hardware… otherwise, we’ll have even more frameworks. The same goes for the layer above that: e.g. Keras builds on top of TensorFlow (or Theano, or CNTK, but let’s not open that door now). Hence, we can see our next complexity.
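That backend swapping is not hypothetical: standalone Keras chooses its backend at import time, via the `KERAS_BACKEND` environment variable (or its `keras.json` configuration file). A minimal sketch of the mechanism, shown with the TensorFlow backend (Theano and CNTK were the alternatives at the time of writing):

```python
import os

# Must be set before Keras is imported; the backend is resolved once,
# at import time. "theano" or "cntk" would select the other backends
# in standalone Keras of that era.
os.environ["KERAS_BACKEND"] = "tensorflow"

import keras
print(keras.backend.backend())
```

The model-definition code stays the same; only the layer underneath changes. Which is exactly the point: every layer of the stack is a moving part.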
Second complexity: the piling up of frameworks (including specialized hardware) and their proliferation. Which one should you learn? Which one will become irrelevant?
The machine learning, and especially the deep learning, landscape is evolving rapidly. To be efficient, it requires new kinds of hardware that were not common in industrial servers even a few years ago. This means that the whole development stack, from the hardware to the released data product, is evolving quickly. Changing requirements are a known issue in software development; it is no different in data product development.
My next post will tackle, through an example, the next complexity: hyper-parameter tuning, something you do not see in software development but which is necessary for the development of a data product.