An AI marketplace is not what you are looking for (in the telecommunications industry).

In a faraway land was the kingdom of Kadana. Kadana was a vast country with few inhabitants. The fact that on the warmest days of summer the temperature seldom rose above -273°C was probably a reason for it. The land was cold, but the people were warm.

In Kadana there were three major telecom operators: B311, Steven’s and Telkad. There were also three regional ones: Northlink, Southlink and Audiotron. Many neighboring kingdoms also had telecom operators, some a lot bigger than the ones in Kadana. Dollartel, Southtel and Purpletel were all big players, and many more competed in that environment.

It was a time of excitement. A new technology called AI was becoming popular in other fields, and the telecommunications operators wanted to reap its benefits as well. Before going further in our story, it can be of interest to understand a little bit what this AI technology is all about. Without going into too much detail, let’s just say that traditionally, if you wanted a computer to do something for you, you had to feed it a program handcrafted with passion by a software developer. The AI promise was that from now on, you could feed a computer a ton of data about what you want done, and it would figure out the specific conditions and provide the proper output without (much) programming. For those familiar with AI this looks like an overly simplistic (if not outright false) summary of the technology, but let’s keep it that way for now…

Going back to the telecommunication world, somebody with nice ideas decided to create Akut05. Akut05 was a new product combining the idea of a marketplace with the technology of AI. Cool! The benefits of a marketplace, as demonstrated by the Apple App Store or Google Play, combined with the power of AI.

This is so interesting that I too want to join the party, so I immediately create my company, TheLoneNut.ai. Now I need to create a nice AI model that I can sell on the Akut05 marketplace platform.

Well, let’s not go so fast… You see, AI models are built from data, as I said before. What data will I use? That’s just a small hurdle for the TheLoneNut.ai company… we go out and talk with operators. Nobody knows TheLoneNut.ai, it’s a new company, so let’s start with local operators. B311, Steven’s and Telkad all think we are too small a player to give us access to their data. After all, their data is a treasure trove they should benefit from; why would they give us access to it? We then go to the smaller regional players, and Northlink shows some interest. They are small and cannot invest massively in a data science team to build nice models, so with a proper NDA in place they agree to give us access to their data; in return, they will have access to our model on Akut05 at a substantial rebate.

Good! We need to start somewhere. I’ll skip all the adventures along the way of getting the data, preparing it and building a model… but let me tell you, the road was full of adventures. We deploy a nice model in an Akut05 store and it works wonderfully… for a while. After some time, the subscribers from Northlink change their behavior a bit, and Northlink sees that our model no longer responds properly. How do they figure? I have no idea, since Akut05 does not provide any real model-monitoring capabilities besides the regular “cloud” monitoring metrics. More alarming, we see 1-star reviews pouring in from B311, Steven’s and Telkad, who tried our model and got poor results from the get-go. And there is nothing we can do about it, because after all we never got deals with those big names to access their data. A few weeks later, having discounted the model to Northlink and getting only bad press from all the other operators, TheLoneNut.ai goes bankrupt and we never hear from it again. The same happens to a lot of other small model developers who tried their hand at it, and in no time the Akut05 store is empty of any valuable model.

So, contrary to an App Store, a Model Store is generally a bad idea. To get a model right (assuming you can), you need data. This data needs to come from representative examples of what you want the model to apply to. But that’s easy: we just need all the operators to agree to share their data! Well, if you don’t see the irony, then good luck. But this is a nice story, so let’s put the irony aside. All the operators in our story decide to make their data available to any model developer on the Akut05 platform. What else could go wrong?

Let us think about a model that uses the monthly payment a subscriber pays to the operator. In Kadana this amount is provided in the data pool as $KAD, and it works fine for all Kadanian operators. Dollartel tries it out and (not) surprisingly it fails miserably. You see, in the market of Dollartel, the money in use is not the $KAD but some other currency… The model builder, even if he has data from Dollartel, may have to make “local” adjustments. Can a model still make good money for its builder if the market is small and fractured, i.e. if special care needs to be taken for each deployment? Otherwise you’ll get 1-star reviews and, again, disappear after a short while.
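To make the currency issue concrete, here is a tiny, purely illustrative sketch (the payment amounts, the made-up churn label and the 10-to-1 exchange rate are all invented for the example) of a model trained on monthly payments expressed in $KAD and then naively applied to payments expressed in Dollartel’s currency:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data: monthly payments in $KAD and a made-up "will churn" label.
payments_kad = np.array([[20.0], [25.0], [30.0], [80.0], [90.0], [100.0]])
churned = np.array([1, 1, 1, 0, 0, 0])

model = LogisticRegression().fit(payments_kad, churned)

# Dollartel reports the same kind of subscribers in its own currency,
# worth roughly a tenth of a $KAD: same behaviour, different scale.
payments_dollartel = np.array([[200.0], [900.0]])

# Applied blindly, the model sees every Dollartel subscriber as a "high payer"
# and predicts the same class for all of them, which is useless.
print(model.predict(payments_dollartel))

# A "local" adjustment (converting to $KAD first) restores sensible behaviour.
print(model.predict(payments_dollartel / 10.0))
```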

Ok, so Akut05 is not a good idea for independent model builders. Maybe it can still be used by Purpletel, a big telecom operator that can hire a great number of data scientists. But in that case, if it is their own data scientists who will do the job, why would they share their data? And if they don’t share their data and hire their own data scientists, why would they need a marketplace in the first place?

Independent model builders can’t extract value from a model marketplace, and operators can’t either… can the telecom equipment manufacturer make money there? Well, why would it be more valuable for them than for an independent model builder? Maybe they could get easier access to data, but the constraints are basically the same, and I bet it wouldn’t be a winning market for them either.

So, a marketplace for AI is not what you are looking for… In a future post I’ll try to say a little bit about what you should be looking for in the telecom sector when it comes to AI.

For sure this story is an oversimplification of the issue; still, I think it gets the point across. Do you have a different view? Please feel free to share it in the comments below so we can all learn from a nice discussion!


Cover photo by Ed Gregory at Pexels.


How to become a good data scientist

After being so vocal about how to be a bad data scientist, I thought I should even out the playing field by giving some hints on how to become a good data scientist. The other side of the coin.

My strong feeling is that if you enter the field just for employment or salary reasons, you start on the wrong foot. You should first look at your passions. Here it is interesting to take a few seconds to look up the word passion as defined on Dictionary.com:

passion

[pash-uhn]

noun

  1. any powerful or compelling emotion or feeling, as love or hate.
  2. strong amorous feeling or desire; love; ardor.
  3. strong sexual desire; lust.
  4. an instance or experience of strong love or sexual desire.
  5. a person toward whom one feels strong love or sexual desire.
  6. strong or extravagant fondness, enthusiasm, or desire for anything: a passion for music.
  7. the object of such a fondness or desire: Accuracy became a passion with him.

Hopefully the scope of your passion for data science does not involve definitions 2, 3, 4 or 5, but is driven by a strong fondness and enthusiasm for data science! If so, you are on the right track, and my first advice would be: do not try to swallow the ocean in one sip. Zoom in on one aspect of that passion, the one that piqued your interest first. See how you could apply it to a real-world problem and learn along the way. For example, in my case, I got passionate about artificial life a long time ago. That evolved into a fondness for a form of reinforcement learning, genetic algorithms and genetic programming, around 2012. As time passed, I grew my interest in machine learning and deep learning, learning about them by reading books, taking online courses and taking a graduate course while studying for my master’s degree. At that time, I had hoped to apply it to my master’s thesis project, but sometimes plans change. So, in short, you need to follow your heart here.

If you go with such an approach, you will avoid many of the pitfalls I mentioned in the first post. You won’t come to expect a “clean” data set as your input, since you will have applied your skills to a few real cases as you learned. You will learn along the way how to gather data, how to clean it, how to interpret it… and it will benefit you in two ways. First, you will learn one of the essential skills: data cleaning. But most importantly, it will grow your inquisitive mind, something I have never seen a single course being able to do. Again, I do not think this is a skill you can get in a few weeks; it requires a mind shift that you will acquire through repeated practice.

Another benefit of following your passion is that if you don’t already have the necessary mathematical background, you will pick it up along the way. If you find maths hard, it is probably easier to pick it up on an as-needed basis as you expand your knowledge through your own passionate experiments! I will also reiterate that, notwithstanding what you might think or have been told, mathematics is not so hard. Moreover, it is way easier to grasp if you start with a positive attitude, telling yourself that you can do it.

The next benefit of such an approach is that you will have to define and refine your own problem. You will decide what is important to you, what your “research” question is, and how it relates to the activities you are doing along the way. When I was doing my master’s degree, I saw two types of students. The first type already had a research agenda, a question they wanted to explore, or at least sat down early with their advisor and set up such a research question in line with their interests and passions. Those students usually gave high-quality presentations, followed courses highly relevant to their research questions and became highly proficient in their field of research. The second type waited for their advisors to give them a research project, never really got involved in it, gave average or poor presentations, and took courses without really seeing how they related to their research topic: well, in most cases they were not related… and in the end they probably still graduated, but with a subject to forget about… You want to be like the first type of student; even if you do it on your own, you want to take ownership of it and reap the benefits.

Lastly, it is good for you to write or talk about your findings and learnings. I myself found it helps crystallize my thoughts and (sometimes) brings feedback from like-minded peers. All this to say that academic papers are not the only way to communicate your findings: blogs, videos and reports can all help you if you have the passion. Sure, one advantage of an academic paper is the peer-review system, which provides you with feedback on your research, but you should not limit yourself to that single medium of communication if it is not suited to your reality. State plainly what you found; do not claim to be something you are not, or not yet. When the time comes, others will recognize you as a data scientist, and that day you will know you are one for sure!

Along the same lines as my previous post, learn hard: it is easier when you are following a personal research or interest goal. Work hard: again, something easier (not necessarily easy) when you follow a passion. And at all times be honest with yourself (and with others) about what you know or have found out. If you think of yourself as a full-grown data scientist on day one, you might not put in the work necessary to ever become one. On the other hand, if you follow your interests and passions, you might become a data scientist before you even think of yourself as one.


Cover photo by Magda Ehlers at Pexels.

How to be a bad data scientist!

So, you want to be a data scientist, or better, you think you are now a data scientist and you are ready for your first job… Well, make sure you are not one of the stereotypes of “wannabe data scientists” I list below, otherwise you may well go through numerous rejections in interviews. I do not claim this is a complete list of all the stereotypes out there. In fact, if you can think of other stereotypes, please share them in the comments! These are only a few stereotypes of people I have met or seen over time, and who sadly seem to repeat over and over again.

I want to be a data scientist [because of the money], where do I start?

This type of person has heard that there is good money to be made in data science and wants their share of it… Little do they know that a lot of hard work is involved in learning the knowledge and skills required to perform the job. Little do they know, either, that data science is a constant work of research. Seldom is a clear path to the solution laid out in front of you. This is even truer with deep learning, where new techniques and ideas pop up every day and where you will have to come up with new ideas yourself. If you need to post the question “where do I start?” on social media, you don’t have what it takes to be one. Get a learn-it-all attitude, build an innovative spirit, and then come back later.

I can do data science, please give me the “clean” data.

If you just came from (god forbid) a single data science course, or hopefully a few of them, and if you completed one or a few Kaggle-like competitions, you might be under the impression that data comes to you all cleaned up (or mostly ready), and that with a couple of statements or commands it will all be ready for machine learning. The thing is that those courses and competitions prepare the data for you, so that you can get to the core of the problem faster and learn the subject matter of machine learning. In real life, data comes wild. It comes untamed and you must prepare it yourself. You might even have to collect it yourself. A good part of most data scientists’ job is to play with the data, prepare it, clean it, etc. If you have not done this, figure out a problem of your own, solve it end-to-end, and then come back later.

I don’t know any math or I’m bad at it, but people say I can do data science.

No, this is a fallacy. If you don’t have a mathematical mind, one day or another you will end up in a situation where you just cannot progress anymore. The good thing is that you can learn mathematics. First, get out of the “this is too hard” mindset. Anyway, data science is harder, so better to start with something as simple as mathematics. Learn some calculus, some statistics, learn to speak and think mathematics, and then come back later.

Just give me a “well” defined problem.

Some people just want their little box with well-defined interfaces: what comes in, what is expected to go out. Again, a syndrome of someone who has only done some well-canned courses in the field… In reality, not only is the data messy, but the problems you have to solve are messy, ill-defined, muddy… you have to figure them out. Sometimes you can define and refine them by yourself; sometimes you have to accept the messiness and play around with it. If you cannot be given vague and approximate objectives and refine them through thinking, research and discussions with the stakeholders until you come up with a solution, don’t expect to be a data scientist. A big misconception here is that if you have a PhD you are immune to this problem… well, not so fast, I have seen PhDs struggle with this as much as anyone else. So, grow a spine, accept the challenge, and then come back later.

I’ve learned data science, I have a blog/portfolio/… I can do anything.

Not so fast. This kind of person has learned data science and, being more marketing-oriented and knowing it can help build a personal brand, has built a portfolio or written blogs, articles, etc., but has never gone to the point of trying it out in real life. This person thinks they know it all and can solve anything. This type of person is probably single-handedly responsible for the over-hype of what data science and machine learning can achieve, and is more of a problem to the profession than a help. Do some real work, grow some honesty, and then come back later.

If you want to be a data scientist, it all boils down to a simple recipe. Learn hard and work hard. You must follow your own path and put passion into it. Seek to grow your knowledge along your interests, learn about them, try things. Continuously learn new things, and not only on related subjects. Do not limit yourself to courses; find real-world examples to practice on, and stay honest about what you can do, about what you know and do not know. Be a good human!


Cover image by tookapic at Pixabay.

The Fallacious Simplicity of Deep Learning: zat is ze question?

This post is the fifth and last in a series of posts about the “Fallacious Simplicity of Deep Learning”. I have seen too many comments from non-practitioners who think Machine Learning (ML) and Deep Learning (DL) are easy, that any computer programmer with a few hours of training should be able to tackle any problem because, after all, there are plenty of libraries nowadays… (or other such excuses). This series of posts is adapted from a presentation I will give at the Ericsson Business Area Digital Services technology day on December 5th. So, for my Ericsson fellows, if you happen to be in Kista that day, don’t hesitate to come see it!

In the previous posts, we’ve seen that the first complexity lies in the size of the machine learning and deep learning community: there are not enough skilled and knowledgeable people in the field. The second complexity lies in the fact that the technology is relatively new, so the frameworks are quickly evolving and require software stacks that reach all the way down to the specialized hardware we use. The third complexity was all about hyper-parameter setting, a skill specific to machine learning that you will need to acquire. The fourth complexity dealt with data: how to obtain it, how to clean it, how to tame it.

The next challenge we will look at is ze question. When we start a new machine learning project, we might have a question we want to answer. Through data exploration we might find that this question cannot be answered directly with the data we have. Maybe we figure the question is not that interesting after all. It all boils down to fast feedback. You need to explore your data, try to answer a question and see where it leads. Then it is time for a discussion with the stakeholders: is this what they are looking for? Does it bring value?

There are different types of questions machine learning can answer, but the list is not unlimited. Do we want to sort things into similar buckets, do we want to predict such and such a value, do we want to find examples that are abnormal? For a machine learning exercise to be successful, you need a really specific question to answer: is this subscriber an IoT device or a human? Then you need the proper data. Using Canadian census data to try to figure out whether a mobile phone subscriber is a human or a machine might not work! But with the proper data we could start to explore whether there is a model linking some of the data, for example the IP addresses visited, the time of those visits, etc., to the fact that the subscriber is a machine or a human.
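As a toy illustration of what such an exploration could look like (all feature values and labels below are made up, and the features themselves are merely plausible candidates, not a validated feature set), a first experiment could be a small Keras classifier fed with per-subscriber features such as the number of distinct IP addresses visited per day and the share of traffic occurring at night:

```python
import numpy as np
from tensorflow import keras

# Hypothetical per-subscriber features: [distinct IPs visited per day, fraction of traffic at night].
X = np.array([[  3, 0.95], [  2, 0.90], [  4, 0.99],    # machine-like usage patterns
              [150, 0.20], [200, 0.35], [120, 0.10]],   # human-like usage patterns
             dtype="float32")
y = np.array([1, 1, 1, 0, 0, 0], dtype="float32")       # 1 = IoT device, 0 = human (made-up labels)

normalizer = keras.layers.Normalization()
normalizer.adapt(X)  # learn the mean/variance of each feature so their scales become comparable

model = keras.Sequential([
    normalizer,
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # probability that the subscriber is an IoT device
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=200, verbose=0)

# Ask the model about a new, unseen subscriber.
print(model.predict(np.array([[5, 0.97]], dtype="float32")))  # should lean towards "IoT device"
```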

Often the question will evolve with time, through discussion. You need to be ready for that evolution, for that change. You might have to bring in new data, new methods, new algorithms. It is all about searching and researching, trial and error, finding new paths. Getting access to the data might be the biggest difficulty, but finding the right question is certainly the second.

The fifth complexity: finding the right question.

Through this series I have detailed five complexities of deep learning (and machine learning in general). There are many more. The machine learning “process” is not like software development. In general, it requires a lot more exploration and research than regular software development. It requires a higher level of “artistic flair” than you would need to write a regular software application. There are other things that differentiate machine learning from software development, but I think these are the first and biggest five complexities one can face when developing a machine learning model:

  • First, you need access to data, and that might not be trivial. You will also need to clean that data and ensure you have a consistent flow of it.
  • Second, you will need to find the right question. This may require many iterations and might require new data sources as well.
  • Third, the results may look simple, but the code itself does not show everything that is hidden behind the curtain. A lot has to do with hyper-parameter setting and tweaking, and there is no cookbook for this. The APIs do not tell you what values will give good results.
  • Fourth, machine learning requires specific competences. Some of them have to do with software development, but some others are quite different. It is a relatively new domain and the community is still smaller than others in software development. Moreover, this is a highly in-demand skill set and hard to find in the wild.
  • Finally, this is a quickly evolving domain which ranges from specialized hardware to specialized software stacks. In software development people are accustomed to quickly evolving environments, but the pace and breadth at which things move in machine learning might well be unprecedented.

I hope this series could give you a better appreciation of the complexities of deep learning and machine learning. It may look easy when you look at the results and the code that supports them, but there are a lot of things you are not shown! The artificial intelligence field will continue to grow in the coming years and will enter many more aspects of your daily lives. This will require people trained in deep learning, machine learning and data science, as this is not simply the use of yet another software library.


Originally published at medium.com/@TheLoneNut on November 28, 2017.

The Fallacious Simplicity of Deep Learning: wild data

This post is the fourth in a series of posts about the “Fallacious Simplicity of Deep Learning”. I have seen too many comments from non-practitioners who think Machine Learning (ML) and Deep Learning (DL) are easy, that any computer programmer with a few hours of training should be able to tackle any problem because, after all, there are plenty of libraries nowadays… (or other such excuses). This series of posts is adapted from a presentation I will give at the Ericsson Business Area Digital Services technology day on December 5th. So, for my Ericsson fellows, if you happen to be in Kista that day, don’t hesitate to come see it!

In the previous posts, we’ve seen that the first complexity lies in the size of the machine learning and deep learning community: there are not enough skilled and knowledgeable people in the field. The second complexity lies in the fact that the technology is relatively new, so the frameworks are quickly evolving and require software stacks that reach all the way down to the specialized hardware we use. The third complexity was all about hyper-parameter setting, a skill specific to machine learning that you will need to acquire.

The next challenge with machine learning, or should I say the first challenge, is the data! You need data to perform a machine learning task. For deep learning, you arguably need even more of said data.

This is an angle that you will not see if you take courses (online or in schools) about data science, machine learning or deep learning. At least I’ve never seen it properly presented as a difficulty. When you take a course, the data is most of the time provided, or you get a clear indication of where to obtain it from. In most cases this is well-curated data, with well-explained and documented fields and formats. If the data is not already available and the exercise wants you to concentrate on the data scraping aspect, then the other parameters of the exercise will be well defined. They will tell you where to scrape the data from and what is of importance for the exercise. It will make sure the scraping can be done in a structured fashion. If the exercise wants you to concentrate on data cleaning, or data augmentation, the dataset will have been prepared to properly show you how this can be done. Not to say that those courses or exercises are easy, but they do not show the real difficulty of wild data.

I think that for most companies, it all started with wild data. As data science grows in a company, people put structure and pipelines around the data. But this comes with time and size, and it certainly does not prevent some wild data beast from entering the zoo from time to time. So, assuming you have access to data, it might really be wild data and you will have to tame it. Some say deep learning is less about data engineering and more about model architecture creation. That is certainly true in computer vision, where the format of the data was agreed upon some time ago. But what about other domains; is the format so widely accepted there? You might still have to do some feature engineering to feed your data into your model, adding the feature engineering problem to the hyper-parameter tuning problem…
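To give a flavour of what taming wild data looks like in practice, here is a minimal sketch (the column names and values are invented for the example) of a typical first pass with pandas: coercing messy fields into proper types, handling missing values and deriving features a model can actually consume:

```python
import pandas as pd

# Invented example of "wild" subscriber records: mixed date formats, missing values, free-text fields.
raw = pd.DataFrame({
    "subscriber_id": ["A1", "A2", "A3", "A4"],
    "last_seen":     ["2017-11-02", "2017/11/03", None, "2017-11-05 13:45"],
    "monthly_fee":   ["30", "N/A", "45.5", " 40 "],
    "device_type":   ["sensor", "Phone", "phone", None],
})

clean = raw.copy()
# Coerce messy strings into proper types; unparseable entries become NaN/NaT instead of crashing.
clean["last_seen"] = pd.to_datetime(clean["last_seen"], errors="coerce")
clean["monthly_fee"] = pd.to_numeric(clean["monthly_fee"].str.strip(), errors="coerce")
# Normalize categorical text and put the gaps into an explicit "unknown" bucket.
clean["device_type"] = clean["device_type"].str.lower().fillna("unknown")
# Simple feature engineering: one-hot encode the device type for use in a model.
features = pd.get_dummies(clean, columns=["device_type"])

print(features)
```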

On the other hand, you might well be in a situation where you do not have access to data. What if you are used to selling a specific type of equipment? That equipment might not be connected to a communication network, and if it is, the customer might never have been asked whether you could use their data. How do you put in place a contract or license that allows you to collect data? Does the legislation in the regions where you sell that equipment allow for that collection, or are there restrictions you need to dance with? Is there personally identifiable information you will need to deal with? Will you need to anonymize the data? If you start a new business, based on a new business model, you might be able to build data collection into your product. But if you have a continuing business, how do you incorporate it into your product? Does it need to be gradual?
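One small, illustrative piece of that anonymization puzzle (a sketch only; real compliance work involves much more than this) is pseudonymizing direct identifiers, for example replacing subscriber IDs with a keyed hash before the data ever reaches the data science environment:

```python
import hashlib
import hmac

# Secret salt kept outside the data science environment (the value here is a placeholder).
SECRET_SALT = b"replace-with-a-secret-managed-elsewhere"

def pseudonymize(subscriber_id: str) -> str:
    """Replace a direct identifier with a keyed hash: records can still be joined,
    but the original identity cannot be read off the dataset."""
    return hmac.new(SECRET_SALT, subscriber_id.encode("utf-8"), hashlib.sha256).hexdigest()

print(pseudonymize("A1"))  # the same input always maps to the same opaque token
```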

I guess you now have a good understanding of our fourth complexity. Where do you get your data from?

Data might be hard to come by. It might be wild and messy. It might not even relate to the question you want to answer… and that will be the subject of our fifth complexity: do you have the right question?

The Fallacious Simplicity of Deep Learning: hyper-parameters tuning

This post is the third in a series of posts about the “Fallacious Simplicity of Deep Learning”. I have seen too many comments from non-practitioners who think Machine Learning (ML) and Deep Learning (DL) are easy, that any computer programmer with a few hours of training should be able to tackle any problem because, after all, there are plenty of libraries nowadays… (or other such excuses). This series of posts is adapted from a presentation I will give at the Ericsson Business Area Digital Services technology day on December 5th. So, for my Ericsson fellows, if you happen to be in Kista that day, don’t hesitate to come see it!

In the previous posts, we’ve seen that the first complexity lies in the size of the machine learning and deep learning community: there are not enough skilled and knowledgeable people in the field. The second complexity lies in the fact that the technology is relatively new, so the frameworks are quickly evolving and require software stacks that reach all the way down to the specialized hardware we use. We also said that to illustrate the complexities we would show an example of deep learning using Keras. I described the model I use in a previous post, This is not me blogging!. The model can generate new blog posts that look like mine after being trained on all my previous posts. So, without any further ado, here is the short code example we will use.

 

[Figure: A Keras code example.]
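Since the original code was shown as an image, here is a minimal sketch of what such a character-level text-generation model typically looks like in Keras (the layer sizes, sequence length and training data below are illustrative stand-ins, not my exact values):

```python
import numpy as np
from tensorflow import keras

# Illustrative dimensions; the real values are hyper-parameters you must tune.
vocab_size = 60      # number of distinct characters in the corpus
seq_length = 40      # how many characters the model sees before predicting the next one
embedding_dim = 32

model = keras.Sequential([
    keras.layers.Embedding(vocab_size, embedding_dim),      # characters -> vectors of real numbers
    keras.layers.LSTM(128, return_sequences=True),
    keras.layers.LSTM(128),
    keras.layers.Dense(vocab_size, activation="softmax"),   # probability of each possible next character
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# x: sequences of character indices; y: the index of the character that follows each sequence.
# Random stand-ins here; in the real model these come from the blog corpus.
x = np.random.randint(0, vocab_size, size=(1000, seq_length))
y = np.random.randint(0, vocab_size, size=(1000,))
model.fit(x, y, batch_size=64, epochs=1)

next_char_probabilities = model.predict(x[:1])
```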

In these few lines of code you can see the gist of how one would program a text-generating neural network such as the one shown above. There is more code required to prepare the data and generate text from the model predictions than the simple call to model.predict. But the part of the code related to creating, training and making predictions with a deep neural network is all in those few lines.

You can easily see the definition of each layer: the embedding, the two Long Short-Term Memory (LSTM) layers, a form of recurrent neural network, and a fully connected dense layer. You can also see that the inputs are passed in when we train, or fit, the model, along with the expected output, our labels. Here the label is, given the beginning of a sentence, the next character in the sequence: the subject of our prediction. Once you have trained that network, you can ask for the next character, and the next, and the next… until you have a new blog post… more or less, as you have seen in a previous post.

For people with a programming background, there is nothing complicated here. You have a library, Keras; you look at its API and you code accordingly, right? Well, how do you choose which layers to use and in which order? The API will not tell you that… there is no cookbook. So, the selection of layers is part of our next complexity. But before stating it as such, let me introduce a piece of terminology: hyper-parameters. In deep learning and machine learning, a hyper-parameter is any parameter whose value you can vary but ultimately have to fine-tune to your data if you want your model to behave properly.

So, according to that definition of a hyper-parameter, the deep neural network topology or architecture is a hyper-parameter. You must decide which layers to use and in what order. Hyper-parameter selection does not stop at the neural network topology, though. Each layer has its own set of hyper-parameters.

The first layer is an embedding layer. In this case it converts character input into a vector of real numbers; after all, computers can only work with numbers. How big will this encoding vector be? How long will the sentences we train with be? Those are all hyper-parameters.

On the LSTM layers, how wide will they be, i.e. how many neurons will we use? Will we use all the outputs all the time or drop some of them (a technique called dropout, which helps regularize a neural network and reduce cases of overfitting)? Overfitting is when a neural network learns your training examples so well that it cannot generalize to new examples, meaning that when you try to predict on a new value, the results are erratic. Not a situation you desire.

You have hyper-parameters to select and tweak right up until model compilation time and model training (fit) time. How big will the adjustment to your neural network weights be at each computation pass (the learning rate)? How big will each pass be in terms of the examples you give to the neural network (the batch size)? How many passes will you perform (the number of epochs)?
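As a small sketch of where those knobs live in Keras (the values below are arbitrary, not recommendations), the learning rate is set on the optimizer at compile time, while the batch size and the number of passes, or epochs, are set on the fit call:

```python
from tensorflow import keras

# Assumes `model`, `x` and `y` are defined as in the earlier sketch.
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),  # how big each weight update is
    loss="sparse_categorical_crossentropy",
)
model.fit(
    x, y,
    batch_size=64,   # how many examples per weight update
    epochs=20,       # how many passes over the whole training set
)
```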

If you take all of this into consideration, you end up with most of the code you write being subject to hyper-parameter selection. And again, there is no cookbook or recipe yet to tell you how to set them. The API tells you how to enter those values into the framework, but it cannot tell you what the effect will be. And the effect will be different for each problem.

It’s a little bit as if the argument you give to a print statement, you know, like print(“hello world!”), were not the string “hello world” itself, but some value (the hyper-parameter) which would print something based on that value and on whatever had been printed in the past, and you would have to tweak it until, at training time, you get the expected result!!! This would drive any good programmer insane. But currently there is no other way with deep neural networks.

[Figure: Hello World with hyper-parameters.]
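To make the analogy tangible, here is a deliberately silly toy sketch (my own invention, not the figure above): a “print” whose output depends both on a knob you set and on everything printed before, so the only way to get “hello world!” out of it is trial and error:

```python
history = []

def weird_print(knob: float) -> None:
    """A 'print' whose output depends on the knob and on everything printed so far."""
    text = "hello world!" if (knob + len(history)) % 7 < 1 else "garbage"
    history.append(text)
    print(text)

# No documentation tells you which knob value yields "hello world!";
# you simply try values and observe the output, as with hyper-parameters.
for knob in [0.3, 2.0, 6.5]:
    weird_print(knob)
```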

So our third complexity is not only the selection of the neural network topology but also all the hyper-parameters that come with it.

It sometimes requires insight into the mathematics behind the neural net, as well as imagination and lots of trial and error, while staying rigorous about what you try. This is definitely not the “normal” software development experience. As I said in my previous post, you need a special crowd to perform machine learning or deep learning.

My next post will look at what is sometimes the biggest difficulty for machine learning: obtaining the right data. Before anything can really be done, you need data. This is not always a trivial task…

 

The Fallacious Simplicity of Deep Learning: the proliferation of frameworks.

This post is the second in a series of posts about the “Fallacious Simplicity of Deep Learning”. I have seen too many comments from non-practitioners who think Machine Learning (ML) and Deep Learning (DL) are easy, that any computer programmer with a few hours of training should be able to tackle any problem because, after all, there are plenty of libraries nowadays… (or other such excuses). This series of posts is adapted from a presentation I will give at the Ericsson Business Area Digital Services technology day on December 5th. So, for my Ericsson fellows, if you happen to be in Kista that day, don’t hesitate to come see it!

In the last post, we saw that the first complexity lies in the size of the machine learning and deep learning community: there are not enough skilled and knowledgeable people in the field. To illustrate the other complexities, I’ll show an example of deep learning using Keras. Don’t worry if you are not used to it, and even if you have not programmed in a while, or at all, I’ll keep it simple. Below is one of the software stacks you can use in order to perform deep learning. This one can be deployed on CPUs, so your usual computer or server, but it can also be deployed on Graphics Processing Units, or GPUs: basically, the video card in your computer.

[Figure: The stack we will use for demonstration.]

To use the video card for the type of computation required for deep learning, one of the GPU manufacturers, Nvidia, has created a software layer to program those GPUs. CUDA, the Compute Unified Device Architecture, allows someone to program the GPU to do any highly parallelizable task. On top of that layer, Nvidia has created another layer targeting the task of running deep neural networks: the cuDNN layer, or CUDA Deep Neural Network library. For my example, I’ll use, on top of cuDNN, Google’s framework for graph computation, TensorFlow. Lastly, to simplify my task, since I won’t build new kinds of neurons or new kinds of layers, I’ll use Google’s Keras library, which makes the process of defining a deep neural network, deploying it, training it and testing it simpler. For something simple, we already have five layers of libraries, and I haven’t even mentioned the language I’ll use and the libraries it requires as well (note that in the latest releases of TensorFlow, Keras has been integrated). But no biggie, in software development we are used to having many layers of software piling up.
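As a quick sanity check of such a stack (assuming a recent TensorFlow build with Keras integrated; the exact calls have moved around between versions), you can ask TensorFlow which versions sit at the top of the pile and whether it actually sees a GPU through the CUDA and cuDNN layers underneath:

```python
import tensorflow as tf

# Which versions of the top layers of the stack are installed?
print("TensorFlow:", tf.__version__)
print("Keras:", tf.keras.__version__)

# Does TensorFlow see a GPU through the CUDA/cuDNN layers beneath it?
gpus = tf.config.list_physical_devices("GPU")
print("GPUs visible:", gpus if gpus else "none (falling back to CPU)")
```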

The software stack I’m using for this example is only one of the possible ones. Just for the Nvidia GPU, there are already more than a dozen frameworks that build on top of cuDNN. Moreover, Intel, AMD and Google are coming up with their own deep neural network hardware accelerators. Many other companies are doing the same, creating accelerated hardware for deep neural networks. All this new hardware will come with its equivalent of CUDA and cuDNN, and frameworks will proliferate for a while.

[Figure: Some of the cuDNN-accelerated frameworks.]

I’m not even going to talk about the next layer of frameworks (e.g. TensorFlow and Keras). Hopefully, they will adapt to the new hardware… otherwise, we’ll have even more frameworks. Same for the layer above, e.g. Keras, which builds on top of TensorFlow (or Theano or CNTK, but let’s not open that door now). Hence, we can see our next complexity.

The second complexity: the piling up of frameworks (including specialized hardware) and the proliferation of frameworks. Which ones should you learn? Which ones will become irrelevant?

The machine learning, and especially the deep learning, landscape is evolving rapidly. To be efficient it requires new kinds of hardware that were not common in industrial servers even a few years ago. This means that the whole development stack, from the hardware to the released data product, is evolving quickly. Changing requirements are a known issue in software development; it is no different in data product development.

My next post will tackle, through an example, the next complexity: hyper-parameter tuning, something you do not see in software development but which is necessary for the development of a data product.