
A small pause in my series on the complexities of Deep Learning (or Machine Learning in general). Today I will try to give an honest review of the latest course I followed: “Convolutional Neural Networks” on Coursera by Andrew Ng.

I first followed the “Machine Learning” course from Andrew Ng in 2014. I really liked his pace of delivery and the way he builds up the knowledge to bring you up to speed. So, when his new nano degree on Deep Learning was released in August of this year, I was one of the first to jump on it and I swiftly completed the three available courses. A couple of weeks back, the fourth of five courses was released. I immediately started the “Convolutional Neural Networks” course to continue my journey.

The course is well structured, as expected, and given at a proper pace. The content is split over four weeks. The first week builds the foundations of Convolutional Neural Networks (CNNs) and explains the mechanics of how those convolutions are computed. It covers their roots in computer vision, then details the convolution operation itself with padding and stride, and the pooling layers (max pooling, average pooling).
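To give a flavour of the mechanics the first week drills into, here is a small sketch of my own (not taken from the course material) of the output-size arithmetic: the spatial size after a convolution or pooling operation is floor((n + 2p − f)/s) + 1 for an n×n input, an f×f filter, padding p and stride s.

from math import floor

def conv_output_size(n, f, p=0, s=1):
    # Spatial output size of a convolution or pooling layer:
    # floor((n + 2p - f) / s) + 1
    return floor((n + 2 * p - f) / s) + 1

print(conv_output_size(6, 3))        # 6x6 input, 3x3 filter, "valid" padding, stride 1 -> 4
print(conv_output_size(6, 3, p=1))   # "same" padding for a 3x3 filter keeps the size -> 6
print(conv_output_size(6, 2, s=2))   # 2x2 max pooling with stride 2 -> 3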

The second week looks at a few “classic” CNNs and explains how their architectures were built by adding new concepts on top of the previously existing ones. It then goes on to explain ResNets (a concept which can apply to networks other than CNNs) and builds toward Inception Networks (yes, Inception as in the movie).
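To give an idea of the ResNet concept, here is a minimal sketch of an identity residual block in Keras (my own illustration, not the course’s assignment code): the input is added back onto the output of a couple of convolutions, so the block only has to learn a residual.

from keras.layers import Input, Conv2D, BatchNormalization, Activation, Add
from keras.models import Model

def identity_block(x, filters, kernel_size=3):
    shortcut = x                                         # keep the input for the skip connection
    y = Conv2D(filters, kernel_size, padding='same')(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv2D(filters, kernel_size, padding='same')(y)
    y = BatchNormalization()(y)
    y = Add()([y, shortcut])                             # the residual (skip) connection
    return Activation('relu')(y)

inputs = Input(shape=(32, 32, 64))                       # illustrative input shape
model = Model(inputs, identity_block(inputs, filters=64))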

The third week introduces two new practical concepts: object localization and object detection; figuring out where in a picture an object is, and then how many objects we can detect in a picture and where each of them is. It nicely shows how bounding box predictions are made and evaluated, followed by the use of anchor boxes.
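The evaluation of a predicted bounding box against the ground truth boils down to the intersection over union (IoU) measure; below is a small sketch of my own, with boxes given as (x1, y1, x2, y2) corners.

def intersection_over_union(box_a, box_b):
    # Boxes as (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

print(intersection_over_union((1, 1, 4, 4), (2, 2, 5, 5)))  # 4 / 14, about 0.29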

Finally, in the fourth week you will learn some of the coolest and most fun things about CNNs: face recognition and neural style transfer. Some important concepts introduced there are one-shot learning (another of those things which can apply to networks other than CNNs) and Siamese networks.
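The gist of the Siamese idea is that a single network encodes both faces and a simple distance on the embeddings decides “same person or not”. Below is a hedged sketch of that verification step; the encoder model, the image variables and the threshold value are all assumptions for illustration, not the course’s code.

import numpy as np

def is_same_person(encoder, image_a, image_b, threshold=0.7):
    # Siamese-style verification: the same encoder embeds both images and a
    # small distance between the embeddings means "same person".
    # encoder, the images and the threshold are illustrative assumptions.
    emb_a = encoder.predict(image_a[np.newaxis, ...])[0]
    emb_b = encoder.predict(image_b[np.newaxis, ...])[0]
    return np.linalg.norm(emb_a - emb_b) < threshold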

All in all, this is a good course. There are some glitches along the way. Some of the videos are not as polished as they could be, i.e. some “bloopers” need removal and a few errors on the slides need correction, but those are very minor. My biggest gripe is about the programming assignments. First, they are not self-contained. Right from the start they refer to the introduction to Tensorflow made in the 3rd week of the second course of the nano degree, “Improving deep neural networks”. This makes things a little bit awkward when you do not use Tensorflow day to day (usually I’m fine with Keras) and there are 2–3 months in between the courses…

Second, the assignments are a bit less guided than in the other courses of the nano degree, i.e. you will need to spend more time figuring out small bits of programming where you are not taken by the hand as much as in previous courses. This is not so much a problem as a statement; however, it becomes a problem when combined with the problematic Coursera assignment submission engine (honestly, I do not know if it is Coursera’s fault or the course’s fault, but the result is the same). Sometimes it will refuse to correctly grade your assignment for things which are not even errors, or will introduce artificial boundaries without telling you they are there… I hope those get resolved soon; they will not deter early adopters like myself, but will most probably discourage more than one in the future.

Lastly, the Jupyter kernels used on the site are troublesome. Server availability seems sketchy at times. You will often lose your work even if you saved it (you should save/export to your local machine regularly to alleviate those issues). In short, it is still a long way from the Kaggle kernels handling. The same issues have been reported by a colleague of mine following another Coursera course, so this is not unique to the CNN course. Also, as access to Coursera is now subscription based, you will lose access to your kernels if you do not renew your subscription after the course. Hence, if you do not want to lose your work (as it is as much a reference as the course videos themselves), you will have to store it locally on your machine or in your preferred cloud storage.

All that being said, it is an excellent course and taught me many things which I did not know before, so an overall 4/5! But be ready for the glitches, especially in the programming assignments. I especially enjoyed the one-shot learning, which I intend to apply to one of my deep neural network problems at work (not CNN related), and the neural style transfer. In that last programming exercise you complete the code of a neural style transfer algorithm. With a little push from my good friend Marc-Olivier, I went a little bit further and implemented a multi-style transfer algorithm. Here are my youngest kids with different levels of style transfer from Edvard Munch, Pablo Picasso, Vincent Van Gogh and Georges Braque.

ZackAbby4Styles
My youngest kids with styles transferred, left to right, top to bottom, from: Edvard Munch, Pablo Picasso, Vincent Van Gogh and Georges Braque.
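For the curious, the heart of (multi-)style transfer is a style cost computed from Gram matrices of layer activations. The sketch below is my own rough illustration of how several style costs could be blended with weights; it is neither the course’s implementation nor my exact multi-style code.

import numpy as np

def gram_matrix(activations):
    # Style is captured by the Gram matrix of a layer's activations,
    # shape (height, width, channels).
    h, w, c = activations.shape
    flat = activations.reshape(h * w, c)
    return flat.T @ flat

def multi_style_cost(generated, style_activations, weights):
    # Weighted sum of per-style costs; the weighting scheme is an assumption.
    g_gen = gram_matrix(generated)
    cost = 0.0
    for acts, weight in zip(style_activations, weights):
        h, w, c = acts.shape
        cost += weight * np.sum((g_gen - gram_matrix(acts)) ** 2) / (4.0 * (h * w) ** 2 * c ** 2)
    return cost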

Originally published at medium.com/@TheLoneNut on November 22, 2017.


The Fallacious Simplicity of Deep Learning: wild data

This post is the fourth in a series of posts about the “Fallacious Simplicity of Deep Learning”. I have seen too many comments from non-practitioners who think Machine Learning (ML) and Deep Learning (DL) are easy; that any computer programmer following a few hours of training should be able to tackle any problem because, after all, there are plenty of libraries nowadays… (or other such excuses). This series of posts is adapted from a presentation I will give at the Ericsson Business Area Digital Services technology day on December 5th. So, for my Ericsson fellows, if you happen to be in Kista that day, don’t hesitate to come see it!

In the last posts, we’ve seen that the first complexity lies in the size of the machine learning and deep learning community: there are not enough skilled and knowledgeable people in the field. The second complexity lies in the fact that the technology is relatively new, so the frameworks are quickly evolving and require software stacks that range from high-level libraries all the way down to the specialized hardware we use. The third complexity was all about hyper-parameter setting, a skill specific to machine learning that you will need to acquire.

The next challenge with machine learning, or should I say the first challenge, is the data! You need data to perform a machine learning task. For deep learning, you arguably need even more of said data.

This is an angle that you will not see if you take courses (online or in schools) about data science, machine learning or deep learning. At least I’ve never seen it properly presented as a difficulty. When you take a course, the data is most of the time provided, or you get a clear indication of where to obtain it from. In most cases this is well curated data, with well explained and documented fields and formats. If the data is not already available and the exercise wants you to concentrate on the data scraping aspect, then the other parameters of the exercise will be well defined. They will tell you where to scrape the data from and what is of importance for the exercise, and they will make sure the scraping can be done in a structured fashion. If the exercise wants you to concentrate on data cleaning or data augmentation, the dataset will have been prepared to properly show you how this can be done. This is not to say that those courses or exercises are easy, but they do not show the real difficulty of wild data.

I think that for most companies, it all started with wild data. As data science grows in a company, people put structure and pipelines around data. But this comes with time and size, and it certainly does not prevent the entry of some wild data beast into the zoo from time to time. So, assuming you have access to data, it might really be wild data and you will have to tame it. Some say deep learning is less about data engineering and more about model architecture creation. That is certainly true in computer vision, where the format of the data was agreed on some time ago. But what about another domain: is the format so widely accepted? You might still have to do some feature engineering to feed your data into your model, adding the feature engineering problem to the hyper-parameter tuning problem…

On the other hand, you might well be in a situation where you do not have access to data. What if you are used to selling a specific type of equipment? That equipment might not be connected to a communication network. If it is, the customer might never have been asked if you could use their data. How do you put in place a contract or license that allows you to collect data? Does the legislation in the regions where you sell that equipment allow for that collection, or are there any restrictions you need to dance with? Is there personally identifiable information you will need to deal with? Will you need to anonymize the data? If you start a new business, based on a new business model, you might be able to build data collection into your product. But if you have a continuing business, how do you incorporate it in your product? Does it need to be gradual?

I guess you now have a good understanding of our fourth complexity. Where do you get your data from?

Data might be hard to come by. It might be wild and messy. It might not even relate to the question you want to answer… and that will be the subject of our fifth complication: do you have the right question?

The Fallacious Simplicity of Deep Learning: hyper-parameters tuning

This post is the third in a series of posts about the “Fallacious Simplicity of Deep Learning”. I have seen too many comments from non-practitioners who think Machine Learning (ML) and Deep Learning (DL) are easy; that any computer programmer following a few hours of training should be able to tackle any problem because, after all, there are plenty of libraries nowadays… (or other such excuses). This series of posts is adapted from a presentation I will give at the Ericsson Business Area Digital Services technology day on December 5th. So, for my Ericsson fellows, if you happen to be in Kista that day, don’t hesitate to come see it!

In the last posts, we’ve seen that the first complexity lies in the size of the machine learning and deep learning community: there are not enough skilled and knowledgeable people in the field. The second complexity lies in the fact that the technology is relatively new, so the frameworks are quickly evolving and require software stacks that range from high-level libraries all the way down to the specialized hardware we use. We also said that, to illustrate the complexities, we would show an example of deep learning using Keras. I have described the model I use in a previous post, This is not me blogging!. The model can generate new blog posts that look like mine after being trained on all my previous posts. So without any further ado, here is the short code example we will use.

 

KerasExample
A Keras code example.

In these few lines of code you can see the gist of how one would program a text-generating neural network such as the one pictured beside the code. There is more code required to prepare the data and generate text from the model predictions than simply the call to model.predict, but the part of the code which relates to creating, training and making predictions with a deep neural network is all in those few lines.

You can easily see the definition of each layer: the embedding in green, the two Long Short Term Memory (LSTM) layers, a form of Recurrent Neural Network, in blue, and a fully connected dense layer in orange. You can see that the inputs are passed when we train the model, or fit it, in yellow, as well as the expected output, our labels, in orange. Here the label is, given the beginning of a sentence, the next character in the sequence: the subject of our prediction. Once you have trained that network, you can ask for the next character, and the next, and the next… until you have a new blog post… more or less, as you have seen in a previous post.
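Since the code itself is shown as an image in the original post, here is a minimal sketch of my own of what such a Keras model could look like, using the layer sizes from the model summary given later in This is not me blogging! (40-character windows, 106 possible characters, 24-dimensional embeddings). The data-preparation variables are only indicated in comments.

from keras.models import Sequential
from keras.layers import Embedding, LSTM, TimeDistributed, Dense, Activation

vocab_size, seq_len, embed_dim = 106, 40, 24

model = Sequential()
model.add(Embedding(vocab_size, embed_dim, input_length=seq_len))  # embedding (green)
model.add(LSTM(512, return_sequences=True))                        # first LSTM (blue)
model.add(LSTM(512, return_sequences=True))                        # second LSTM (blue)
model.add(TimeDistributed(Dense(vocab_size)))                      # dense output (orange)
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam')
# x_train: character indices, shape (samples, 40); y_train: one-hot next characters
# model.fit(x_train, y_train, batch_size=64, epochs=140)
# probabilities = model.predict(x_new)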

For people with a programming background, there is nothing complicated here. You have a library, Keras, you look at its API and you code accordingly, right? Well, how do you choose which layers to use and their order? The API will not tell you that… there is no cookbook. So, the selection of layers is part of our next complexity. But before stating it as such, let me introduce a piece of terminology: hyper-parameters. In machine learning and deep learning, a hyper-parameter is any parameter whose value you can vary, but ultimately have to fine-tune to your data if you want your model to behave properly.

So according to that definition of a hyper-parameter, the deep neural network topology or architecture is a hyper-parameter. You must decide which layers to use and in what order. Hyper-parameter selection does not stop at the neural network topology though. Each layer has its own set of hyper-parameters.

The first layer is an embedding layer. It converts, in this case, character input into a vector of real numbers; after all, computers can only work with numbers. How big will this encoding vector be? How long will the sentences we train with be? Those are all hyper-parameters.

On the LSTM layers, how wide will they be, i.e. how many neurons will we use? Will we use all the outputs all the time or drop some of them (a technique called dropout which helps regularize neural networks and reduce cases of overfitting)? Overfitting is when a neural network learns your training examples so well that it cannot generalize to new examples, meaning that when you try to predict on a new value, the results are erratic. Not a situation you desire.

You have hyper-parameters to select and tweak up until model compilation time and model training (fit) time. How big will the tweaks to your neural network weights be at each computation pass? How many examples will you give the neural network in each pass? How many passes will you perform?
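To make the point concrete, here is how those training-time hyper-parameters appear in Keras, continuing the sketch above; every number is an illustrative guess that would have to be tuned to your data, not a recommendation.

from keras.optimizers import Adam

learning_rate = 0.001   # how big the tweak to the weights is at each pass
batch_size = 64         # how many examples are fed to the network at each pass
epochs = 140            # how many passes over the whole training set

model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=learning_rate))
# model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs)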

If you take all of this into consideration, you end up with most of the code written being subject to hyper-parameter selection. And again, there is no cookbook or recipe yet to tell you how to set them. The API tells you how to enter those values in the framework, but cannot tell you what the effect will be. And the effect will be different for each problem.

It’s a little bit as if the argument you pass to a print statement, you know, like print(“hello world!”), would not print “hello world”, but some value based on that argument (the hyper-parameter) and whatever has been printed in the past, and you would have to tweak it so that at training time you get the expected results!!! This would make any good programmer go insane. But currently there is no other way with deep neural networks.

HelloWorldHyperParam2
Hello World with Hyper-Parameters.

So our third complexity is not only the selection of the neural network topology but also all the hyper-parameters that come with it.

It sometimes requires insight into the mathematics behind the neural net, as well as imagination and lots of trial and error, while being rigorous about what you try. This definitely does not follow the “normal” software development experience. As said in my previous post, you need a special crowd to perform machine learning or deep learning.

My next post will look at what is sometimes the biggest difficulty for machine learning: obtaining the right data. Before anything can really be done, you need data. This is not always a trivial task…

 

The Fallacious Simplicity of Deep Learning: the proliferation of frameworks.

This post is the second in a series of posts about the “Fallacious Simplicity of Deep Learning”. I have seen too many comments from non-practitioners who think Machine Learning (ML) and Deep Learning (DL) are easy; that any computer programmer following a few hours of training should be able to tackle any problem because, after all, there are plenty of libraries nowadays… (or other such excuses). This series of posts is adapted from a presentation I will give at the Ericsson Business Area Digital Services technology day on December 5th. So, for my Ericsson fellows, if you happen to be in Kista that day, don’t hesitate to come see it!

In the last post, we’ve seen that the first complexity lies in the size of the machine learning and deep learning community: there are not enough skilled and knowledgeable people in the field. To illustrate the other complexities, I’ll show an example of deep learning using Keras. Don’t worry if you are not used to it, and even if you have not programmed in a while, or at all, I’ll keep it simple. Below is one of the software stacks you can use in order to perform deep learning. This one can be deployed on CPUs, so your usual computer or computer server, but it can also be deployed on Graphics Processing Units, or GPUs: basically, the video card in your computer.

kerasstack
The stack we will use for demonstration.

To use the video card to do the type of computation required for deep learning, one of the GPU manufacturers, Nvidia, has created a software layer to program those GPUs. CUDA, the Compute Unified Device Architecture, allows someone to program the GPU to do any highly parallelizable task. On top of that layer, Nvidia has created another layer targeting the task of running deep neural networks. This is the cuDNN layer, or CUDA Deep Neural Network library. For my example, I’ll use on top of cuDNN the Google framework for graph computation, Tensorflow. Lastly, to simplify my task, since I won’t build new kinds of neurons or new kinds of layers, I’ll use the Keras library, which simplifies the process of defining a deep neural network, deploying it, training it and testing it. For something simple, we already have 5 layers of libraries, and I haven’t even mentioned the language I’ll use and the libraries required for it as well (note that in the latest release of TensorFlow, Keras has been integrated). But no biggie, in software development we are used to having many layers of software piling up.
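As a quick sanity check of that stack, a couple of lines (an illustration of mine, with the APIs as they were at the time of writing) show which framework Keras delegates to and whether a CUDA/cuDNN-backed GPU is visible to Tensorflow.

import keras                                        # top of the stack
from tensorflow.python.client import device_lib     # Tensorflow sits underneath

print(keras.backend.backend())                      # e.g. 'tensorflow'
print([d.name for d in device_lib.list_local_devices()])  # lists the CPU and any visible GPU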

The software stack I’m using for this example is only one of the possible ones to make use of. Just for the Nvidia GPUs there are already more than a dozen frameworks that build on top of cuDNN. Moreover, Intel, AMD and Google are coming up with their own deep neural network hardware accelerators. Many other companies are doing the same, creating accelerated hardware for deep neural networks. All this new hardware will come with its equivalents of CUDA and cuDNN, and frameworks will proliferate for a while.

cuDNNAccelereted
Some of the cuDNN accelerated frameworks.

I’m not even going to talk about the next layer of frameworks (e.g. Tensorflow and Keras). Hopefully, they will adapt to the new hardware… otherwise, we’ll have even more frameworks. Same for the layer above, e.g. Keras builds on top of Tensorflow (or Theano or CNTK, but let’s not open that door now). Hence, we can see our next complexity.

The second complexity: the piling of frameworks (including specialized hardware) and the proliferation of frameworks. Which one to learn? Which one will become irrelevant?

The machine learning, but especially the deep learning, landscape is evolving rapidly. To be efficient it requires new kinds of hardware that we did not commonly see in industrial servers even a few years ago. This means that the whole development stack, from hardware to released data product, is evolving quickly. Changing requirements are a known issue in software development; it is no different in data product development.

My next post will tackle, through an example, the next complexity: hyper-parameter tuning, something you do not see in software development but which is necessary for the development of a data product.

The Fallacious Simplicity of Deep Learning: the lack of skills.

This post is the first in a series of posts about the “Fallacious Simplicity of Deep Learning”. I have seen too many comments from non-practitioners who think Machine Learning (ML) and Deep Learning (DL) are easy; that any computer programmer following a few hours of training should be able to tackle any problem because, after all, there are plenty of libraries nowadays… (or other such excuses). This series of posts is adapted from a presentation I will give at the Ericsson Business Area Digital Services technology day on December 5th. So, for my Ericsson fellows, if you happen to be in Kista that day, don’t hesitate to come see it!

Artificial Intelligence (AI) is currently touching our lives. It is present on your phone, in your personal assistants, your thermostats, and pretty much all web sites you visit. It helps you choose what you will watch, what you will listen to, what you will read, what you will purchase and so on. AI is becoming a cornerstone of user experience.

Today, if you look at cutting-edge AI technology, you must conclude that you can no longer trust what you see and hear. Some neural network techniques allow researchers to impersonate anybody in video, saying whatever they want with the right intonations, the right visual cues, etc. Neural networks are creating art pieces, for example applying the style of great painting masters to any of your photographs.

Soon, it will become even more relevant to your everyday life. We can already see the days of the autonomous car looming. Eventually it will be all transportation, then the regulation of all technological aspects of our lives, and even further…

The reach and growth of Artificial Intelligence technology is so fast and so transformational that we sometimes have the impression it must all be easy to apply. But is it so?

AI may look easy if you look at all the available resources: all the books, the plethora of online courses (you cannot visit a web page nowadays without a machine learning course being proposed to you!), the tons of videos available online, from those courses but also from enthusiasts or university teachers. And if you start digging into the available software frameworks, you’ll find plenty of them.

So why would someone like Andrew Ng, one of the world’s best-known AI experts, come forth with the mission of training a million AI experts? Well, Machine Learning, Deep Learning and Artificial Intelligence knowledge is still sparse. Big companies have grabbed a lot of the talented people, leaving universities empty. A lot of universities still don’t have programs dedicated to the topic. For those which do, most only propose a couple of introductory courses. Then on the online front, there will be countless people who start to learn but abandon it along the way for many reasons; online course completion rates are much lower than university program completion rates.

Moreover, this is quite a difficult subject matter. There are many considerations which are quite different from what you would have from a software development background.

The first complexity: not enough skilled and knowledgeable people, a smaller community than, say, web programming.

Stay tuned for my next post which will tackle the next complexity: the piling and proliferation of frameworks.

A battle between Man and Machine

Last time I showed you how a quite simple character-based recurrent neural network (RNN) can be used as a generative model for text. I used for that purpose all my blog posts and asked it to generate a new post based on them. It is easy to see that this is a toy example of using an RNN.

This time around I want to show you how an RNN can be used for binary classification with quite good results. The binary classification problem we will look at involves a time series of measurements from mobile subscribers in an operator network. We know that most of those subscribers are humans using a mobile phone. We also know that some of those “subscribers” are IoT machines. Having labelled examples of humans and machines, can we say to which category a subscriber belongs?

In fact, two of my colleagues have taken on the same challenge as I did, with different approaches to it. Steven developed a bespoke classifier from a statistical analysis, and you can read about it in Data Examination and Home Made Classifiers. Marc-Olivier upped the ante, developing a “classical” machine learning (ML) approach to the same problem, and details it in Machine vs Human. One interesting aspect of Marc-Olivier’s work is that he wanted to consider the temporal aspect of some features, and he spent quite some time doing feature engineering in order to bring that temporal aspect into the ML model.

I am still in the process of fine-tuning the network architecture and the hyper-parameters, yet with a not-so-optimized network I get better results than the two other approaches. Those results come at a cost though. The first cost is the computing resources required. Steven’s bespoke classifier uses very little computing resources for training and prediction. Marc-Olivier’s method gives better results but is a bit more involved; still, the training and predictions can be done easily and quite quickly on a CPU. My deep learning approach was executed on a GPU (GTX 1080Ti), and yet the training takes more time to execute than Marc-Olivier’s. I’m pretty confident I can execute the prediction on a CPU, as only the forward propagation has to be performed, yet I think it will take more resources than Marc-Olivier’s method. Anyway, we will come up with a detailed summary of the pros and cons of the three approaches in a later blog post.

Now for the RNN approach. Below is a symbolic view of the network I used.

TheLoneNutAI_DNN
Man versus Machine network architecture.

“Wait Pascal! You’ve put the exact same picture as before for TheLoneNut AI!” Yes, from a high-level architectural point of view, it is pretty much the same deep neural network; besides, on the original picture I omitted certain details… However, the toy example can port to a real-world example quite easily. If we look at the details of the network, then it is a bit different. Below is the output of model.summary() from Keras.

_________________________________________________________________
Layer (type) Output Shape Param # 
=================================================================
batch_normalization_5 (Batch (None, 12, 208) 832 
_________________________________________________________________
lstm_9 (LSTM) (None, 12, 512) 1476608 
_________________________________________________________________
lstm_10 (LSTM) (None, 512) 2099200 
_________________________________________________________________
batch_normalization_6 (Batch (None, 512) 2048 
_________________________________________________________________
dense_6 (Dense) (None, 1) 513 
=================================================================
Total params: 3,579,201
Trainable params: 3,577,761
Non-trainable params: 1,440
_________________________________________________________________

The differences are not so big. First, there is no need for an embedding layer, as I have fixed-size feature vectors coming in (208 features) instead of characters embedded over 24 features. I take those features over 12 days instead of 40 characters at a time. Next, the last LSTM layer produces a “non-time-distributed” output, i.e. I’m not interested in the sequence, but rather in the “result” of that sequence: the class the example belongs to. Finally, I go through a dense layer which outputs a single value using a sigmoid activation function. From that output, I simply take a cut-off value of 0.5, i.e. everything below 0.5 is considered 0 and everything above is considered 1.
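For reference, here is a sketch of how this network could be written in Keras so that it matches the summary above; the loss, optimizer and other training settings are my assumptions, as they do not appear in the summary.

from keras.models import Sequential
from keras.layers import BatchNormalization, LSTM, Dense

n_days, n_features = 12, 208                       # taken from the summary above

model = Sequential()
model.add(BatchNormalization(input_shape=(n_days, n_features)))
model.add(LSTM(512, return_sequences=True))        # keep the sequence for the next LSTM
model.add(LSTM(512))                               # only the "result" of the sequence
model.add(BatchNormalization())
model.add(Dense(1, activation='sigmoid'))          # single value between 0 and 1

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# predictions = model.predict(x_holdout)           # x_holdout is an assumed variable name
# classes = (predictions >= 0.5).astype(int)       # the 0.5 cut-off described above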

After training, I evaluate the model on a hold-out set of 1947 examples, and below is the confusion matrix I obtain. We can see a low proportion of misclassified examples, which is good!

ManMachineConfusionMat
Confusion Matrix of Man versus Machine classification.
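The matrix itself is shown as an image; for completeness, this is roughly how such a matrix can be computed with scikit-learn (the variable names are assumptions, reusing the sketch above).

from sklearn.metrics import confusion_matrix

# y_holdout: true labels of the 1947 hold-out examples; classes: the 0/1 predictions above
print(confusion_matrix(y_holdout, classes))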

In conclusion, the experiment confirms what we already know 😉 . ML gives better results than statistical analysis, and DL gives better results than ML. However, there is a computational cost to it. On the other hand, and we will discuss it in a later post, statistical analysis is more intellectually involved than ML feature engineering, and feature engineering is more involved than, well… no feature engineering. Though with DL you have to come up with network engineering, and to paraphrase someone else talking about deep reinforcement learning: “On the scale of art versus science, Deep Learning network architecture is still more on the art side of the balance”, but that’s another story!

This is not me blogging!

I think you’ll like this blog post! I came to spend some more time recently on Deep Learning as my current project is taking me in that direction. For fun, I tried a little RNN-based deep network to generate text from examples. Obviously, I used as input all my previous blogs (from the two platforms, the internal one at Ericsson as well as the WordPress one). I trained it overnight, around 140 training passes (or epochs), and asked it to generate a new blog post for me 🙂 . So, I’ll give TheLoneNut AI the stage, and come back at the end to comment on the technique used. I’m using The Neural Network Zoo convention established by Fjodor Van Veen at the Asimov Institute to display the neural network.

It has been awhile, but to compensate I’ll give you plenty of material over the last two years and we will apply it to the IoT domain. To explain in a few sentences, I would say that we are building some prototypes of Machine to Machine communication using IMS as a smart pipe to provide home automation and in that context, we are building some hardware (using Arduino and Odroid) and even making a custom fit case for that hardware using 3D printing. Below shots are of one of the pieces we built: the IMS M2M Home Gateway (the device which converts the serial information related to the sensors and actuators received or sent to the House Arduino board) to a Raspberry Pi’s. This year it is more difficult. The platform I use does not provide any readable analytics data. Under such circumstances I can only base myself a question: “Why so many peoples decide to concentrate on only one aspect of their potential (e.g.: science versus arts)?” Some will say that to excel one has to fully concentrate on one subject. I don’t adhere that theory. Especially in the sport news domain where tools like Wordsmith created by Automated Insight are used to generate Virtual Peoples, we should get back data equivalent to what we get from the Canadian census data is complemented with the Québec statistical bureau for the age pyramid points about the reality it models, but it must clearly stay a toy. It doesn’t prevent having usage based charging where Ericsson is paid for the used capacity whatever the level it goes.

On the other side of the range, if you take one of the big aircrafts we have in service (comparable size as a submarine); it takes only 2-4 people to man it… simply because a crew of more than 3 days. Getting good in programming a language and in the system rules to make this technology available to the Ericsson Connected Tree can serve as a hyperbole of what an all communicating world is. It can inspire the way he wants, there is no guarantee of safety for anyone, there is little potential and work in a socialty that could help us to be better, faster and stronger? Let me know!

Ok, some hiccups along the way, but for a network which learns from characters only (no words, character level), looking at the most probable characters to come given a series of 40 such previous characters, then picking randomly (based on the probability of subsequent characters), I find it quite amazing at mimicking my style of writing, i.e. too long sentences 😉 .
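In code, that sampling step could look like the following sketch; the model, the character-index mappings and the window length are illustrative names and assumptions, not the actual code used for this post.

import numpy as np

def sample_next_char(model, seed_text, char_to_idx, idx_to_char, seq_len=40):
    # Predict a distribution over the possible characters given the last seq_len
    # characters, then pick the next one at random according to that distribution.
    window = [char_to_idx[c] for c in seed_text[-seq_len:]]
    probs = model.predict(np.array([window]))[0][-1]
    probs = probs.astype('float64')
    probs /= probs.sum()                      # guard against float rounding
    return idx_to_char[np.random.choice(len(probs), p=probs)]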

There is nothing really complex in that Deep Neural Network (if even I can call it a DNN). Here is a picture of the structure (for which I take no credit, it was demonstrated by Jeremy Howard in his excellent Practical Deep Learning for Coders course).

TheLoneNutAI_DNN.PNG
TheLoneNut AI network architecture

As you see, it starts with an embedding layer for each of the characters in the sample (106 of them), which happens to be: ‘\t\n !”#$%&\'()*+,-./0123456789:;<=>?ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_abcdefghijklmnopqrstuvwxyz|~Éáäéïó–’, yes some accented characters, I’m French speaking after all! The embeddings are next batch normalized. This is followed by two Long Short Term Memory (LSTM) layers. Finally we have a dense layer for the output to which we apply a 10% dropout and a softmax activation function. So, three (or four if you count the embeddings) layers all in all. Below is the summary of the model:

_________________________________________________________________
Layer (type) Output Shape Param # 
=================================================================
embedding_35 (Embedding) (None, 40, 24) 2544 
_________________________________________________________________
batch_normalization_10 (Batc (None, 40, 24) 96 
_________________________________________________________________
lstm_36 (LSTM) (None, 40, 512) 1099776 
_________________________________________________________________
lstm_37 (LSTM) (None, 40, 512) 2099200 
_________________________________________________________________
time_distributed_22 (TimeDis (None, 40, 106) 54378 
_________________________________________________________________
dropout_24 (Dropout) (None, 40, 106) 0 
_________________________________________________________________
activation_7 (Activation) (None, 40, 106) 0 
=================================================================
Total params: 3,255,994
Trainable params: 3,255,946
Non-trainable params: 48
_________________________________________________________________

I’m using Keras for the purpose of training and prediction. It is running over Theano on one GPU (nVidia GTX 1080 Ti), and each epoch takes around 410 seconds for the entirety of my blogging, which accounts for 300k characters. A bit depressing when I think of it… around 6 years of blogging produced only 300k characters, or 50k characters a year! I have programs far longer than that! Well, now TheLoneNut AI can remedy this and produce text for me 🙂 !

So, let’s ask the AI for a closing statement…

“Late” text messages are afraid of the monsters under their bed. Eventually it will be done I guess, but it might take time for some great explorer to discover this far away mountain.

Until next time!