Recently my team participated in a company-internal Data Science Challenge. The challenge provided a data set from a managed-services ticket management system. For the specific customer supported by this system, the Service Level Agreement (SLA) specifies that priority 1 tickets must be answered within a fixed amount of time. From the data provided, we had to build a model predicting whether a given ticket would breach the SLA, using only the information available at the halfway point of the time allowed to resolve the ticket.
Our team enrolled in the challenge and started early, extracting features from the data set and trying out simple algorithms first. Some of us had a prior academic background in machine learning, but this was the first time we tested our knowledge on industrial data. Two programming languages were offered: R and Python. We went with Python, as I was the only one with any R knowledge, and quite little of it at that. We also settled on scikit-learn and, naturally, NumPy to ease the task of trying out different prediction algorithms.
The challenge asked for at least 80% specificity and then the highest possible sensitivity. For those unfamiliar with these terms, you can think of specificity as the ability not to cry wolf when there is none, and sensitivity as the fraction of SLA breaches actually identified. Purists will notice this is not entirely accurate, but it conveys the idea: we don't want too many false alarms, and once that is settled, we want to catch as many SLA breaches as possible.
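In scikit-learn terms, both metrics fall out of a confusion matrix. A minimal sketch with made-up labels (not the challenge data), where 1 marks an SLA breach:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration: 1 = SLA breach, 0 = no breach.
y_true = np.array([0, 0, 0, 1, 1, 0, 1, 0, 1, 1])
y_pred = np.array([0, 1, 0, 1, 0, 0, 1, 0, 1, 1])

# confusion_matrix returns [[tn, fp], [fn, tp]] for labels {0, 1}.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

specificity = tn / (tn + fp)  # true negative rate: not crying wolf
sensitivity = tp / (tp + fn)  # true positive rate (recall): breaches caught
print(specificity, sensitivity)  # both 0.8 for these toy labels
```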
We started slowly with a classification approach and tried the Nearest Neighbor algorithm, which easily reached the 80% specificity but left us short on sensitivity (around 55%). We had a first breakthrough with AdaBoost and Random Forest, which brought sensitivity toward the 75% range. Then we tried a regression algorithm and set the decision threshold to a lower value, reducing specificity, crying wolf more often than warranted, but pushing our sensitivity into the 85% range.
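The threshold trick can be sketched as follows. This is not our challenge code; the synthetic data, the Random Forest stand-in, and the 0.3 threshold are all assumptions for illustration. The idea is simply that a model producing scores lets you slide the cutoff below the default 0.5, trading specificity for sensitivity:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the ticket data: class 1 = breach.
X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]  # score for the "breach" class

# Lowering the cutoff below the default 0.5 flags more tickets as breaches:
# sensitivity can only go up (or stay), specificity can only go down.
threshold = 0.3  # illustrative value, tuned on validation data in practice
y_pred = (scores >= threshold).astype(int)
```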
The results were in today, and our model did not perform as well on the Test data set as it did on the Training / Validation data set. We ended up just above 80% specificity (80% was the required minimum) and 64% sensitivity (the winners got 75%).
I still don’t know why our results on the Test set were so much lower than on the Train / Validation set, but I will certainly investigate further if I can. Well, it was a nice interlude to our regular work at any rate!
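One standard way to probe such a gap (assuming, as I suspect, that overfitting is the culprit) is to compare the score on the training data against a cross-validated score; a large difference suggests the model memorized the training set rather than learning something that generalizes. A sketch on synthetic data, not our actual pipeline:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; the real check would use the challenge features.
X, y = make_classification(n_samples=500, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0)

cv_scores = cross_val_score(clf, X, y, cv=5)   # honest held-out estimate
train_score = clf.fit(X, y).score(X, y)        # optimistic resubstitution score
print(train_score, cv_scores.mean())           # a big gap hints at overfitting
```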