This week’s mentoring session and post are based on the fourth chapter of “The Pragmatic Programmer: From Journeyman to Master”: Pragmatic Paranoia. The focus of this chapter is on making sure your programs do what they are intended to do and, when they don’t perform as expected, that you can figure out why as quickly as possible.
Design by Contract can be supported in different ways depending on the programming language you use. At the very least, Python 3 lets you state the contract with type hints on function parameters and return values; keep in mind that the interpreter does not check these hints at runtime, so you need a static checker such as mypy to verify them. Another good way to enforce contracts is to make assertions (assert) early in your function, or simply raise an exception if something violates your contract. Note that a failed assert raises an exception too (AssertionError), although assertions are skipped when Python runs with the -O flag. Think about it: is it better to fail and stop execution, or to populate a table with garbage? When will you figure out it was garbage? Will you be able to correct it at that point in time? Will you still have the underlying data to do the correction?
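To make the contract idea above concrete, here is a minimal sketch. The function name split_batches and its rules are my own assumptions for illustration, not something from the book. The type hints document the contract, the checks at the top enforce the preconditions, and the assert at the end verifies a postcondition. It assumes Python 3.9 or later for the list[list] hint.

```python
def split_batches(rows: list, size: int) -> list[list]:
    """Split rows into consecutive batches of at most `size` items.

    Hypothetical helper used only to illustrate a contract.
    """
    # Preconditions: what the caller must guarantee.
    # Type hints alone are not checked at runtime, so we check explicitly.
    if not isinstance(size, int):
        raise TypeError(f"size must be an int, got {type(size).__name__}")
    if size <= 0:
        raise ValueError("size must be positive")

    batches = [rows[i:i + size] for i in range(0, len(rows), size)]

    # Postcondition: what this function guarantees in return.
    # If this ever fails, the bug is in this function, not in the caller.
    assert sum(len(b) for b in batches) == len(rows), "rows lost or duplicated"
    return batches


print(split_batches([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
```

Failing early with a clear message here is much cheaper than discovering months later that a batch silently dropped rows.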
Learn when to use assert and when to raise an exception, i.e. when you know a boundary or a contract and want to enforce it, or at least tell a higher-level program that something is wrong and you cannot do anything about it. When you encounter something exceptional, you can either solve it yourself or let a higher-level program handle it. In the end, if no higher-level program can handle the situation, the whole program will halt.
I know that for some the usage of asserts and exceptions can be obscure, so let me try to build a mental model for you. An exception is an exceptional thing that happens to a program. If you are responsible for invoking that program and you catch that exception, you have to decide whether you have enough context to act on it or need to raise it higher, to the program calling you. In the end, if no one can handle the exception, the program will crash and you can debug what this exception is all about. If you know what to do in that exceptional case, then it’s up to you to handle it. Maybe it’s a retry, maybe you need to use default values, … just be certain of the handling decision, as it might well pollute your data.
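The mental model above can be sketched as follows. This is only an illustration under my own assumptions: the helper call_with_retry is hypothetical, and I picked ConnectionError as the "exceptional thing" we know how to handle by retrying. Anything else, or a failure that persists, is raised higher.

```python
import time


def call_with_retry(action, attempts=3, delay_seconds=0.0):
    """Call action(); retry on ConnectionError, then give up and re-raise.

    Hypothetical helper: the retry policy is an assumption for illustration.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except ConnectionError:
            # We have enough context to handle this one: try again.
            if attempt == attempts:
                # Out of retries: we cannot fix it here, so the caller decides.
                raise
            time.sleep(delay_seconds)
        # Any other exception type is not caught and goes straight to the caller.
```

Note the decision being made: retrying is safe here because nothing has been written yet. If a retry could write the same rows twice, handling the exception this way would pollute your data.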
As to how to throw an exception higher, there are two ways: assert or raise. Generally, raise is used when you have an error condition you cannot act upon directly but a higher-level program might, e.g. a file is missing, or empty, … On the other hand, assert is used when you can swear this should not happen. In both cases, an exception is thrown to the caller.
When you figure out that something “bad” happened to your program, you know it should not happen and you know there is no way around it, the way to throw an exception higher can be an assert. This throws an exception to the higher-level program, which then has to decide what to do with it. For example, you expect an int as input and you get a string: this breaks the contract. The higher-level program is asking you to handle improper data, so why should your program be the one to work out why the contract was broken? It should be the caller’s responsibility to handle that exception properly. That might warrant an assert right there.
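Here is a small sketch of the two options side by side, following the convention described above. The names EmptyInputError and average_age are hypothetical, chosen only for illustration. One caveat worth knowing: assert statements are removed when Python runs with the -O flag, which is why some teams prefer to raise TypeError for this kind of check.

```python
class EmptyInputError(Exception):
    """Hypothetical error for an input the caller may be able to replace."""


def average_age(ages):
    """Return the mean of a list of non-negative integer ages."""
    # raise: an empty extract can happen in real life, and the caller
    # might recover (skip the batch, retry later, use another source).
    if not ages:
        raise EmptyInputError("no ages to average")

    # assert: the contract says ages are non-negative ints. A string or a
    # negative value "should never happen", so we refuse to continue.
    assert all(isinstance(a, int) and a >= 0 for a in ages), \
        "ages must be non-negative ints"

    return sum(ages) / len(ages)


print(average_age([20, 30]))  # 25.0
```

Calling average_age(["20"]) fails with an AssertionError instead of quietly producing a wrong number, and it is then up to the caller to find out why it sent a string.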
Depending on the organization you work in, when and how to use exceptions and asserts might get philosophical. On the other hand, it could also be subject to very specific rules. There might be perfectly valid reasons why an organization prefers one approach over another. Learn the rules, and if there are none, have a discussion about it and apply your best judgement. In any case, dead programs tell no lies. Better to kill the program than to deal with polluted data a year from now.
Before running, one should learn to walk. People often come to data science without much knowledge of programming and are suddenly asked to take care of existing ETLs in Python, or to design new ones. The pragmatic approach is to learn programming! As I said earlier, Python is a multi-paradigm language. Procedural programming is probably the best-known approach, especially if you have used Python in a notebook environment: code that runs top to bottom and the use of functions…
If you use pytest or some other libraries, you have probably started wondering a little bit about Object Oriented Programming. Maybe you just copied the recipe and haven’t looked too deeply into that paradigm. Here’s your chance. I found this well-written primer on Object Oriented Programming in Python, which also links to other resources. If you are to write solid ETLs, you’ll want to have some knowledge of OOP.
If you manipulate data, sooner or later you will want to use lambda functions, map, filter, reduce, … or, if you use numpy / pandas, you’ll get interested in apply, etc. Again, you could just follow the recipe, but if you want to get stronger at Functional Programming, I found an interesting primer on Functional Programming in Python. It’s far from complete, but it links to other resources to fill the gaps, and once started, you’ll simply want to learn more!
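As a quick taste of the functions mentioned above, here is a minimal sketch using only the standard library. The order amounts and the flat shipping fee are made-up values for illustration; amounts are in cents to keep the arithmetic exact.

```python
from functools import reduce

# Hypothetical order amounts in cents; 0 marks a cancelled order.
order_cents = [1250, 0, 725, 3000]
SHIPPING_CENTS = 500  # assumed flat fee, for illustration only

# filter: keep only the elements for which the lambda returns True.
paid = list(filter(lambda c: c > 0, order_cents))            # [1250, 725, 3000]

# map: apply the lambda to every element.
with_shipping = list(map(lambda c: c + SHIPPING_CENTS, paid))  # [1750, 1225, 3500]

# reduce: fold the list into a single value, starting from 0.
total_cents = reduce(lambda acc, c: acc + c, with_shipping, 0)  # 6475

# The same pipeline written as a generator expression, often considered
# more idiomatic in Python.
total_again = sum(c + SHIPPING_CENTS for c in order_cents if c > 0)

print(total_cents, total_again)  # 6475 6475
```

Each step takes data in and returns new data without modifying the original list, which is the habit that makes this style easy to test.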
I have said it multiple times: practice makes perfect! In the first two weeks, I proposed exercises from CheckiO. Feel free to register an account there and follow the path from island to island to gain knowledge in Python. There are other resources to practice programming as well. Each year since 2015, in the days of December leading up to Christmas, Advent of Code proposes small problems that the Elves and Santa face to deliver gifts! You could also see if there are coding dojos in your area, or online; I found them a good practice venue as well.
This concludes my complement to the reading of the fourth chapter of “The Pragmatic Programmer: From Journeyman to Master” as a medium to improve software writing skills in data science. Next week we will take a look at Chapter 5: Bend or Break.