After being so vocal about how to be a bad data scientist, I thought I should level the playing field by giving some hints on how to become a good data scientist. The other side of the coin.
My strong feeling is that if you start in the field only for employment or salary reasons, you start on the wrong foot. You should first look at your passions. Here it is interesting to take a few seconds to look up the word passion as defined on Dictionary.com:
1. any powerful or compelling emotion or feeling, as love or hate.
2. strong amorous feeling or desire; love; ardor.
3. strong sexual desire; lust.
4. an instance or experience of strong love or sexual desire.
5. a person toward whom one feels strong love or sexual desire.
6. strong or extravagant fondness, enthusiasm, or desire for anything: a passion for music.
7. the object of such a fondness or desire: Accuracy became a passion with him.
Hopefully the scope of your passion for data science does not involve definitions 2, 3, 4 or 5, but is driven by a strong fondness and enthusiasm for the field! If so, you are on the right track, and my first advice would be: do not try to swallow the ocean in one sip. Zoom in on one aspect of that passion, the one that piqued your interest first. See how you could apply it to a real-world problem and learn along the way. For example, in my case, I got passionate about artificial life a long time ago. Around 2012, that evolved into a fondness for genetic algorithms and genetic programming, which I think of as a form of reinforcement learning. As time passed, my interest in machine learning and deep learning grew, and I learned about them by reading books, taking online courses and taking a graduate course while studying for my master’s degree. At that time, I hoped to apply them to the project I had for my master’s thesis, but sometimes plans change. So, in short, you need to follow your heart here.
If you go with such an approach, you will avoid many of the pitfalls I mentioned in the first post. You won’t come to expect a “clean” data set as your input, since you’ll have applied what you learned to a few real cases along the way. You will learn how to gather data, how to clean it and how to interpret it. This will benefit you in two ways. First, you will learn one of the essential skills: data cleaning. But most importantly, it will grow your inquisitive mind, something I have never seen a single course manage to do. Again, I do not think this is a skill you can get in a few weeks; it requires a shift in mindset that you will acquire through repeated practice.
Another benefit of following your passion is that if you don’t already have the necessary mathematical background, you will pick it up along the way. If you find maths hard, it is probably easier to learn it on an as-needed basis as you expand your knowledge through your own passionate experiments! I will also reiterate that, regardless of what you might think or have been told, mathematics is not so hard. Moreover, it is much easier to grasp if you start with a positive attitude, telling yourself that you can do it.
The next benefit of such an approach is that you will have to define and refine your problem. You will decide what is important to you, what your “research” question is and how it relates to the activities you are doing along the way. When I was doing my master’s degree, I saw two types of students. The first type already had a research agenda, a question they wanted to explore, or at least sat down early with their advisor and set up such a research question in line with their interests and passions. Those students usually gave high-quality presentations, took courses that were highly relevant to answering their research questions and became highly proficient in their field of research. The second type of student waited for their advisor to give them a research project and was never really involved in it. They gave average or poor presentations and took whatever courses were available without really seeing how they related to their research topic (in most cases, they did not). In the end they probably still graduated, but with a subject to forget about. You want to be like the first type of student: even if you do it on your own, you want to take control of your learning and reap the benefits.
Lastly, it is good for you to write or talk about your findings and learnings. I have found that it helps crystallize my thoughts and (sometimes) gets me feedback from like-minded peers. All this to say that academic papers are not the only way to communicate your findings: blogs, videos and reports can all help you if you have the passion. Sure, one advantage of an academic paper is the peer review system, which provides you with feedback on your research, but you should not limit yourself to that single medium of communication if it is not suited to your reality. Present plainly what you found, and do not claim to be something you are not, or not yet. When the time comes, others will recognize you as a data scientist, and that day you will know you are one for sure!
Along the same lines as my previous post, learn hard: it is easier when you are following a personal research or interest goal. Work hard: again, something easier (not necessarily easy) when you follow a passion. And at all times be honest with yourself (but also with others) about what you know or have found out. If you think of yourself as a full-grown data scientist on day one, you might not put in the work necessary to ever become one. On the other hand, if you follow your interests and passions, you might become a data scientist before you even think of yourself as one.