Your Personal Data Science Learning Plan

Published by Thom Ives on

How to survive and thrive in the ultrathon of progressing as a data scientist

Overview

I helped some nice people recently that told a lot of other nice people about my help, and now I am getting many requests for data science (DS) career advice (i.e. mentorship). Wow! People want career advice from me? I’ve made so many faux pas socially and on the job. I still deal with a lot of personal issues and personal baggage. I am a real dude with real pressures. I still have technical fears around areas that I don’t think I’ve mastered yet. I’ve embarrassingly blown technical interviews that I should NOT have blown! I’ve had students that loved and hated me – some of that hate may have been well deserved! Sure, I think I can say that I’ve grown out of most of that and am more aware of all that, but dang! Me?

I am not among the top data scientists in the world, but I want to be, and I know that I can get the job done – I’ve learned to learn and review fast. I feel very fortunate to have the background and experience that I have, but I don’t really think that I am God’s unique gift to the DS and engineering communities; I am just very passionate about data science and predictive modeling and automation in general.

So, I had to answer for myself, “Why do people want help from me?” I think I only have one guess. They sense, accurately, that I really do care about helping them, AND I feel it’s a great honor to truly help anyone with anything. Dang, I am just honored that they even thought to ask me! That’s all that I can come up with. Yes, I’m relatively advanced in my career and skill set. I know how to explain things pretty well. I have a cool breadth of knowledge and skills, but I am constantly learning and reviewing as well.

Well, given all of that above, here’s my resulting intention. It’s an honor to me to help any of you IF I can truly give you valuable help. It seems people really appreciate my advice and encouragement, so I will start giving career advice to data scientists and other geeks and pray that my advice is VERY helpful and encouraging. More importantly, I will seek to amend / correct ASAP any advice that I realize in the future was … lacking. And, as always, think carefully and test advice that you receive from me or ANYONE!

The Situation

Many talented people at many different points along their journey are reaching out to me for data science career advice. I already feel behind in this new mentoring role. My dear new friend and author on this blog, Manpreet Budhraja, advised, “Thom, The best way to manage the volume of requests that you are receiving for help is to write a series of blog posts.”

I had to agree. But – ugh – the pressure! Due to work projects and being behind on my goals of posting things on this blog, this was one more thing, but it’s one more thing that’s so important that it trumps the pressure. And, fortunately, I’ve found some other authors to help with the backlog of posts that we need to write to help data scientists with the deep technical side of DS. Well, helping was among the original goals of this blog, and career advice is part of that help, so I am in triage mode now. What is the most important foundational piece of advice that will best help all of my new DS friends? I think I have it. Do this well and improve it over time, and you will WITH TIME AND CONSISTENT EFFORT become a great data scientist.

Develop A Personal Learning Plan

I hope the air did not just leave the room. Give me a chance to explain, and I assure you that after you understand why I am saying this, you might end up regarding it as the most foundationally important piece of advice you will ever receive for your DS career … well really for any career actually. Let’s first cover some questions, in no order of relative importance, that I am anticipating will occur in some of your minds:

  1. I’m in college. Isn’t my degree’s curriculum my current learning plan?
  2. I don’t have time to do that, just tell me what I need to know (no one would say this out loud, but I can imagine my younger self thinking it).
  3. I am just getting started. How would I know how to come up with this?
  4. I already have an idea of this in my mind. Why do I need it written out (i.e. recorded in some organized format, e.g. spreadsheet)?
  5. Isn’t there one best learning plan for all data scientists (I can really imagine my younger self thinking this one)?
  6. I am open, so I really want to understand why this is important.

How To Get Started

I don’t want to answer those questions directly at this point. I just wanted to pose them for now. For your own sake, be honest with yourself about any doubts that you may have about developing your own learning plan, hold that question in your mind, and see if you cannot answer it from what will be said in the remainder of this post. I’ll seek to defend this whole process concisely and thus answer those questions (and hopefully the ones posed in your own minds and not posed above) throughout this post and then summarize too.

Why The Plan MUST BE Created By You

Data science techniques are growing at a faster rate than the most skilled learner and career data scientist can learn them. What? It’s true. You won’t be able to learn it all (that would have depressed my younger self … now it kind of makes me excited). Then what should you do? Choose wisely! Your learning plan will need to change and grow and adapt to your needs.

If I gave you what I would call the perfect plan, it would need to change every 5 years or LESS. There are plans for getting hired, plans for doing better in your current role, plans for landing that next role that you want, and there are even plans that a Chief Analytics Officer would want to have for after leaving that role. Your plan will need to “alter course” depending on your location in your career. That’s why you must become the master of your own plan and be good at adapting your plan.

And don’t be too serious about making your plan perfect. Remember, the planning will be more important than the actual plan. You will learn more from the actual planning over time than the various revisions of your plan.

“Plans are worthless,
but planning is everything.”

Dwight D. Eisenhower

Just keep the plan growing and alive and adapting over time to best meet your needs.

One last thing. If the last year has shown us anything it’s this – new and better data science methods are arriving on the scene at an ever faster rate. Not only is it unlikely that you will be able to learn all that there is to know about each of the existing methods, it is unlikely that you will be able to learn all of the upcoming methods. What are you to do? Again, choose carefully, and add the new methods that YOU need to learn for your current needs to your current plan.

The Forms of Your Learning Plan

I like spreadsheets, presentation graphics programs, and drawing programs (especially scalable vector graphics ones). I LOVE CODING in python and using it to create graphics, interface with spreadsheets, interface with SQL and JSON! But should you use spreadsheets, drawing programs, presentation programs, or python code to create and update and revise your learning plan? Yes!

Please DO NOT think that you have to use one tool to create and grow and refine your learning plan over the years. Use any and all of these that you want to or need to. This is about your plan to learn and grow in data science. Use the tools that you like and that help you clearly and completely communicate your learning plan to your primary audience – YOU!

This learning plan will be your tool to help you understand what you want to learn and the order that you want to learn things and the degree of mastery that you want to have for each skill that you list. However, just to exemplify one usage of one tool, I will start with a spreadsheet, and I will populate it with some examples of what I think are good basic starting areas for any data scientist.

Example of a Learning Plan Section for General Data Skills

Let’s say you are building your plan for learning data management and visualization. I will tell you right away – the example below, figure 1, is not complete. It’s a starting example.

Figure 1: Starting Example for a Portion of your Personal Learning Plan

Use the structure as a starting point but change it to a format that you prefer! Data Cleansing is a general topic area. Think about what general topic areas you would add (or subtract). Pandas is a specific python module under Data Cleansing. What other methods or modules would you add to that general Data Cleansing area? You may not yet know, but when you come across a new method, open up your learning plan and add it. What do those percents mean? They are just my way of saying what level of expertise you feel that you have reached in each area. Again, these are only examples, and this is yours, so use what you want to indicate your level of skill for each area and format that the way that you want to. Format your learning plan in a way that best serves YOU! What’s important is to start building this for yourself. You should be the master of it. This is your skills development plan for your career.
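If you would rather keep this kind of table in python than in a spreadsheet, here is a minimal sketch of one way to do it. Only "Data Cleansing" and "Pandas" come from the text above. The other rows, the percentages, and the names `PlanEntry`, `weakest_first`, and `to_csv` are made-up placeholders for illustration, not the contents of Figure 1.

```python
import csv
import io
from dataclasses import dataclass, asdict


@dataclass
class PlanEntry:
    topic_area: str   # general topic area, e.g. "Data Cleansing"
    skill: str        # specific method or module, e.g. "Pandas"
    mastery_pct: int  # self-assessed level of expertise, 0 to 100


# Placeholder rows only; replace them with your own areas and honest numbers.
plan = [
    PlanEntry("Data Cleansing", "Pandas", 40),
    PlanEntry("Data Storage", "SQL", 25),
    PlanEntry("Visualization", "Matplotlib", 10),
]


def weakest_first(entries):
    """Return the entries sorted so the lowest self-assessed skills come first."""
    return sorted(entries, key=lambda entry: entry.mastery_pct)


def to_csv(entries):
    """Serialize the plan to CSV text that any spreadsheet program can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["topic_area", "skill", "mastery_pct"])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


if __name__ == "__main__":
    for entry in weakest_first(plan):
        print(f"{entry.topic_area:15} {entry.skill:12} {entry.mastery_pct:3d}%")
    print(to_csv(plan))
```

Sorting by the lowest percentage is just one possible way to decide what to study next; the point is that the plan stays easy to update and easy to move between tools.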

Also, there are many acronyms that describe some of these things such as ETL – extract, transform, load. You can track those and their meaning in a new spreadsheet in this workbook or in a document that you grow over time that contains common acronyms with their meanings, or concepts such as heteroskedasticity, which isn’t particularly hard to explain, but DANG – what a word! The meaning(s) may grow cold over time in your mind. It would be nice to be able to look it up when you need to in your own notes and be reminded of its meaning in your own words. Also, by having it recorded in your own notes, you can improve the definition to yourself over time and include links and references that you like to help you review that word or acronym or concept. Get creative. Make a tool that is helpful and efficient to you.
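A personal glossary like this can also live in a few lines of python. The sketch below is only one possible layout: the function names `add_term` and `lookup` and the entry fields are assumptions made for illustration, and the definitions are short stand-ins that you would rewrite in your own words.

```python
import json

glossary = {}


def add_term(term, meaning, links=None):
    """Add or overwrite a term; keys are stored lowercase for easy lookup."""
    glossary[term.lower()] = {
        "term": term,
        "meaning": meaning,
        "links": list(links or []),  # references you found helpful
    }


def lookup(term):
    """Return the stored entry for a term, or None if it is not recorded yet."""
    return glossary.get(term.lower())


add_term("ETL", "Extract, transform, load")
add_term(
    "heteroskedasticity",
    "The spread of the errors is not constant across observations.",
)

# JSON text can be written to a file and kept with the rest of your notes.
as_json = json.dumps(glossary, indent=2, sort_keys=True)

if __name__ == "__main__":
    print(lookup("etl")["meaning"])
    print(as_json)
```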

Yet Another Learning Plan Section

What about the cool stuff? 😎 Coming right up!

Figure 2: Starting Plan for yet another Portion of your Personal Learning Plan

The nice thing about having your own plan is that, as you learn more, you can improve it in multiple ways. As you learn how the various models that we use in the predictive analytics side of data science relate to each other, you can group them. As you group them, you can better discern where to dive deeper in your learning. For example, linear regression is foundational to logistic regression, and both of those are foundational to neural networks. This is so cool! It also means that all the things you do to make linear regression do better can potentially be applied to neural networks.
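To make that relationship concrete, here is a small sketch in plain python. It assumes sigmoid activations (other activations are common in practice), it only shows predictions, not training, and the function names are chosen for illustration.

```python
import math


def linear(weights, bias, features):
    """Linear regression prediction: a weighted sum of the inputs plus a bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic(weights, bias, features):
    """Logistic regression: the same linear model passed through a sigmoid."""
    return sigmoid(linear(weights, bias, features))


def tiny_network(hidden_layer, output_unit, features):
    """One hidden layer of logistic units feeding one logistic output unit.

    hidden_layer is a list of (weights, bias) pairs, one per hidden unit;
    output_unit is a single (weights, bias) pair over the hidden outputs.
    """
    hidden = [logistic(w, b, features) for w, b in hidden_layer]
    out_weights, out_bias = output_unit
    return logistic(out_weights, out_bias, hidden)


if __name__ == "__main__":
    features = [1.0, 3.0]
    print(linear([2.0, -1.0], 0.5, features))
    print(logistic([2.0, -1.0], 0.5, features))
    hidden_layer = [([2.0, -1.0], 0.5), ([0.1, 0.2], 0.0)]
    print(tiny_network(hidden_layer, ([1.0, -1.0], 0.0), features))
```

Each layer reuses the one before it, which is why improvements to the linear piece (such as scaling features or regularizing weights) can carry over.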

Although not shown, decision trees are foundational to random forests and those to boosting methods. As you learn more about the landscape of DS methods used in the wild, you can improve your charting of that landscape the way that you see it and then study how the various methods relate, or do not relate, to each other. What I am saying is that your learning plan can grow to reflect your growing insights into data science over time. The more experienced DS will notice that I forgot to include elastic net. Nope, that was Ta, not me. What? See below.
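One way to chart these relationships in code is as a small "builds on" map. The sketch below uses only the relationships stated in this post; the dictionary name `builds_on` is an assumption for illustration, and `graphlib` requires Python 3.9 or newer.

```python
from graphlib import TopologicalSorter

# Each method maps to the methods it builds on, as described in the text.
builds_on = {
    "linear regression": set(),
    "logistic regression": {"linear regression"},
    "neural networks": {"linear regression", "logistic regression"},
    "decision trees": set(),
    "random forests": {"decision trees"},
    "boosting methods": {"random forests"},
}

# A study order in which every method comes after its foundations.
study_order = list(TopologicalSorter(builds_on).static_order())

if __name__ == "__main__":
    for step, method in enumerate(study_order, start=1):
        print(step, method)
```

As your own view of the landscape changes, you add or rewire entries in the map and regenerate the order.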

Your Workshop of Tools

Let’s use a fictitious example to explain some progress that our new virtual data scientist has made. We’ll name the DS Ta (the Mandarin word for she and he both). Ta has learned to train linear regression models with least squares to the level of being able to code this modeling method from scratch in python without numpy. Ta did this to really understand the concepts. Ta also made sure Ta could do linear regression with scikit-learn and also with just numpy. Ta hasn’t yet learned that you can combine L1 and L2 regularization, and that this combination is called elastic net. All these cases are stored carefully in a directory. Where?
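Ta's actual code is not shown, so the following is only a guess at what a from-scratch exercise like this might look like: a least squares fit for a single input feature, in plain python with no numpy. The last function sketches the generic shape of a combined L1 and L2 penalty; libraries parameterize it in their own ways, so treat the argument names as assumptions.

```python
def fit_line(xs, ys):
    """Least squares fit of y = slope * x + intercept for one input feature."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept


def elastic_net_penalty(weights, l1_strength, l2_strength):
    """Generic combined penalty: an L1 term (absolute values) plus an L2 term (squares)."""
    l1_term = sum(abs(w) for w in weights)
    l2_term = sum(w * w for w in weights)
    return l1_strength * l1_term + l2_strength * l2_term


if __name__ == "__main__":
    slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    print(slope, intercept, predict(slope, intercept, 4))
```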

Ta was wise to sign up for a cloud drive account, so that Ta could save all Ta’s carefully crafted python code for later reference. If Ta’s computer is lost, stolen, or destroyed, Ta’s most important growing assets are safe. Ta also stored important links from Wikipedia and StackOverflow and GeeksForGeeks to each of Ta’s coded scripts that were helpful to developing each code script.

Ta has also started carefully organizing Ta’s bookmarks to important links in a way understandable to Ta. These were links that helped Ta a lot, and Ta wants to be able to quickly find them again when needed.

Ta has also created MANY OTHER DIRECTORIES of example code where Ta practiced specific python skills until Ta understood them well. Ta needed to master these smaller pieces of python code in isolation before bringing them into Ta’s larger application.

But wait! Ta did even more! Ta even pushed this directory of directories of Ta’s carefully crafted code to a Git site to be extra safe and to start practicing good version control. Along the way, Ta learned to use virtual environments, and classes, and pip freeze and pip requirements files, and Ta was careful to keep notes and save links in those notes too on how to do all these new things. Why? Because Ta had been forewarned that Ta would forget things and need to refer back to Ta’s carefully recorded notes. All of this carefully saved and git repoed code is part of Ta’s growing learning plan too.

Does Ta use Linux or Windows or Mac? Is Ta wanting to become an R master also? We will leave a bit of mystery around Ta in those regards. In fact, Ta is thinking it might be best to start with one OS and language, and then expand to other ones, once Ta has mastered Ta’s initial ones. The point is that Ta has adopted some important habits and practices to strengthen Ta’s growth and intellectual sustenance in data science.

A common question has arisen at this point I am sure! What materials do I use? Books, Udemy, Pluralsight, Coursera, YouTube, or a degree program? By now you have likely guessed how I like to answer “or” questions. Yes. Use them all. You know how you best learn. I know great data scientists that prefer books (even paper ones!), and great data scientists that prefer video courses. There is, of course, no one book or video course that will make you a great data scientist by itself. Even if you have one favorite, which is awesome, you will want AND NEED to have multiple sources. My advice on which materials to use may not be best for you. Personally, before I buy a book, I look at the Amazon reviews and comments. Before I pay for a video course, I very carefully look at the ratings and comments, and then look for a promo code. I’ll tell you about some people and resources that I like though.

One big resource is the MS Certificate in Data Science. It costs to get the certificate, but you can go through all the materials for free. I went through this, because, like other DS’s, I’ve found that I need review and new perspectives. Given my background and experience, the analytical sections were easy for me. I liked the conceptual way that they taught SQL. What was awesome was the coverage of things that I don’t tend to think to study like how to conduct good surveys, storytelling principles, and DS ethics. All of these were very helpful.

I am a big fan of Udemy courses in general, and my favorite video course guru on Udemy is The Lazy Programmer. For python concepts, I really like Corey Schafer on YouTube, but there are tons of other free materials for advancing your python skills. I am a fan of the author Giuseppe Bonaccorso, and we are even friends now. We love to discuss DS philosophy and support our DS community together. But please remember that no one author or course or series of courses is enough. You will need to learn to hunt for good new resources on your own. Sometimes, the material may seem dry. Press through it. Learn as much as you can from it. When you come back to that dry resource later, you might find that it was one of your best resources! You just needed to learn a bit more before you were able to glean its richness.

Visualizing and Relating Concepts

Perhaps you want to go deeper and more detailed in the way you record your learning. That’s where a good drawing program or a good python tool can come in handy. How? Recording concepts in visual ways is powerful for you and your audience. This audience can be you OR others. Below are some fun examples. Both of the tools I mention are great and free and not hard to learn or use.

Check out this fun post where I introduce visual python and share a video of the results on LinkedIn. Visual python is powerful! You can animate and illustrate many things very well! Below is a fun video I made while reviewing some basic principles of visual python.

Figure 3: A Fun Visual Python Demo

And then again, while I am seeking to know how to code reinforcement learning problems from scratch myself, I decided to make my own grid environment system using visual python. A still 3D picture of that grid environment is shown in Figure 4.

Figure 4: Beginnings of a Grid World Environment in Visual Python
for Reinforcement Learning … a Work in Progress
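The environment in Figure 4 is drawn with visual python, and its code is not shown here. As a rough idea of the logic that can sit underneath such a drawing, here is a minimal grid world in plain python. The class name, grid size, goal cell, and reward values are all assumptions chosen for illustration.

```python
class GridWorld:
    """A tiny deterministic grid environment with a start cell and a goal cell."""

    # Each action maps to a (row change, column change) pair.
    MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self, rows=3, cols=4, start=(0, 0), goal=(2, 3)):
        self.rows, self.cols = rows, cols
        self.start, self.goal = start, goal
        self.position = start

    def reset(self):
        """Put the agent back on the start cell and return that cell."""
        self.position = self.start
        return self.position

    def step(self, action):
        """Apply one action and return (new position, reward, done)."""
        d_row, d_col = self.MOVES[action]
        # Moves that would leave the grid keep the agent at the edge.
        row = min(max(self.position[0] + d_row, 0), self.rows - 1)
        col = min(max(self.position[1] + d_col, 0), self.cols - 1)
        self.position = (row, col)
        done = self.position == self.goal
        reward = 1.0 if done else -0.04  # assumed values: small cost per step
        return self.position, reward, done


if __name__ == "__main__":
    env = GridWorld()
    env.reset()
    for action in ["down", "down", "right", "right", "right"]:
        print(action, env.step(action))
```

Keeping the environment logic separate from the drawing code means the same class can later be rendered in 3D or used directly by a learning algorithm.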

This is all great for visualizing what you do in unique ways, but what about recording concepts that you or Ta are learning for your plan? Check out the video below of a scalable vector graphics (SVG) drawing where I record zooming out on the drawing to show how linear regression relates to logistic regression and how they relate to neural networks. This SVG was made with Inkscape, which is free.

Figure 5: A Video of an SVG Drawing Being Zoomed Out that Explains Relations between Linear Regression, Logistic Regression, and Neural Networks
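The drawing in Figure 5 was made by hand in a drawing program. If you would rather generate a simple concept chart from code, the sketch below builds a bare-bones SVG with python's standard library. The function name `concept_map_svg` and the layout numbers are assumptions for illustration, and the result is far plainer than a hand-made drawing.

```python
import xml.etree.ElementTree as ET


def concept_map_svg(concepts, box_w=220, box_h=50, gap=40):
    """Stack one labeled box per concept, joined by lines, and return SVG text."""
    height = max(len(concepts), 1) * (box_h + gap) - gap + 20
    svg = ET.Element(
        "svg",
        xmlns="http://www.w3.org/2000/svg",
        width=str(box_w + 20),
        height=str(height),
    )
    center_x = 10 + box_w // 2
    for index, name in enumerate(concepts):
        top = 10 + index * (box_h + gap)
        ET.SubElement(svg, "rect", x="10", y=str(top), width=str(box_w),
                      height=str(box_h), fill="none", stroke="black")
        label = ET.SubElement(svg, "text", x=str(center_x), y=str(top + box_h // 2 + 5))
        label.set("text-anchor", "middle")
        label.text = name
        if index < len(concepts) - 1:
            # Connector from this box down to the next concept that builds on it.
            ET.SubElement(svg, "line", x1=str(center_x), y1=str(top + box_h),
                          x2=str(center_x), y2=str(top + box_h + gap), stroke="black")
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    text = concept_map_svg(["linear regression", "logistic regression", "neural networks"])
    print(text)  # save this text to a .svg file to open it in a drawing program
```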

Ta Years Later

Ta’s personal learning plan, which could also be called some type of skills status matrix, has grown tremendously and is filled in with some large percentages. Ta has a personal website that serves as Ta’s resume and has sections to guide others to Ta’s works on Ta’s extensive Git repo site (could be GitHub or GitLab – Ta’s a mystery this way too). Ta’s repos have many applications that can help others if the need arises.

Ta has a rich library of books and video courses and blog posts that Ta reviews or references as needed. Ta has even been asked to speak at some data science meetups, and Ta was surprised to learn that the people really appreciated what Ta had to share. Ta also has many more carefully organized links to refer to as needed and a large number of non-data-science books too that help Ta in areas related to data science. All these resources are also listed on Ta’s website with small descriptions of why Ta found them helpful. Ta has also developed many illustrations and visuals and has the code that produces them carefully stored on Ta’s cloud drive and Ta’s publicly shared git repos. Ta has even entered competitions and helped lead teams to do the same!

Ta’s helped many other new DSs now and has even done some joint projects with new DSs to help them gain experience. Ta works for a great company and has helped the DS team there, and the company’s business, tremendously! Ta had to alter the priorities of Ta’s learning plan at times to meet urgent demands on the job, but all of those lessons helped to round out Ta’s greater skill set, so it was no loss that Ta could not learn things in the exact order that Ta had originally planned. Also, due to advances that happened in DS from the time Ta started until now, Ta had to expand Ta’s learning plan to include learning some of those new advancements that Ta thought Ta would need to use. Ta has discovered a wonderful reality too. Because Ta worked hard many times to learn DS principles very deeply, learning new things now comes much faster!

Summary

No one knows as well as you what you most need to concentrate on in your data science education. One thing is certain. You will need to continue to learn AND to review. You will need to master finding YOUR own best resources. Knowing what you need to learn is what makes developing your own plan hard, but if you dive in and learn yourself what you need to learn, using the mentions above as starting points, you will do much better than if you only go with what others suggest that you learn. It’s OK to start with the recommendations of others, but you will get far more value out of your education if you take charge of your own learning plan and if you are willing to revise it as you grow and learn more. As new methods and tools are developed (maybe even by you!), you will want to add some of those to your learning plan.

Dive in. Push through to mastery on important principles and concepts. Data science can be hard at times. Remember, you are not the only one that feels the challenge of its difficulty. But we go into data science because it is ultra cool, and we see how it can serve mankind very well. Most things in life that are truly worthwhile are hard. Just know that the crew here at Integrated understands this and is trying to help you as much as possible. And more mentoring posts will come out ASAP. This one was deemed the most urgent to help new DS’s.

Until next time …


Thom Ives

Data Scientist, PhD multi-physics engineer, and python loving geek living in the United States.

12 Comments

siddhartha · July 27, 2020 at 4:31 pm

Super-helpful! Thanks for taking out the time to create this carefully crafted blog for the community. Keen on checking out Cory Schaefer videos and the lazy programmer course 🙂
btw, any recommendations for blogs/reads which are geared towards explaining the conceptual understanding?

    admin · July 29, 2020 at 1:47 am

    Thanks for the positive feedback. We will be mentioning other great materials in future posts.

Praise Ekeopara · July 27, 2020 at 4:31 pm

This is really insightful, I was able to grab two important things.
Firstly gathering tools necessary for Data Science and using them to create A Learning plan even when we have not used or mastered them. This plan will constantly remind us that we need to invest in such areas.
Secondly, in the world of Data Science we need to learn to master concepts by ourselves so as to gain mastery and Experience.

Thanks Thom Ives for this insight. I will be waiting patiently for your next post while I obey and put to practice this one.

    admin · July 29, 2020 at 1:48 am

    Great to hear. Eager to write more helpful posts too.

Garima Vatsa · July 27, 2020 at 4:50 pm

This looks exciting. Can’t wait for upcoming mentoring posts, Thom 🙂

    admin · July 29, 2020 at 1:48 am

    I am eager to deliver!

Collins Madisife · July 27, 2020 at 11:04 pm

Hi Thom,

Thank you very much for putting this together. I know you understand the importance of this to growing DS like myself; so thank you again!

Connected with you on LinkedIn already and can’t wait to receive more blog posts from you. Have a beautiful week. Cheers

    admin · July 29, 2020 at 1:49 am

    Glad to hear it Collins. You are most welcome. Happy analyses.

Bruce Hoffman · July 29, 2020 at 3:28 pm

Thom, great post. For anyone that’s considering data science learning here’s your guide!

    admin · July 29, 2020 at 3:54 pm

    Hey Bruce. Thanks for that!

Moyra Piggy Schilit · August 23, 2020 at 1:11 pm

It was a lot of fun to read your article. Moyra Piggy Schilit

Book Review: “Building Analytics Teams” by John K. Thompson – Integrated Machine Learning and Artificial Intelligence · August 21, 2020 at 5:42 pm

[…] I hope you’ve guessed who it is. It’s Ta! (See the first mentorship post please – “Your Personal Data Science Learning Plan”). As I’ve been honored to be asked to mentor many new data scientists, I created a […]
