My Biggest Mistakes as a Data Scientist


Assuming that machine learning is everything

I think for many of us, our first exploration into data science is through machine learning. If you’re anything like me, ML is probably part of what got you sold on a career in data science. I think MOOCs are partially to blame for this, as they tend to focus on understanding ML algorithms from scratch and building ML pipelines with toy data. To be honest, I completely understand why: ML is one of the most powerful tools in a data scientist’s toolbox, and it’s also very marketable thanks to all the sci-fi movies. The question I get from most junior data scientists I interview is, “When will I be doing machine learning?” You might ask, what’s wrong with this level of enthusiasm?

Now here’s a hard pill to swallow: outside of FAANG, most companies are not at all ready for full-scale machine learning, and even amongst these there are varying levels of ML maturity. So if you land yourself at one of the less mature companies, what should you do?

First things first: shift your mindset away from building machine learning models for a bit. Start by trying to understand the business you’re working with, its pain points, customer initiatives, and objectives. It won’t be easy, but you’ll soon start to see other areas outside of ML where a data science skillset could be extremely valuable.

Some of the most impactful projects I have done as a data scientist have been modelling and stats related. These usually support one-off pieces of analysis that help the business make key strategic decisions. Some examples of non-ML projects I have done include pricing optimisation, network simulations using queue modelling, loan settlement analysis with logistic regression, price-demand elasticity modelling, and portfolio insight dashboards.
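To give a feel for how small one of these non-ML analyses can be, here is a minimal sketch of price-demand elasticity modelling. It assumes a constant-elasticity (log-log) demand curve and fits it with ordinary least squares using only the standard library. The function name, the model choice and the data are made up for illustration; they are not taken from any of the projects mentioned above.

```python
import math


def estimate_elasticity(prices, quantities):
    """Return the OLS slope of log(quantity) on log(price).

    Under an assumed constant-elasticity demand curve, q = a * p ** e,
    this slope is the price elasticity of demand, e.
    """
    if len(prices) != len(quantities) or len(prices) < 2:
        raise ValueError("need two or more paired observations")
    log_p = [math.log(p) for p in prices]
    log_q = [math.log(q) for q in quantities]
    mean_p = sum(log_p) / len(log_p)
    mean_q = sum(log_q) / len(log_q)
    # Slope = covariance(log p, log q) / variance(log p)
    cov = sum((x - mean_p) * (y - mean_q) for x, y in zip(log_p, log_q))
    var = sum((x - mean_p) ** 2 for x in log_p)
    if var == 0:
        raise ValueError("prices must vary to estimate elasticity")
    return cov / var


# Synthetic, noise-free data generated from q = 1000 * p ** -1.5,
# so the fit should recover an elasticity of roughly -1.5.
prices = [5.0, 6.0, 8.0, 10.0, 12.0]
quantities = [1000 * p ** -1.5 for p in prices]
elasticity = estimate_elasticity(prices, quantities)
print(f"Estimated elasticity: {elasticity:.2f}")
```

An elasticity below -1 suggests demand falls faster than price rises, which is the kind of one-off insight a pricing decision might need. Real sales data is noisy and confounded by promotions and seasonality, so treat this as a starting point rather than a finished analysis.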

There are plenty of things outside of ML that you can do as a good data scientist.


Failing to drive the direction of your work

I struggled with this quite a bit when I first started. In any new role or organisation, you can end up feeling quite disorientated. You don’t really understand the politics, you’re meeting lots of new people, everybody seems to have their own agenda, the list of woes goes on really.

I think back to a particular situation where a colleague of mine, keen to make his mark, reached out to a portfolio manager on my behalf. I don’t know what conversations were had, but I was met with a lengthy email in my inbox listing some financial reports that the manager wanted. It was a face-palm moment for me: the tone of the email indicated that approval to do the work had already been given without my knowledge. Of course, reporting is something we can do, but I saw straight away that building these reports in such a manual way wasn’t the best use of our time. On top of this, there is an analytics team that serves BAU reporting. I tentatively agreed to deliver the reporting, partly because I wanted to make a good impression early on. Scoping out this reporting task took up a ridiculous amount of my time and took me away from the work I had set out to do. I should have just declined and referred the stakeholder to our reporting team.

The lesson here is to remember you were hired for a reason. You’re the expert, and you need to drive how you engage with your stakeholders. You have a responsibility to identify which projects are going to be the most impactful and therefore the best use of your time. You do not have to say yes to every request that comes into your inbox. Remember, everybody has their own agenda. People are happy to ask you to do a bit of work for them just because it’s tedious, and you’ll soon find yourself taking ownership of that work. Don’t say yes to everything.

Attempting to boil the ocean

When I was given the position of data science lead, I felt a lot of responsibility on my shoulders. I believed I needed to prove myself worthy of the position by delivering the most fantastical thing that would save the bank millions. My heart was definitely in the right place, but I was extremely naïve. One of the problem statements I was given was on customer retention. I immediately went down a rabbit hole of uplift modelling, propensity modelling and reinforcement learning, seeking to build an all-in-one, self-learning ML solution to customer churn. My experienced readers will probably see the problem with such an approach: it’s going to be complex and expensive, and you probably don’t have enough data. That was all true, but it was only after wasting days researching and writing up an approach that I came to this conclusion.

A long time ago now, a manager of mine used to warn us about trying to boil the ocean. He would say “John, we are not trying to boil the ocean” in a thick Spanish accent. Whenever he said that, I would take a step back, reassess the work I was doing, and ask myself, “Does the return justify the amount of effort I’m putting in?” Sometimes our minds are more complex than the problem. In the customer retention example, all that was needed was a one-off piece of analysis so that the business could better understand customer churn.

Mistaking complexity for effectiveness is a sure-fire way for you to become that expensive and ineffective data science team.

Not grasping the landscape

I’m willing to bet that some of my readers have already made this mistake. Unfortunately, most of us learn this by direct experience. When I say “grasp the landscape”, I’m not talking about understanding the business or its products; that’s pretty much standard for any job role you pursue. I’m talking about understanding where the company is in its maturity for data science and machine learning.

The rigorous application process I had been put through made me assume the company was a lot further ahead than it was. Only after I joined did I find out that the company was much further behind than I had anticipated. Being at an early-phase company presents some awkward challenges that can really throw you off if you’re not prepared. On top of the obvious technical things like getting the right tools in place, you might have to educate your stakeholders on what data science or machine learning is.


The lesson here is that you should ask the right questions at the interview phase when joining a new company. I have written a blog post that gives you some questions you can ask to understand the landscape a bit better.

Getting stuck in the weeds

I’m the type of person that likes to get my hands dirty. I want to have my fingerprints on every project I’m involved in whether that’s building a prototype, data visualisation or anything else of a technical nature. A part of this comes from fear and anxiety. It has taken me years of hard work to acquire my skillset, and I fear I’ll lose what I don’t use. On top of this, I really don’t want to become just a people manager.

I lead a data science team, which means I have responsibilities outside of just technical delivery. I need to be able to see the bigger picture, manage stakeholders, set strategy and direction, get budget, and communicate. All these things are critical to the success of any work we do. If I am too stuck in the detail, these other important things suffer, as I learnt.

The good news is that you can still be involved technically, but you must set appropriate boundaries for yourself. These days I might build a PoC to guide the technical direction of a delivery, but I’ll leave my junior data scientist to develop it further.

