Running Machine Learning Projects: Things to know

The lifecycle of Machine Learning and Data Science projects

In my experience, ML projects come in all shapes and sizes and vary greatly in complexity. In the 2000s and early 2010s, the emphasis was on a model-centric approach, which I always found a little bizarre: I heard about fancy models umpteen times more often than about results (well, it was a different time, and folks would get awestruck just hearing ML, AI, or DS). Slowly and steadily, the focus has shifted to a result-centric approach: use whatever model you can, but make use of the data and produce results that are directional, applicable, and coherent.

Despite the shift in focus and increasing literacy in the field, more often than not, efforts on the ML side end up biting the dust. There is no blueprint for success, but there are certain steps you can follow to make sure you are headed in the right direction.

The write-up below is a brief distillation of the experience I gained while leading ML teams and working on some interesting problems at Accenture, Evalueserve, BCG, Intent HQ, and Quantplex over the past 10 years.



  1. Before you take that leap, be clear about the problem statement you are trying to solve, the boundaries of the solution (what can and cannot be solved), what data is available, which teams you will be dealing with, and what timelines you are working under.
  2. A good idea is to sit for 2 hours in a conference room and list everything out on a whiteboard; try to explain the problem statement to your teammates. (The SOW/contract with the client should also contain this information, otherwise it can cause a great deal of trouble; at the very least, the problem statement, the time required to solve it, and the financials should be in the contract.) Ask your team to list all the data and factors that would be required to solve the problem 100% (in an ideal world), then find out what data and information you actually have. Is it enough to solve the problem? Can you conduct secondary research to supplement the data sources? Can you ask the client for relevant information?
  3. Can you modularise the problem statement, prioritise the modules, and work on the most important modules first? Can you apply the Pareto principle, i.e. which modules, if worked on, will solve ~80% of the problem?
  4. Depending on the team size and timelines, devote 2–4 days to researching possible solutions for the important modules. Read research papers or find solutions that worked for others. Maintain a few communication channels on whatever platform you’re using (Slack, Teams, etc.) and please keep the mindless chatter to a minimum (maintain a separate channel for such shenanigans).
  5. Everyone on your team should be clear about what is being solved and what the possible approaches are. Select the one that is easy to implement and will have maximum impact.


  1. You know the goal of the project, but can you quantify it? If you are working with multiple goals, can you rank them by importance?
  2. What are the acceptance criteria or success metrics? (You’ll have to work with the clients or stakeholders on this; it is a consultative exercise.)
  3. Can you define the metrics that you’ll use to measure the goal/success of the project? Alongside a positive metric, you should coin a negative metric, and your results should satisfy both. For example, when designing a recommendation system, the positive metric could be the percentage of purchased products that you had recommended, and the negative metric the percentage of purchased products that you had not recommended to the user. The negative metric should stay small.
  4. The metrics you define in the step above should relate to the real world, not merely satiate the lecherous eyes of the data team. Yes, the model should have statistically significant results, and you should use R², likelihood, log-likelihood, sensitivity, specificity, confusion-matrix variables, accuracy, MAPE, MSE, etc., but only for assessing the performance of the model, not for communicating results to an audience that includes business stakeholders and non-tech folks. Use something that means something in their world; if you can’t find anything suitable, create proxy metrics with more real-world connotations.
  5. What will you do to keep your model from going astray and inheriting the biases of the world (such as race, culture, subculture, gender, age, geography, etc.)? Can you bring balance to your microworld?
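
The positive/negative metric pair from the recommendation example above can be sketched in a few lines. This is an illustrative implementation, not from the article; note that, defined this way over a user's purchases, the two metrics sum to one, which makes them easy to sanity-check against each other.

```python
def recommendation_metrics(recommended, bought):
    """Positive metric: share of purchased products we recommended.
    Negative metric: share of purchased products we failed to recommend."""
    recommended, bought = set(recommended), set(bought)
    if not bought:
        return 0.0, 0.0  # no purchases: nothing to score
    hit_rate = len(bought & recommended) / len(bought)   # positive metric
    miss_rate = len(bought - recommended) / len(bought)  # negative metric
    return hit_rate, miss_rate

# We recommended {A, B, C}; the user bought {B, C, D, E}.
hit, miss = recommendation_metrics({"A", "B", "C"}, {"B", "C", "D", "E"})
# hit = 0.5 (B and C out of 4 purchases), miss = 0.5 (D and E were missed)
```

A small negative metric here means few purchases happened outside your recommendations, which is exactly the success condition described above.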


More often than not put on the back burner and ignored, infrastructure becomes a bottleneck. If your machines aren’t capable of running a heavy solution, look for alternative or cloud-based solutions.

  1. Can you or your team access data that is stored in formats different from what your models require? Data feeds are often pushed to S3 buckets or an HDFS ecosystem, and one has to run Apache Spark queries to access the pertinent slices. Don’t overload your team with data scientists; make sure you have data engineers too. They are godsends in crunch situations and make your team versatile.
  2. Your product isn’t going to be a Jupyter kernel in real life; it will be a set of files run by a scheduler via a cron job, which will publish the results on a dashboard or push them to another S3 bucket. If all of this sounds like gibberish, make sure you have folks (engineers) who can help you out. ML projects are rarely an individual contribution; a wide gamut of skills is required to make them a success.
  3. The backend infrastructure design will limit which models you can develop, which outputs can be supported, and how often you’ll be able to refresh the models and update the results. Figure out the backend infrastructure before the team starts developing a heavy, complex model whose results can’t be used.
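
As a concrete sketch of the cron-driven pipeline mentioned above: a single crontab entry that runs a scoring script nightly and pushes the output to S3. The script path and bucket name are placeholders, not from the article.

```shell
# Hypothetical crontab entry: run the scoring job at 02:00 every night,
# then copy the results to an S3 bucket for the dashboard to pick up.
# m  h  dom mon dow  command
  0  2   *   *   *   /usr/bin/python3 /opt/ml/score.py && aws s3 cp /opt/ml/output/results.csv s3://example-bucket/results/results.csv
```

Even a stub version of this, wired up early, exercises the same path the real model will eventually take.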


  1. You already have the data, but you still need to conduct data sanity checks; they verify the integrity of the data provided to you.
  2. How fresh is the data, and does it even make sense to work with it? Will it capture the emotions that you wish to capture with your models?
  3. What’s the refresh rate? How often does it go stale? Is it GDPR compliant? Does it have inherent biases?
  4. Maintain a glossary with definitions of all the features in your data. What’s the source of the data? Is there a dependency on upstream data? If yes, what’s its refresh rate, what are the SLAs, and who defined them?
  5. One of the most important questions: do you have the information in the format you want? The most common issues are integers typecast as objects and datetime values stored as strings. Preprocess such features early to save yourself pain in the later stages.
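
The typecasting fixes described in the last point look roughly like this in pandas. The column names and values are hypothetical; the raw feed is assumed to deliver everything as strings.

```python
import pandas as pd

# Simulated raw feed: numbers and dates arrive as strings ("object" dtype).
raw = pd.DataFrame({
    "credit_score": ["612", "487", "705"],                       # int typecast as object
    "signup_date": ["2021-01-05", "2021-02-11", "2021-03-20"],   # datetime as string
})

# Coerce each column to its proper dtype; unparseable values become NaN/NaT
# rather than crashing the pipeline.
clean = raw.assign(
    credit_score=pd.to_numeric(raw["credit_score"], errors="coerce"),
    signup_date=pd.to_datetime(raw["signup_date"], errors="coerce"),
)
```

Doing this once, up front, means every downstream step can rely on numeric comparisons and date arithmetic just working.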

Baseline Wolves

I am assuming that you have conducted a satisfactory exploratory data analysis. Don’t spend weeks on it running statistical tests on every possible feature; use your common sense about what could be important. Outlier analysis is an important task: it helps later on, when you have incremental models and are assessing their performance and trying to justify certain misclassifications.
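
For the outlier analysis mentioned above, a quick and defensible first pass is the classic Tukey fence on the interquartile range; this is one common technique, not something the article prescribes.

```python
def iqr_outliers(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    xs = sorted(values)
    n = len(xs)

    def quantile(q):
        # Linear interpolation between the closest ranks.
        pos = q * (n - 1)
        lo, hi = int(pos), min(int(pos) + 1, n - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo_fence or v > hi_fence]

# 999 sits far outside the bulk of the data and gets flagged.
print(iqr_outliers([10, 12, 11, 13, 12, 999]))  # -> [999]
```

Keeping the flagged points around (rather than silently dropping them) is what lets you revisit them later when justifying misclassifications.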

It’s important to define a baseline that you will strive to beat in the coming days.

  1. Take a week and come up with a baseline model. It is better to have something that isn’t too complex and uses a simple, semantics-based approach rather than involving any machine learning. The baseline model will also help you test the sanity of later models and cover some simple cases, e.g. if you are designing a loan approval system, everyone with a credit score below 400 should be flagged. Such simple initial designs help you build in rigour from the beginning.
  2. Make sure that the baseline model can produce the same metric that you defined in the steps above. Record how far you are from the goal and what you might need to do to inch towards the success metrics or the goals you defined.
  3. Deploy/ship your first models as soon as possible to have something in the pipeline and to test the end-to-end system. It is good to have Streamlit/Heroku skills so you can showcase intermittent or even final results to a larger audience.
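
A baseline in the spirit of the loan-approval example above can be a handful of hand-written rules. The thresholds other than the credit-score-below-400 rule are illustrative assumptions, not from the article; no ML is involved.

```python
def baseline_loan_decision(applicant):
    """Rule-based baseline for a loan approval system. Thresholds
    other than the sub-400 flag are hypothetical placeholders."""
    score = applicant.get("credit_score")
    if score is None:
        return "manual_review"   # missing data goes to a human
    if score < 400:
        return "reject"          # the sanity rule from the text
    if score >= 700 and applicant.get("income", 0) >= 30_000:
        return "approve"
    return "manual_review"
```

Because the rules are transparent, any later ML model that disagrees with them on obvious cases (e.g. approving a 350 score) immediately fails the sanity check.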


Once you have a baseline set, you need to improve the system through feature engineering and data modelling.

  1. Your subsequent iterations should be incremental. Use your experience, the client’s and business folks’ inputs, and the information you gathered from secondary research to gauge which signals and features should go into the incremental models. Build your models and try to beat the baseline.
  2. Make sure that your sampling methods are not biased; they shouldn’t introduce biases into your incremental models’ results and muddy the picture.
  3. Always go back to basics and see if you can strike a balance between the interpretability and complexity of your models. If your models get complicated, you’ll lose interpretability. PCA is a good dimensionality-reduction method and a decent way to visualise data, but try explaining it to a mixed crowd of 100 folks and you’ll get frosty stares.
  4. Let’s say you created some complex features or picked one of the hidden layers of a neural network and used it in your model. Fundamentally that is fine, but such features are generally non-convex in nature, which means the internal optimisation engine of your model will have a hard time converging; it is then highly likely that you will get stuck in a local maximum or minimum, and the results may differ in each run. If you use such features, test the robustness of the system after their introduction.
  5. Keep checking how your incremental models are doing with the help of both the positive and negative metrics you coined.
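
One common way to keep sampling unbiased, as point 2 above asks, is stratified sampling: draw the same fraction from every stratum so no group is over- or under-represented. A minimal sketch with hypothetical data (the `region` attribute is invented for illustration):

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, fraction, seed=42):
    """Sample the same fraction from every stratum of `key`, so the
    sample preserves the population's group proportions."""
    rng = random.Random(seed)            # fixed seed -> reproducible splits
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

# 80 northern and 20 southern rows; a 10% stratified sample keeps the 80/20 mix.
rows = [{"region": "N"}] * 80 + [{"region": "S"}] * 20
sample = stratified_sample(rows, "region", 0.1)
```

The fixed seed also helps with point 4: rerunning the pipeline produces the same split, so run-to-run differences can be attributed to the model rather than the sampling.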

Mind and Machine — The final mile

ML models aren’t going to work in isolation (at least until they pass the Turing test and we get AGI), so the human aspect is essential for the time being.

  1. Craft a decent strategy to fall back on when the data fails, e.g. how would you classify users who don’t have a credit score in a loan approval system, or what recommendations would you show if there is a backend system failure?
  2. Your model will have objective functions and loss functions. If you think a creepy feature is a troublemaker, push it into the loss function. That’s where it belongs.
  3. Can you conduct user trials for your product/system? If yes, accept the results: you and your team are deep into the world of ML and understand the nitty-gritty of the system, but an end-user might think differently. You know that a tomato is a fruit, but end-users consider it a vegetable, so be it. Don’t be hell-bent on changing the opinions of the world.
  4. Keep measuring how your system is performing according to the metrics you concocted. Has your system drifted too far from them? Is there a need to redefine the metrics?
  5. An important factor that is generally swept under the carpet is model refresh and model ageing. Will refreshing the model be a herculean task? Will it age like a pinot-based wine, or will it be an uncontrollable freak like a flailing scarecrow falling down the stairs?
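
The fallback strategy from point 1 can be made concrete as a small dispatch function that degrades gracefully. The fallback list and the three-item cutoff are hypothetical choices for the sketch:

```python
POPULAR_FALLBACK = ["P1", "P2", "P3"]  # hypothetical bestseller list

def recommend(user_history, model_recs=None):
    """Serve personalised recommendations when the model answered,
    and degrade gracefully when it (or the data) failed."""
    if model_recs:                 # happy path: the model produced output
        return model_recs
    if not user_history:           # cold start: nothing to personalise on
        return POPULAR_FALLBACK
    # Data exists but the model failed: show the user's most recently
    # viewed items rather than an empty page.
    return list(reversed(user_history))[:3]
```

The key design choice is that every branch returns *something*; a backend failure should never surface as a blank screen to the end-user.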

Larger Scheme

There are questions that you need to keep asking throughout the lifecycle of the machine learning project.

  1. Is the system or the product getting better with time or is it stuck?
  2. Can you do something to simplify the system and reduce the complexity?
  3. Do you feel confident that you are answering the questions that you set out to answer?

End-to-end consulting

  1. It is always a good idea to take a slice of the results and conduct an A/B test; focus on a cohort of users, a geography, or a product that you (or the business) believe to be pertinent. Don’t run the test for too long, as A/B tests are expensive; that’s why the cohort should be carefully selected where you expect an impact from your project/product. If the results don’t start tilting in your favour, it could mean many things, so don’t jump to conclusions. Look for loose ends. It is entirely possible that the model was bad, but it’s also possible that it never reached the end-users it was supposed to reach.

Take the end-user results and see whether you can incorporate the feedback and improve the model.
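
When reading the A/B test results described above, a standard way to judge whether the lift is real is a two-proportion z-test on conversion rates. A pure-stdlib sketch with made-up numbers:

```python
from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between
    control (A) and treatment (B). Returns (z statistic, p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Normal CDF via erf; two-sided p-value.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical test: 10% conversion in control vs 13% in treatment.
z, p = two_proportion_z(conv_a=200, n_a=2000, conv_b=260, n_b=2000)
# p < 0.05 here, so this lift would be called significant at the 5% level.
```

The same point from the article applies: a non-significant result doesn't automatically mean the model failed; check the loose ends (delivery, cohort selection) before concluding.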


I have deliberately not covered a few parts, such as choosing the toolset, IDE, etc., and there are a few parts I might not be aware of. So, if you happen to stumble upon this write-up, please share your thoughts and how you structure your ML teams and projects.

Enjoy 🙂

