What to Expect from Doing Kaggle Competitions for a Year
I have been doing Kaggle competitions for a year: here are the good things and the bad things
Kaggle is the biggest data science competition platform. If you are a data science beginner, you have probably been advised more than once to do a few Kaggle competitions. I have done around four two-month competitions over the last year and I have found them very helpful. After adding them to my profile, I started getting freelance jobs and even a full-time data science job, so I will talk about that in detail here as well.
Choosing Kaggle competitions
One piece of advice I have is to be strategic about which Kaggle competitions you choose. Personally, I knew that I wanted to get into the medical field, and I knew that the medical field currently relies a lot on images. So I picked supervised image classification, segmentation, and object-detection competitions. I didn’t win any, but the experience was enough.
Here is what you can expect to learn:
1. Exploratory Data Analysis
The first step of a data science project is always to understand the data that you are working with. There are a lot of tutorials on this, but believe me, you have to go through the process on multiple projects to actually gain the skill, since it is quite difficult to teach in the abstract. Different projects use different data, so the right techniques change with the context. A good data scientist knows which techniques to use in a given scenario to best understand and frame the underlying problem.
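As a rough illustration of what a first pass over a new dataset can look like, here is a minimal sketch in plain Python. The record format, the field names (`image_id`, `width`, `label`) and the `summarize` helper are all hypothetical, made up for this example; a real competition would usually call for pandas and plots on top of this.

```python
from collections import Counter
import statistics


def summarize(rows, label_key):
    """First-pass summary of a list of dict records (hypothetical format)."""
    columns = sorted({key for row in rows for key in row})

    # How many records are missing each field?
    missing = {c: sum(1 for r in rows if r.get(c) is None) for c in columns}

    # How balanced are the classes?
    label_counts = Counter(
        r[label_key] for r in rows if r.get(label_key) is not None
    )

    # Range and mean of every numeric field.
    numeric = {}
    for c in columns:
        values = [r[c] for r in rows if isinstance(r.get(c), (int, float))]
        if values:
            numeric[c] = {
                "min": min(values),
                "max": max(values),
                "mean": statistics.fmean(values),
            }

    return {
        "n_rows": len(rows),
        "missing": missing,
        "label_counts": dict(label_counts),
        "numeric": numeric,
    }


# Toy metadata for an imaginary image classification task.
rows = [
    {"image_id": "a", "width": 512, "label": "benign"},
    {"image_id": "b", "width": 1024, "label": "malignant"},
    {"image_id": "c", "width": None, "label": "benign"},
    {"image_id": "d", "width": 512, "label": "benign"},
]
summary = summarize(rows, "label")
print(summary)
```

Even on this toy data the summary surfaces the two things worth knowing before any modelling: one record has no width, and the classes are imbalanced three to one.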
2. Picking models and fitting them
This is my favourite part, and I think it is the favourite part for a lot of people. In most projects, though, you will be faced with the challenge of picking one state-of-the-art model and going with it. In a lot of competitions and real-world projects, you will not have the time to implement and try all of them, so you have to be strategic about your choices. You also have to properly understand why your implementation didn’t work as expected or gave a poor result. And even if it gave a good result, you have to understand why. A common mistake among data scientists is to move from one model or experiment to the next without fully analyzing the experiment they have already done, and thus they end up with disappointing results.
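One way to force yourself to analyze an experiment before moving on is to keep a small record of each run and compare it against the previous one. The sketch below is only an illustration: the `Experiment` class, the `compare` helper, the scores and the "gain must exceed the fold-to-fold spread" rule of thumb are all assumptions made for this example, not a formal statistical test.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Experiment:
    name: str
    config: dict
    fold_scores: list  # one validation score per cross-validation fold
    notes: str = ""    # what you learned, written before the next run

    @property
    def mean(self):
        return statistics.fmean(self.fold_scores)

    @property
    def spread(self):
        # Sample standard deviation across folds (needs at least 2 folds).
        return statistics.stdev(self.fold_scores)


def compare(baseline, candidate):
    """Compare two experiments using a rough rule of thumb (an assumption)."""
    gain = candidate.mean - baseline.mean
    noise = max(baseline.spread, candidate.spread)
    return {"gain": gain, "noise": noise, "clear_improvement": gain > noise}


# Made-up scores for two hypothetical runs.
baseline = Experiment(
    name="small-backbone",
    config={"lr": 1e-3},
    fold_scores=[0.80, 0.82, 0.81, 0.79, 0.83],
    notes="Errors concentrated in low-resolution images.",
)
candidate = Experiment(
    name="large-backbone",
    config={"lr": 1e-3},
    fold_scores=[0.81, 0.83, 0.80, 0.82, 0.84],
)

result = compare(baseline, candidate)
print(result)
```

Here the candidate's mean score is higher by about 0.01, but that is smaller than the variation between folds, so the comparison does not count it as a clear improvement. That is exactly the kind of result worth understanding before jumping to the next model.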
3. Open source collaborative work
After multiple competitions, I started noticing a trend. In every competition, the final high-scoring solutions are built on top of the good open-source solutions. Competitors constantly release high-scoring kernels up until the final two weeks of the competition. This is always a great opportunity for you to learn from those kernels, understand them and build on top of them.
4. Machine learning project tricks and optimizations
Over the course of several competitions, I have always found winning solutions to have some sort of small optimization that boosts the performance just a little bit in their favour. This is the sort of thing that you won’t find online, on Medium or in books. You will have to compete, work on projects and look at other people’s solutions to start understanding the patterns and maybe even develop one yourself.
How I used those projects to boost my profile
This is sort of a side point, but one that I think is worth mentioning. I have found that just doing the competition and forgetting about it might be a waste of your time if you are trying to get a data science job through competitions. If you don’t properly present your work online, nobody will ever know. After every competition, I made sure that I reflected properly on the results and understood why I placed in the top x% and what I could have done better. The next step was to write an article on Medium about my experience and another article explaining the winning solution.
You should also format and document your code properly in a GitHub repository and add links to that repository in your articles. Finally, make sure to add the competition, your Kaggle profile and the results to your LinkedIn profile. This is quite an important step. I have found that after doing those three steps properly over and over, people start noticing. I have received multiple data science consultancy offers and one full-time job offer because of the “online presentation” that I put together for the competitions I have done.
What you won’t learn from Kaggle competitions
Kaggle competitions are a very good starting point, but they aren’t everything. After I started getting consultancy jobs, I discovered that what you learn in those competitions is only the first step. Don’t get me wrong, they are quite useful, but they just aren’t comprehensive.
After you clean your data, build a high-performance model and everything, the next step would be to push this model to production. This is quite a challenging step that comes with a lot of questions such as:
- Where should I deploy this model: client-side or server-side?
- How do I monitor the performance of this model?
- Do I have enough AI-explainability metrics that I can present to the project’s stakeholders?
- How can I prove that this model will work on real-world data distribution, not just the distribution it was trained on?
- Does this model meet the performance demands of the client? If not, how can I improve it?
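To make the monitoring and data distribution questions above a little more concrete, here is one simple check you could run on a single input feature: compare the values seen in training with the values seen in production using the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical cumulative distributions). This is a sketch of one common approach, not something the competitions teach directly; the sample values and the `DRIFT_THRESHOLD` cut-off are assumptions chosen for illustration.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two samples (0.0 to 1.0)."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    # The largest gap is always reached at one of the observed values.
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


# Hypothetical alert level; a real project would tune this per feature.
DRIFT_THRESHOLD = 0.3

# Made-up values of one feature (for example, mean pixel intensity).
train_values = [0.1, 0.2, 0.3, 0.4, 0.5]
live_similar = [0.15, 0.25, 0.35, 0.45, 0.55]
live_shifted = [1.1, 1.2, 1.3, 1.4, 1.5]

similar_gap = ks_statistic(train_values, live_similar)
shifted_gap = ks_statistic(train_values, live_shifted)

print("similar data drifted:", similar_gap > DRIFT_THRESHOLD)
print("shifted data drifted:", shifted_gap > DRIFT_THRESHOLD)
```

A statistic near 0 means the two samples look alike, and a value of 1 means they do not overlap at all. A check like this only flags that the inputs have moved; it does not tell you whether the model's accuracy has actually dropped, which still needs labelled production data.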
Some of those questions are quite difficult to answer, and some of them are quite open-ended. But the good news is that after gaining real-world experience, you will probably be able to give good answers to them.