The Machine Learning Research Crisis you should know about

This one has a full Wikipedia page dedicated to it.

As I was reading the paper "SinGAN-Seg: Synthetic Training Data Generation for Medical Image Segmentation," something stood out to me. Here is an excerpt from the paper:

According to the analyzed medical image segmentation studies in [33], 30% have used private datasets. As a result, the studies are not reproducible. Researchers must keep datasets private due to medical data sharing restrictions.

This seemed like a pretty big deal to me. As someone involved in machine learning research, I find it worrying that 30% of these datasets are private. When results are not reproducible, verifying the claims in a paper becomes impossible, and it also restricts new research in that area. Machine learning progresses incrementally: people improve upon previous research by analyzing the procedure and tweaking steps or adding protocols, such as swapping data augmentation protocols or trying different networks. When datasets are not available to the public, other researchers can't analyze the data and results in depth, making them unable to contribute. And of course, not sharing your dataset means that other researchers can't catch nuances (such as dataset bias) that you might miss.
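To make the "tweak one step" pattern concrete, here is a toy sketch in plain Python (all function names are hypothetical; this is not real ML code): a published training pipeline where the augmentation stage is a swappable function, so a follow-up study can replace just that one stage. Swapping a stage like this is only possible when the original pipeline is shared.

```python
from typing import Callable, List

def flip(image: List[int]) -> List[int]:
    """Toy 'horizontal flip' augmentation on a 1-D image."""
    return image[::-1]

def add_noise(image: List[int]) -> List[int]:
    """Toy noise augmentation: perturb every other pixel."""
    return [p + (1 if i % 2 == 0 else 0) for i, p in enumerate(image)]

def train(images: List[List[int]],
          augment: Callable[[List[int]], List[int]]) -> List[List[int]]:
    """Stand-in for a training loop: applies the chosen augmentation stage."""
    return [augment(img) for img in images]

data = [[1, 2, 3], [4, 5, 6]]
baseline = train(data, flip)      # the published protocol
variant = train(data, add_noise)  # a follow-up study swaps one stage
```

Without access to `train` and `flip` (the published procedure), a follow-up team would have to reconstruct the entire pipeline from prose before they could test a single change.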

This has become such a huge issue that Wikipedia has an entire page dedicated to it, aptly named the Replication Crisis. In fact, there is a whole subgroup of (meta)scientists trying to solve it. In this article, I will talk about some ways we can tackle this issue in the medical machine learning space and in AI research more generally, touching on some of the problems AI research faces when it comes to replication. If you are someone involved in these areas, I would love to hear your thoughts. Please do let me know, either in the comments or by reaching out to me directly.

What types of issues do we have currently with replication?

Following is a (non-exhaustive) list of the kinds of issues that make replication difficult.

  • Private datasets (and other private components): Regulations, confidential data, or other constraints can mean that a group of researchers is unable to share the dataset they used. I would also group practices such as not sharing your code or other details of the procedure into this category.
  • Costs: In some cases, it is simply impractical to replicate another group's findings because of cost. Imagine the GPT-3 team came up with some findings using their model, and were kind enough to put the code and all the data they used online. There is still no way I could replicate their findings on my puny four-year-old Dell laptop.
  • Misaligned incentives: In certain cases, researchers aren't really incentivized to share everything. This often happens with private research groups, who share their findings without giving the details required to replicate the results. To refer to the somewhat notable case of Google Health and breast cancer research: the team seemed more interested in letting the world know they had cool tech than in publishing reproducible research, and they published few details in their journal article. Suffice it to say, scientists were not happy.
  • Lack of interest: Similar to the last point, there just isn't much reward in replicating someone else's research, so researchers often don't replicate findings even when they have the means to. In some cases, the work is delegated to students, which is a missed opportunity: an expert replicating a study might catch things that a student new to ML would miss.

So why is this an issue?

When studies can't be replicated, it can cause all kinds of issues. For example, a research team might be using a dataset that is biased in some way. They publish their results, the details are kept private, and all is merry until the solution is introduced to the real world and we are suddenly exposed to its failures. Think I'm making it up? Think of Apple's face recognition failing to recognize Asian faces, or the example of an AI mistaking a referee's bald head for a football. Yannic Kilcher does a fantastic job of breaking down how this could be harmful. Had the research procedure been made public, somebody could have caught the dataset biases.

Such issues happen more frequently than you'd think, and open, reproducible research is a good way to combat them.

What can be done?

Here are some of the ideas I liked when researching this topic. If you have any thoughts on them or have any ideas you like, be sure to share them.

Centralization of regulations and procedures

We need an international standard for data sharing and reproducibility: an international body that can set rules and regulations to enable data sharing while protecting the privacy of patients and other data subjects. An internationally agreed-upon standard would let researchers share and use datasets without worrying about violating privacy rules. A team in India would be able to use Norwegian datasets to replicate and improve upon findings without either party having to worry about the red tape on their side.

Having international standards for reproducibility would also give teams a clear understanding of the details they should provide in their work. Some people are pushing for this already. Joelle Pineau of Facebook AI (and McGill University) introduced a fantastic checklist for machine learning reproducibility; check it out right here. If you're interested in reading more, see Improving Reproducibility in Machine Learning Research (A Report from the NeurIPS 2019 Reproducibility Program). Another great initiative is the Papers with Code project, set up by AI researcher Robert Stojnic when he was at the University of Cambridge. Such initiatives are boosting the reproducibility of studies: the report notes that after the checklist was introduced, the share of NeurIPS submissions that included code rose from below 50% to about 75%.
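Many checklist items boil down to making a run repeatable. As a minimal sketch using only the Python standard library (the function name and manifest fields are my own invention, not taken from any checklist), here is the kind of "reproducibility manifest" a training script could emit: fix the random seed and record the environment, so a later replication can confirm it is running the same experiment.

```python
import json
import platform
import random
import sys

def reproducibility_manifest(seed: int) -> dict:
    """Seed the RNG and record the environment so a run can be repeated.

    Illustrative only; a real project would also pin library versions,
    dataset hashes, and hardware details.
    """
    random.seed(seed)  # fix the random number generator for this run
    return {
        "seed": seed,
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "entry_point": sys.argv[:1],
        # a few draws act as a fingerprint: identical on a faithful replication
        "rng_fingerprint": [random.randint(0, 99) for _ in range(3)],
    }

manifest = reproducibility_manifest(42)
print(json.dumps(manifest, indent=2))
```

Two runs with the same seed produce the same `rng_fingerprint`, giving a replicating team a quick sanity check before they invest in a full re-run.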

Independent Auditors

An issue that private entities face is proprietary technology. For example, they might use external tools. As somebody who developed a tool that helps researchers and engineers in their machine learning work, I wouldn't want all my code to be public, since that would mean people no longer need my tool. Private research entities have a lot of IP to protect, so they might not be willing to share enough details about a project to allow it to be reproduced.

To address this, I believe we need an established group of auditors whom private organizations allow to check their results, replicate their work, and go over the details. This group would act as a representative for the general AI research community. That way, companies can protect their confidential IP while reducing the harm of keeping their research details private.

Better Incentives for Reproduction

Lastly, I believe we need more incentives for replicating other people's research. There is already movement here: the Reproducibility Challenge is a great example, as it rewards high-quality replication. With more such incentives, we can encourage more researchers to replicate existing research to verify findings and, ultimately, identify future improvements.

We also need to improve incentives for people sharing reproducible research. If research teams have a reason to share their procedures and details (instead of just their results), reproducibility will naturally improve.

An interesting idea I came across came from Dr. Benjamin Haibe-Kains of the University of Toronto. Haibe-Kains would like to see journals split what they publish into separate branches: reproducible studies in one and tech showcases in the other. This would let us distinguish between the two kinds of studies. Instead of Google piggybacking on prestigious journals to show off its work, it would have to either publish more details or share its results in the showcase branch.


Hopefully, this article sparks your interest in the Replication Crisis. As you can see, it's important for machine learning research that published work be transparent. That way, people can replicate studies to verify findings, identify areas of improvement, and ultimately add to the discourse in the field.

This article is by no means the final say on replication and detail sharing; it is meant as an introduction, and I would suggest looking further into the topic. As more and more research is done by giant tech companies and private entities, this will become an increasingly important area. Make sure to learn about it and keep up with the developments, and be sure to share anything interesting with me :). The beauty of the internet is that we can all learn from each other.

Reach Out to Me

If the article got you interested in reaching out to me, then this section is for you. You can reach out to me on any of these platforms, or check out any of my other content. If you'd like to discuss tutoring, message me on LinkedIn, IG, or Twitter. I help people with machine learning, AI, math, computer science, and coding interviews.

If you'd like to support my work, use my free Robinhood referral link. We both get a free stock, and there is no risk to you, so not using it is just losing free money.


