3 Unique Tools Data Scientists Should Know




Table of Contents

  1. Introduction
  2. Who and Why?
  3. Postman
  4. Rancher
  5. Jenkins
  6. Summary
  7. References

Introduction

The role of a data scientist can vary considerably. It ranges from focusing only on algorithms in a Jupyter Notebook to working as a full-stack software engineer who uses data science models. In school or early in your career, you can expect to focus more on the different types of algorithms and their applicable use cases. As you become more senior, you will likely become more familiar with machine learning operations and DevOps tools, which we will discuss in this article.

Who and Why?

Some larger companies may prefer that you work solely in the algorithm space, in which case you may never touch certain tools. Other companies, often smaller ones, want you to work as a full-stack data scientist. This is not full-stack in the software engineering sense. It means being able to dissect a problem, find and aggregate the data, automate data collection, engineer features, build models, deploy them, and set up monitoring and alerting. Either way, it can be fulfilling to know and master the full data science pipeline. These tools are therefore useful for data scientists at all levels. They help you get ahead in interviews, and they also let you collaborate with others and work autonomously.

Postman

Photo by Joanna Kosinska on Unsplash [2].

The first tool we will discuss is Postman [3]. I call it (and the next two tools) more unique because I have seen far less discussion of, and expectation around, them than around more widely known tools.

This tool is described as an API platform. It can be used for a variety of use cases, but since we want to focus on data science, we can highlight the reasons why you would want to use it.

Here is why you should use Postman as a data scientist:

  • Sending API requests
  • Testing your Python tasks
  • Ensuring that your new model code will work in production without disrupting the live production process
  • Checking that your Python tasks return the expected results
  • Performing a final check before merging GitHub pull requests (PRs)

One example of Postman request testing for data scientists is testing code changes to your production pipeline from a PR. Suppose you updated a preprocess.py file and want to ensure it will work correctly in the production environment. You can send an API request and check whether the data preprocessed with the new code comes back as expected. The check runs together with all of the other code your file depends on, such as its specific Python library imports.
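To make the Postman workflow more concrete, here is a minimal Python sketch of what a request-and-check amounts to. It sends a JSON payload to an endpoint, then checks the status code and the returned data. The /preprocess endpoint, its payload shape, and the strip-and-lowercase preprocessing are all hypothetical. A tiny local stub server stands in for a real production service so that the example runs on its own.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict, Tuple


class StubPreprocessHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a production /preprocess endpoint."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # Hypothetical preprocessing step: strip whitespace and lowercase text.
        cleaned = [text.strip().lower() for text in payload["texts"]]
        body = json.dumps({"texts": cleaned}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: Any) -> None:
        pass  # silence the default per-request logging


def start_stub_server() -> Tuple[HTTPServer, str]:
    """Start the stub on a free local port and return (server, endpoint URL)."""
    server = HTTPServer(("127.0.0.1", 0), StubPreprocessHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}/preprocess"


def send_request(url: str, payload: Dict[str, Any]) -> Tuple[int, Dict[str, Any]]:
    """Send a JSON POST request (what Postman does for you) and decode the reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status, json.loads(response.read())


def check_preprocess(url: str) -> Dict[str, Any]:
    """Postman-style checks: the status code and the expected preprocessed output."""
    status, body = send_request(url, {"texts": ["  Hello World ", "DATA"]})
    assert status == 200, f"unexpected status {status}"
    assert body["texts"] == ["hello world", "data"], body
    return body


if __name__ == "__main__":
    server, url = start_stub_server()
    try:
        print(check_preprocess(url))
    finally:
        server.shutdown()
        server.server_close()
```

In Postman itself, you would save the request in a collection and write the equivalent checks as Postman test scripts. The Python version is simply a way to see the same idea in code. Against a real service, you would point send_request at your environment's URL instead of the stub.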

Rancher

Photo by Jakob Cotton on Unsplash [4].

Building off of Postman, you can next benefit from Rancher [5]. It is described as a software stack for teams adopting containers, delivering Kubernetes as a service.

Here is why you should use Rancher as a data scientist:

  • After sending a request with Postman, you can view the resulting logs in Rancher
  • Hopefully, your new code changes do not return errors, but if they do, you will also see them here in the Pod logs
  • If errors do appear, you can debug them from their error messages without affecting the current production process
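Once the logs are in front of you, the first job is usually finding the error lines. The sketch below assumes you can get a Pod's log text. You might copy it from Rancher's Pod log view, or fetch it with the official kubernetes Python client. The client is an optional dependency here, and the pod and namespace names are placeholders. The sketch then filters the text for typical Python error lines. The regular expression is an assumption about what your logs look like, so adjust it to your own logging format.

```python
import re
from typing import List

# Assumed patterns for Python error output: ERROR/CRITICAL log levels,
# traceback headers, and exception lines such as "ValueError: ...".
ERROR_PATTERN = re.compile(r"\b(?:ERROR|CRITICAL)\b|Traceback|\w+Error:")


def fetch_pod_log(pod_name: str, namespace: str = "default") -> str:
    """Fetch a Pod's log with the official kubernetes client (optional dependency).

    Assumes a local kubeconfig for the cluster that Rancher manages.
    """
    # Imported lazily so find_error_lines works without the package installed.
    from kubernetes import client, config

    config.load_kube_config()
    return client.CoreV1Api().read_namespaced_pod_log(name=pod_name, namespace=namespace)


def find_error_lines(log_text: str) -> List[str]:
    """Return only the log lines that look like errors."""
    return [line for line in log_text.splitlines() if ERROR_PATTERN.search(line)]
```

For example, find_error_lines(fetch_pod_log("preprocess-pod", "staging")) would return the error lines for a hypothetical pod named preprocess-pod in a staging namespace. If it returns an empty list, the request ran without logging any errors.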

This is a great tool for data scientists looking for a more automated and streamlined way of checking their work.

Jenkins

Photo by Patrick Tomasso on Unsplash [6].

The third tool is typically used last in this DevOps and machine learning operations-oriented process. Jenkins [7] is described as an open-source automation server. In this use case, it allows data scientists to build, test, and deploy their model changes, or their models in general.

Once you have tested your changes with Postman and Rancher, your next step can be to use Jenkins.

Here is why you should use Jenkins as a data scientist:

  • Ensure that your production (PROD) Docker images can be built and deployed
  • Build & test changes
  • All of this can run automatically after you merge your code changes in GitHub
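Jenkins pipelines are normally defined in a Jenkinsfile. The core idea behind "build & test changes" is also easy to see in plain Python: run ordered stages and stop at the first failure. This is roughly how a failing stage halts a Jenkins pipeline before deployment. The sketch below is not Jenkins and not its API. The stage commands, the image name, the registry URL, and the tests/ folder are hypothetical placeholders.

```python
import subprocess
import sys
from typing import List, Tuple

# A stage is a (name, command) pair, e.g. ("Test", ["python", "-m", "pytest"]).
Stage = Tuple[str, List[str]]

# Hypothetical stages; the image tag, registry, and test folder are placeholders.
PIPELINE: List[Stage] = [
    ("Build", ["docker", "build", "-t", "registry.example.com/my-model:prod", "."]),
    ("Test", [sys.executable, "-m", "pytest", "tests/"]),
    ("Deploy", ["docker", "push", "registry.example.com/my-model:prod"]),
]


def run_pipeline(stages: List[Stage]) -> List[Tuple[str, bool]]:
    """Run stages in order and stop at the first failing stage.

    Returns (stage name, passed) for every stage that was attempted.
    """
    results: List[Tuple[str, bool]] = []
    for name, command in stages:
        try:
            passed = subprocess.run(command).returncode == 0
        except FileNotFoundError:
            # The command itself is missing (e.g. docker is not installed).
            passed = False
        results.append((name, passed))
        if not passed:
            break  # later stages, such as Deploy, never run after a failure
    return results
```

In practice, Jenkins would trigger this kind of sequence for you, for example after a merge to your main branch. To try the function without Docker, pass stages whose commands call sys.executable, such as [sys.executable, "-c", "pass"].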

Summary

Altogether, these three unique tools can be extremely helpful for validating that your code works as expected. Knowing them also helps you become more autonomous. Lastly, they can make you a more competitive applicant.

To summarize, here are three unique and beneficial tools for data scientists that you should know:

  • Postman
  • Rancher
  • Jenkins

I hope you found my article both interesting and useful. Please feel free to comment below on whether you agree or disagree with these particular tools, and why. What other tools do you think are worth pointing out with regard to data science DevOps and machine learning operations (or MLOps)? These tools could certainly be explained in more depth, but I hope I was able to shed some light on the more unique tools, skills, and platforms for data scientists.

I am not affiliated with any of these companies.

Please feel free to check out my profile, Matt Przybyla, and my other articles. You can also subscribe to receive email notifications for my blogs using the link below. Reach out to me on LinkedIn if you have any questions or comments.

Subscribe link: https://datascience2.medium.com/subscribe

Referral link: https://datascience2.medium.com/membership

(I will receive a commission if you sign up for a membership on Medium)

References

[1] Photo by Tony Hand on Unsplash, (2019)

[2] Photo by Joanna Kosinska on Unsplash, (2018)

[3] Postman, Inc., Postman homepage, (2022)

[4] Photo by Jakob Cotton on Unsplash, (2019)

[5] Rancher, Rancher homepage, (2022)

[6] Photo by Patrick Tomasso on Unsplash, (2018)

[7] Jenkins, Jenkins homepage, (2022)
