Let’s end it there. What’s next? Every hurdle is born from a question, and the question here is what to do with the language itself. I did some surfing on the omniscient sea of information, and it turns out that Python is especially popular for scientific and mathematical work. Huh? Should we use the formulas that Newton and Avogadro left to us?
Well, that’s a thing, actually, but another thing worth noting is that Python is really rich in data processing, thanks to the many libraries and tools developed by its community. When you’re starting your journey in the industry, you will most likely read or hear this sentence:
“Data is the new oil.”
Well, data is really complex, right? Nowadays it is not just sequences of numbers, but also pictures, text, even your fingerprints. Some of it is structured; some of it will make you frown. It is a really powerful thing, and in that sense, someone might even build a conspiracy theory that whoever conquers data can conquer the world.
Wow, that sounds overwhelming, doesn’t it? Because the field is so massive, it is currently split into many smaller specializations. Among them are data engineers, who design and prepare the architecture in which data is stored; data analysts, who analyze data from a statistical perspective; and data scientists, who dig into the data to extract insights and interpret them.
Most of the tasks in those three categories can be done in Python with the help of the Pandas library, but some will require other tools and languages, like SQL or Excel. It is also worth noting that in big companies, data scientists are usually expected to be proficient in dashboarding tools such as Tableau and Redash, which are used to compile all analyses and visualizations in one place, called a dashboard.
Undoubtedly, Pandas is the go-to library to learn and start with. To be honest, it is not a really complicated library, and it is absolutely versatile. Again, let’s make it simple: it is a way for Python to mimic SQL as much as possible through a data structure called DataFrame, which in turn is made up of another data structure called Series. Our data is usually stored in databases or in files such as CSV files, and a DataFrame is basically what we call a table in SQL, with each column being a Series. Psst, always direct yourself to the official documentation when learning libraries, and here is Pandas’ amazing documentation for you!
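To make the "DataFrame is a table" idea a bit more concrete, here is a minimal sketch. The `orders` data, its column names, and the values are all made up for illustration, and the SQL in the comments is only a rough equivalent, not an exact translation.

```python
import pandas as pd

# Hypothetical order data, invented purely for this example.
orders = pd.DataFrame(
    {
        "customer": ["Ana", "Budi", "Ana", "Citra"],
        "city": ["Jakarta", "Bandung", "Jakarta", "Surabaya"],
        "amount": [120, 80, 50, 200],
    }
)

# Each column of a DataFrame is a Series.
amount_column = orders["amount"]

# Roughly: SELECT * FROM orders WHERE amount > 100
large_orders = orders[orders["amount"] > 100]

# Roughly: SELECT customer, SUM(amount) FROM orders GROUP BY customer
total_per_customer = orders.groupby("customer")["amount"].sum()

print(type(amount_column).__name__)
print(large_orders)
print(total_per_customer)
```

In real work you would more likely load the table from a file with `pd.read_csv(...)` instead of typing it by hand, but the filtering and grouping steps would look about the same.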
Oh, don’t be surprised if, by the time you enter data science, most of the practice problems use many business terms that you might be unfamiliar with. By nature, data science is what you would call an interdisciplinary field, meaning that it connects many branches of knowledge, in this case technology, business, and statistics.
To be a professional data scientist, at the very least, you should have an understanding of business knowledge and strategies. In one sense, this will of course be a hurdle, as in IT there are already a lot of things to learn, and we just pushed another pile onto the stack. My take on this is that new knowledge means new perspectives, and the broader the horizon you can see, the better. Never settle for less; I think this can also be a way for you to grow a more global mindset.
I ran into this difficulty myself just recently, when I applied to a company as a Data Scientist Intern. Beforehand, I expected a more technical role, with intense use of many tools and technologies. What shocked me was that, when I did the trial before the actual internship, the company merged the field with Business Intelligence, which called for a more business-like mindset. Nevertheless, taking in new ideas will not break any of your existing mastery, and it directly helps you convert the insights in your data into powerful strategies for the company you’re in.