Data Science: Write Robust Python With Static Typing

Updated for Python 3.10

Programming languages fall into two broad categories: statically typed and dynamically typed.

Python is a dynamically typed language. You don’t have to explicitly specify the data type of variables. The same is true for functions: You don’t have to specify the type of arguments or the function’s return type.

In other words, there is no separate compile-time type check: the Python interpreter performs type checking at runtime, during code execution.

On the other hand, in statically typed languages (such as Java), every variable has a type that is fixed at compile time, usually through an explicit declaration. Thus, type checking is completed before the program runs. That is one reason statically typed languages are typically faster, but more verbose.

With PEP 3107 and PEP 484, we can use type annotations in Python to annotate data types. However, Python will remain a dynamic language. The role of type hints is to help you write clean and robust code.

There are two main types of annotations:

  • Variable Annotations
  • Function Annotations

But first, let’s demonstrate a case that shows why type annotations are beneficial.

Mini Example

Let’s say we have a simple pandas dataframe with movie reviews. We want to perform some basic preprocessing and then store the result in a new variable:
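A minimal sketch of this situation, assuming pandas is installed. The column name `review`, the sample rows, and the strip/lowercase step are illustrative assumptions, not the original data or preprocessing:

```python
import pandas as pd

# Hypothetical single-column dataframe of movie reviews (sample data is an assumption).
reviews = pd.DataFrame({"review": ["Great movie!", "  Terrible plot ", "Loved IT"]})

# Basic preprocessing: selecting one column with single brackets returns a Series,
# so the result quietly stops being a dataframe.
reviews_cleaned = reviews["review"].str.strip().str.lower()

print(type(reviews).__name__)          # DataFrame
print(type(reviews_cleaned).__name__)  # Series
```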

The reviews_cleaned variable is now a series instead of a dataframe. It’s easy to miss that because the reviews dataframe has a single column.

Let’s see if we can use type annotations to make our Python code more readable.

Variable annotations

We use them when we declare a variable. After the variable name, add a colon followed by a space (the PEP 8 convention) and the type of the variable. For example:
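A short sketch of annotated primitive variables. The variable names and values are illustrative assumptions:

```python
# Variable annotations: name, colon, space, type, then the assigned value.
name: str = "Alice"
age: int = 30
height: float = 1.75
is_student: bool = False

print(name, age, height, is_student)
```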

We can also annotate non-primitive variables, such as lists, tuples, and dictionaries:
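One way this might look, using the generic types from the typing module. The variable names and contents are assumptions made for illustration:

```python
from typing import Dict, List, Tuple

# A list whose elements are all integers.
scores: List[int] = [90, 85, 77]

# A tuple with exactly two floats.
point: Tuple[float, float] = (1.5, 2.5)

# A dictionary mapping string keys to float values.
ratings: Dict[str, float] = {"movie_a": 8.8, "movie_b": 8.5}

print(scores, point, ratings)
```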

Of course, we can annotate data-science-related modules such as Pandas and NumPy:
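A small sketch, assuming pandas and NumPy are installed and imported under their usual aliases. The column name and values are hypothetical:

```python
import numpy as np
import pandas as pd

# Annotate with the classes exposed by each library.
df: pd.DataFrame = pd.DataFrame({"rating": [8.8, 8.5]})
ratings: pd.Series = df["rating"]
values: np.ndarray = ratings.to_numpy()

print(type(df).__name__, type(ratings).__name__, type(values).__name__)
```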

Note: If you import pandas and numpy with aliases other than pd and np, change the annotations accordingly.

Function annotations

We can also annotate function/method arguments and their return types. Let’s see an example:
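A sketch of the unannotated version. The function name comes from the article, while the body and the sample calls are assumptions:

```python
def sum_2_numbers(a, b):
    # No annotations: any operands that support + are accepted.
    return a + b

print(sum_2_numbers(2, 3))      # 5
print(sum_2_numbers(1.5, 2.5))  # 4.0
print(sum_2_numbers("2", "3"))  # 23
```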

The function sum_2_numbers calculates the summation of two numbers. However, because the + operator is overloaded, it behaves differently: For integers/floats, the function calculates their summation. For strings, the function sum_2_numbers outputs their concatenation.

We can make the function sum_2_numbers clearer by adding type annotations for the arguments and the return value. We use the following format:

def function_name(argument_name: argument_type) -> return_type:

Using annotations, the function sum_2_numbers then becomes:
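A sketch of the annotated version, assuming the body is a plain addition. The last call illustrates that Python itself does not enforce the hints:

```python
def sum_2_numbers(a: int, b: int) -> int:
    return a + b

print(sum_2_numbers(2, 3))  # 5

# The annotations are stored on the function but not checked at runtime,
# so strings are still accepted and concatenated.
print(sum_2_numbers("2", "3"))  # 23
print(sum_2_numbers.__annotations__)
```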

The function takes as input two integers and outputs an integer.

Note: The user can still pass float and string datatypes — the annotations are just hints and not a requirement.

However, there is a tool we can use to enforce type checks before the code runs.

Enter Mypy: Enforce Type Checking Before Runtime

In 2012, Jukka Lehtosalo, a Ph.D. student at that time, started a side project that became known as Mypy.

Mypy was originally envisioned as a Python variant with seamless dynamic and static typing. Initially, it started as a separate language. Then, it was re-written as an external library, compatible with Python.

First, install the library:

$ pip install mypy

Then, we write the following script to demonstrate how Mypy works:
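A minimal script of this kind might look as follows. The file name script1.py, the `describe` helper, and the values are assumptions; only the `age` variable is mentioned in the article:

```python
# script1.py (hypothetical file name)
name: str = "Alice"
age: int = 30


def describe(person: str, years: int) -> str:
    # Both arguments match their annotations, so a type checker has nothing to flag.
    return f"{person} is {years} years old"


print(describe(name, age))  # Alice is 30 years old
```

Saved under that name, the sketch can be checked with `mypy script1.py`. Since every value agrees with its annotation, Mypy is expected to report no issues.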

Finally, run the script using Mypy. You should see the following output:

Let’s create an error on purpose to check the behavior of Mypy. Change the type of variable age from int to str and run the script. You should see the following:

Mypy has successfully detected that the annotation of age and the value assigned to it no longer agree. Feel free to conduct your own experiments and check how Mypy works.
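A sketch of such a deliberate mismatch, assuming the change means swapping the annotation to str while keeping the integer value:

```python
# Deliberate mismatch: the annotation says str, but the value is an int.
age: str = 30

# Plain Python does not complain, and the value is still an int at runtime.
print(type(age).__name__)  # int
```

Python runs this without any error, which is why a static checker is useful here: Mypy is expected to flag the assignment as having incompatible types.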

Advanced Annotations

Until now, we have seen the basics of how type annotations work in Python. The typing module in Python's standard library contains many more annotation types, some of them less commonly used. Let’s look at a few of them:

Callable

In Python, callables include functions as well as instances of classes that implement the __call__ method.

Here, we will focus on functions. We use the Callable hint from the typing module when one function is an argument of another. Let’s see an example:
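A sketch of the idea. The `square` function is mentioned in the article, while `apply_function` is a hypothetical helper name introduced here:

```python
from typing import Callable


def square(x: int) -> int:
    return x * x


def apply_function(func: Callable[[int], int], value: int) -> int:
    # Callable[[int], int] means: takes one int argument and returns an int.
    return func(value)


print(apply_function(square, 5))  # 25
```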

We get the following output:

Union

We use Union when a variable can take more than one type.

In the previous example, the square function is annotated to take only integers. In reality, we can calculate the square of floats as well. We can add this functionality using the Union annotation:
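A sketch of `square` widened with Union, assuming the body simply multiplies the argument by itself:

```python
from typing import Union


def square(x: Union[int, float]) -> Union[int, float]:
    # Accepts either an int or a float and returns the same kind of number.
    return x * x


print(square(5))    # 25
print(square(1.5))  # 2.25
```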

The output is the same as before. The square function now can take either an integer or a float.

In Python 3.10 (PEP 604), we can use the pipe operator instead of Union. We can rewrite the square function as follows:

None

What about functions that have no return value? In statically typed languages, we typically use specific keywords. For example, in Java, we use the void keyword.

In Python, we can use the conventional None keyword:
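A short sketch, using a hypothetical `print_review` function that only prints and returns nothing:

```python
def print_review(review: str) -> None:
    # The function has a side effect (printing) but no return value.
    print(review.strip())


result = print_review("  Great movie!  ")
print(result)  # None
```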

Any

On the other hand, we might want to leave our variable unconstrained — compatible with every type. We can do this using the Any annotation:
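A small sketch with a hypothetical `describe` function whose argument is left unconstrained:

```python
from typing import Any


def describe(value: Any) -> str:
    # Any is compatible with every type, so a type checker accepts all of these calls.
    return f"{type(value).__name__}: {value!r}"


print(describe(3))        # int: 3
print(describe("three"))  # str: 'three'
print(describe([3]))      # list: [3]
```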

Most developers avoid using the Any annotation because it does not contribute any value — we may as well skip the annotation altogether.

We will later see a better alternative.

Sequence

In some cases, our function could expect some type of sequence, and not really care whether the input is a list or a tuple. In general, a Sequence is an ordered collection that supports indexing and len(), that is, it implements the __getitem__() and __len__() methods.

Let’s see an example of how we can use the Sequence annotation from the Typing module:
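A sketch of such a function. The function name `iteration` and the file name script2.py appear in the article, while the body and the sample variables `l1` and `t1` are assumptions:

```python
# script2.py (contents are a sketch)
from typing import Any, Sequence


def iteration(items: Sequence[Any]) -> None:
    # Works for lists, tuples, and any other sequence, whatever the element type.
    for item in items:
        print(item)


l1 = [1, 2, 3]
t1 = ("a", "b", "c")

iteration(l1)
iteration(t1)
```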

We get the following output:

We can also use Mypy to verify the type checking:

TypeVar — Introducing Generics

In the previous script, our sequences accept elements of any type, because we used the Any annotation. However, this makes our code a bit ambiguous.

In many cases, we would like to put a constraint on the type of variables that a function could accept.

Let’s say we want our iteration function in script2.py to accept sequences with only str or int values. We can do that using TypeVar, a type variable that lets us declare a generic type:
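A sketch of the constrained version. The names `numeric_var`, `iteration`, `l2`, and `t2` come from the article; `l1`, `t1`, and all the values are assumptions. The `reveal_type` calls are left as comments because that function is understood by Mypy but is not a Python builtin:

```python
from typing import Sequence, TypeVar

# A type variable restricted to exactly str or int.
numeric_var = TypeVar("numeric_var", str, int)


def iteration(items: Sequence[numeric_var]) -> None:
    for item in items:
        print(item)


l1 = [1, 2, 3]        # only integers: accepted
t1 = ("a", "b", "c")  # only strings: accepted
l2 = [1.5, 2.5]       # floats: expected to be rejected by Mypy
t2 = (1, "a")         # mixed int and str: expected to be rejected by Mypy

# reveal_type(l1)  # uncomment when checking with Mypy
# reveal_type(t1)  # uncomment when checking with Mypy

# At runtime nothing is enforced, so all four calls still execute.
for sample in (l1, t1, l2, t2):
    iteration(sample)
```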

We declare a new type variable called numeric_var that accepts either strings or integers. Hence, the iteration function now accepts sequences that contain either integers or strings, but not both.

We also use the reveal_type function of Mypy. This function tells us how Mypy interprets our type hints.

Let’s run the script using Mypy. We get the following output:

Mypy found 2 errors, which we expected. The problem was with the l2 and t2 variables. Specifically, the l2 list contained floats, while the t2 tuple contained both integers and strings.

To fix that, comment out the t2 and l2 variables and run again:

Now, type checking does not find any error. Mypy found two variables, a List containing integers and a Tuple with strings.

Changes in Python 3.9

From Python 3.9 onwards (PEP 585 specifically), built-in classes such as tuple and list can be used as generic types. Hence, using the built-in classes themselves instead of their typing module counterparts is now preferred. For example:

Annotation syntax until Python 3.8 using the Typing module (also works in Python 3.9+):
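A brief sketch of the older style, with illustrative variable names and values:

```python
from typing import List, Tuple

# Generic types imported from the typing module (capitalized names).
ids: List[int] = [1, 2, 3]
pair: Tuple[str, int] = ("review", 5)

print(ids, pair)
```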

In Python 3.9+ we can also write:
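The same illustrative variables, this time annotated with the built-in classes. As written, the sketch needs Python 3.9 or newer:

```python
# Built-in classes used directly as generic types (no typing import needed).
ids: list[int] = [1, 2, 3]
pair: tuple[str, int] = ("review", 5)

print(ids, pair)
```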

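There is a workaround to use this annotation syntax in Python 3.8: add the import from __future__ import annotations at the top of the file, which postpones the evaluation of annotations.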

Personally, I still use the annotation syntax with the Typing module because many projects still do so. Compatibility comes first.

Back to our example

Now, we are ready to add annotations to our initial annotation_demo script. The script then becomes:
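A sketch of what the annotated script might look like, assuming pandas is installed. The `preprocessing` signature, the column name, and the sample rows are assumptions rather than the original code:

```python
import pandas as pd


def preprocessing(df: pd.DataFrame, column: str) -> pd.Series:
    """Strip whitespace and lowercase one text column (illustrative cleaning step)."""
    return df[column].str.strip().str.lower()


# The annotations now make the type change explicit: DataFrame in, Series out.
reviews: pd.DataFrame = pd.DataFrame({"review": ["Great movie!", "  Terrible plot "]})
reviews_cleaned: pd.Series = preprocessing(reviews, "review")

print(reviews_cleaned.tolist())  # ['great movie!', 'terrible plot']
```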

Essentially, I changed two things:

  1. The preprocessing function was modified to accept annotated arguments and return an annotated value as well.
  2. The reviews and reviews_cleaned variables were annotated as a pandas DataFrame and a pandas Series, respectively.

Closing Remarks

Type hints in Python provide an excellent way to write clean and readable code. Remember the Zen of Python: readability counts!

With annotations, Python adopts a statically typed flavor, similar to other popular languages.

Also, Python has numerous other annotation types that we didn’t cover in this article.

Personally, I don’t use all of them. However, I always use function/method annotations: other developers should be able to immediately understand your function/method signatures without additional effort!
