Data Science: Write Robust Python With Static Typing
Updated for Python 3.10
There are two types of programming languages: Statically typed and dynamically typed languages.
Python is a dynamically typed language. You don’t have to explicitly specify the data type of variables. The same is true for functions: You don’t have to specify the type of arguments or the function’s return type.
In other words, there is no separate compile-time type check: the Python interpreter performs type checking at runtime, during code execution.
On the other hand, in statically typed languages (such as Java), it is mandatory to declare the data type of every variable. Thus, type checking is completed at compile time. That’s one reason statically typed languages are typically faster, but more verbose.
With PEP 3107 and PEP 484, we can use type annotations in Python to annotate data types. However, Python will remain a dynamic language. The role of type hints is to help you write clean and robust code.
There are two main types of annotations:
- Variable Annotations
- Function Annotations
But first, let’s demonstrate a case that shows why type annotations are beneficial.
Let’s say we have a simple pandas dataframe with movie reviews. We want to perform some basic preprocessing and then store the result in a new variable:
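A minimal sketch of such a situation, assuming a hypothetical `review` column, made-up review texts, and a hypothetical `preprocessing` helper:

```python
import pandas as pd

# Hypothetical data: the column name and the review texts are made up for illustration.
reviews = pd.DataFrame({"review": ["  Great movie! ", "Terrible PLOT", " Loved it "]})


def preprocessing(df):
    # Selecting a single column with df["review"] returns a Series, not a DataFrame.
    return df["review"].str.strip().str.lower()


reviews_cleaned = preprocessing(reviews)

print(type(reviews).__name__)
print(type(reviews_cleaned).__name__)
```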
The reviews_cleaned variable is now a Series instead of a DataFrame. It’s easy to miss that because the reviews dataframe has a single column.
Let’s see if we can use type annotations to make our Python code more readable.
We use them when we declare a variable. After the variable name, add a colon followed by a space (the PEP 8 convention) and the type of the variable. For example:
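A short sketch of variable annotations for primitive types (the variable names and values are illustrative only):

```python
# Variable annotation: name, colon, space, type, then the assigned value.
name: str = "Alice"
age: int = 30
height: float = 1.75
is_student: bool = False

print(name, age, height, is_student)
```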
We can also annotate non-primitive variables, such as Lists, Tuples, and Dictionaries:
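As an illustration, the container hints from the typing module can be used as follows (variable names and values are made up):

```python
from typing import Dict, List, Tuple

# A list whose elements are all integers.
ratings: List[int] = [5, 3, 4]

# A tuple with exactly two floats.
point: Tuple[float, float] = (1.5, -2.0)

# A dictionary that maps strings to integers.
word_counts: Dict[str, int] = {"good": 2, "bad": 1}

print(ratings, point, word_counts)
```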
Of course, we can annotate objects from data-science libraries such as pandas and NumPy:
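A small sketch, assuming pandas and NumPy are installed and imported with their usual aliases (the data is made up):

```python
import numpy as np
import pandas as pd

# The annotation is simply the class of the object we expect.
scores: np.ndarray = np.array([0.1, 0.5, 0.9])
reviews: pd.DataFrame = pd.DataFrame({"review": ["good", "bad"]})
labels: pd.Series = pd.Series([1, 0])

print(type(scores).__name__, type(reviews).__name__, type(labels).__name__)
```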
Note: If you import pandas and numpy with aliases other than pd and np, change the annotations accordingly.
We can also annotate function/method arguments and their return types. Let’s see an example:
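A minimal sketch of an unannotated function, to serve as a starting point:

```python
def sum_2_numbers(a, b):
    # No annotations: nothing tells the reader which types are expected.
    return a + b


print(sum_2_numbers(2, 3))      # 5
print(sum_2_numbers("2", "3"))  # 23 (string concatenation)
```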
The sum_2_numbers function calculates the sum of two numbers. However, because the + operator is overloaded, its behavior depends on the argument types: for integers and floats, the function returns their sum; for strings, it returns their concatenation.
We can make the function sum_2_numbers clearer by adding type annotations for the arguments and the return value. We use the following format:
def function_name(argument_name: argument_type) -> return_type:
Using annotations, the function sum_2_numbers then becomes:
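A sketch of the annotated version, assuming we intend the function for integers only:

```python
def sum_2_numbers(a: int, b: int) -> int:
    # Both arguments and the return value are declared as integers.
    return a + b


print(sum_2_numbers(2, 3))  # 5

# The hints are stored on the function object but are not enforced at runtime.
print(sum_2_numbers.__annotations__)
```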
The function takes as input two integers and outputs an integer.
Note: The user can still pass float and string datatypes — the annotations are just hints and not a requirement.
However, there is a tool we can use to force type checks.
Enter Mypy: Static Type Checking Before Runtime
In 2012, Jukka Lehtosalo, a Ph.D. student at that time, started a side project that became known as Mypy.
Mypy was originally envisioned as a Python variant with seamless dynamic and static typing. Initially, it started as a separate language. Then, it was re-written as an external library, compatible with Python.
First, install the library:
$ pip install mypy
Then, we write the following script to demonstrate how Mypy works:
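A minimal sketch of such a script, assuming a hypothetical `describe_person` function with an `age` argument. Saved to a file, it can be checked by passing the file name to the `mypy` command:

```python
def describe_person(name: str, age: int) -> str:
    # Hypothetical helper: builds a short description from a name and an age.
    return f"{name} is {age} years old"


# The call below matches the annotations, so a type checker has nothing to report.
print(describe_person("Alice", 30))
```

If the annotation of `age` is changed to `str` while the call still passes `30`, the call no longer matches the signature, which is the kind of mismatch a static type checker is designed to report.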
Finally, run the script using Mypy. You should see the following output:
Let’s create an error on purpose to check the behavior of Mypy. Change the type of variable age from int to str and run the script. You should see the following:
Mypy has successfully detected the mismatch: age is now expected to be of type str. Feel free to conduct your own experiments and check how Mypy works.
Until now, we have seen the basics of how type annotations work in Python. The Python typing module contains many more annotation types beyond these basics. Let’s look at a few of them:
In Python, callables can be both functions and objects of classes that implement the __call__ method. Here, we will focus on functions. We use the Callable hint from the typing module when one function is an argument of another. Let’s see an example:
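A minimal sketch of the Callable hint, assuming a `square` function and a hypothetical `apply_function` helper that receives it as an argument:

```python
from typing import Callable


def square(x: int) -> int:
    return x * x


def apply_function(func: Callable[[int], int], value: int) -> int:
    # Callable[[int], int] means: a callable taking one int and returning an int.
    return func(value)


print(apply_function(square, 4))  # 16
```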
We get the following output:
We use Union when a variable can take more than one type.
In the previous example, the square function is annotated to take only integers. In reality, we can calculate the square of floats as well. We can add this functionality using the Union annotation:
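A sketch of a `square` function that accepts either type, using Union from the typing module:

```python
from typing import Union


def square(x: Union[int, float]) -> Union[int, float]:
    # Union[int, float] means the value may be an int or a float.
    return x * x


print(square(4))    # 16
print(square(2.5))  # 6.25
```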
The output is the same as before. The square function can now take either an integer or a float.
In Python 3.10 (PEP 604), we can use the pipe operator instead of Union. We can rewrite the square function as follows:
What about functions that have no return value? In statically typed languages, we typically use specific keywords. For example, in Java, we use the void keyword.
In Python, we can use the conventional None keyword:
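For instance, a hypothetical `greet` function that only prints a message can be annotated like this:

```python
def greet(name: str) -> None:
    # The function prints a message and returns nothing.
    print(f"Hello, {name}!")


result = greet("Alice")
print(result)  # None
```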
On the other hand, we might want to leave our variable unconstrained — compatible with every type. We can do this using the Any annotation:
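A brief sketch, using a hypothetical `describe` function whose argument is left unconstrained:

```python
from typing import Any


def describe(value: Any) -> str:
    # Any is compatible with every type, so a type checker accepts any argument here.
    return f"{value!r} is of type {type(value).__name__}"


print(describe(3))
print(describe("a"))
```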
Most developers avoid using the Any annotation because it does not contribute any value — we may as well skip the annotation altogether.
We will later see a better alternative.
In some cases, our function could expect some type of sequence, and not really care whether the input is a list or a tuple. In general, a Sequence is anything that implements the __len__ and __getitem__ methods.
Let’s see an example of how we can use the Sequence annotation from the typing module:
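A minimal sketch of the Sequence hint, assuming an `iteration` function that simply prints each element with its index:

```python
from typing import Any, Sequence


def iteration(seq: Sequence[Any]) -> None:
    # Works for lists, tuples, strings, and any other sequence type.
    for index, item in enumerate(seq):
        print(index, item)


iteration([1, 2, 3])
iteration(("a", "b"))
```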
We get the following output:
We can also use Mypy to verify the type checking:
TypeVar — Introducing Generics
In the previous script, our sequences work with any type variable, because we used the Any annotation. However, this makes our code a bit ambiguous.
In many cases, we would like to put a constraint on the type of variables that a function could accept.
Let’s say we want our iteration function in script2.py to accept sequences with only str or int values. We can do that using TypeVar, a type variable that lets us declare a generic type:
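A sketch of such a constrained type variable. The names l1, t1, l2, and t2 and their values are assumptions for illustration, and the comments describe what a static type checker is expected to conclude; at runtime every call still executes, because the hints are not enforced:

```python
from typing import Sequence, TypeVar

# Constrained type variable: it must resolve to exactly str or exactly int.
numeric_var = TypeVar("numeric_var", str, int)


def iteration(seq: Sequence[numeric_var]) -> None:
    for item in seq:
        print(item)


l1 = [1, 2, 3]        # a list of integers: matches the int constraint
t1 = ("a", "b", "c")  # a tuple of strings: matches the str constraint
l2 = [1.0, 2.5]       # a list of floats: float is not an allowed type
t2 = (1, "b")         # a mix of int and str: matches neither constraint alone

for seq in (l1, t1):
    iteration(seq)
```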
We declare a new type variable called numeric_var that accepts either strings or integers. Hence, the iteration function now accepts Sequences that contain either integers or strings, but not both.
We also use the reveal_type function of Mypy. This function tells us how Mypy interprets our type hints.
Let’s run the script using Mypy. We get the following output:
Mypy found 2 errors, which we expected. The problem was with the l2 and t2 variables. Specifically, the l2 list contained floats, while the t2 tuple contained both integers and strings.
To fix that, comment out the l2 and t2 variables and run again:
Now, type checking does not find any error. Mypy found two variables, a List containing integers and a Tuple with strings.
Changes in Python 3.9
From Python 3.9 onwards (PEP 585 specifically), some built-in classes like list are now generic types. Hence, using the built-in classes themselves instead of their counterparts in the typing module is now preferred. For example:
Annotation syntax until Python 3.8 using the typing module (also works in Python 3.9+):
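As an illustration, a hypothetical `mean` function annotated with List from the typing module:

```python
from typing import List


def mean(values: List[float]) -> float:
    # List[float] comes from the typing module and works on older Python versions.
    return sum(values) / len(values)


print(mean([1.0, 2.0, 3.0]))  # 2.0
```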
In Python 3.9+ we can also write:
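A sketch of the built-in generic syntax, using a hypothetical `mean` function and made-up variables. As written, this needs Python 3.9 or newer to run:

```python
def mean(values: list[float]) -> float:
    # The built-in list class is used directly; no import from typing is needed.
    return sum(values) / len(values)


ratings: list[int] = [5, 3, 4]
word_counts: dict[str, int] = {"good": 2, "bad": 1}

print(mean([1.0, 2.0, 3.0]))  # 2.0
```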
There is a workaround to run the above code in Python 3.8 if we do this import:
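The import in question is `from __future__ import annotations` (PEP 563), which must be the first statement in the file. With it, annotations are not evaluated when the function is defined, so the built-in generic syntax is accepted inside annotations. A sketch with a hypothetical `mean` function:

```python
from __future__ import annotations


def mean(values: list[float]) -> float:
    # With the __future__ import, this annotation is stored as a string, not evaluated.
    return sum(values) / len(values)


print(mean([1.0, 2.0, 3.0]))  # 2.0
```

Note that this only covers annotations; an expression such as `list[float]` used outside an annotation is still evaluated and still requires Python 3.9+.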
Personally, I still use the annotation syntax with the typing module because many projects still do so. Compatibility comes first.
Back to our example
Now, we are ready to add annotations to our initial annotation_demo script. The script then becomes:
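A minimal sketch of what the annotated script might look like, assuming a hypothetical `review` column, made-up review texts, and a `preprocessing` function that lowercases and strips the text:

```python
import pandas as pd


def preprocessing(df: pd.DataFrame) -> pd.Series:
    # The signature now states that a DataFrame goes in and a Series comes out.
    return df["review"].str.strip().str.lower()


# Hypothetical data: the column name and the review texts are made up.
reviews: pd.DataFrame = pd.DataFrame({"review": ["  Great movie! ", "Terrible PLOT"]})
reviews_cleaned: pd.Series = preprocessing(reviews)

print(reviews_cleaned)
```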
Essentially, I changed two things:
- The preprocessing function was modified to accept annotated arguments and return an annotated value as well.
- The reviews and reviews_cleaned variables were annotated as a pandas DataFrame and a pandas Series, respectively.
Type hints in Python provide an excellent way to write clean and readable code. Remember the Zen of Python: readability counts!
With annotations, Python adopts a statically typed flavor, similar to other popular languages.
Also, Python has numerous other annotation types that we didn’t cover in this article.
Personally, I don’t use all of them. However, I always use function/method annotations: other developers should be able to immediately understand your function/method signatures without additional effort!