Data Science: Write Robust Python With Static Typing
Updated for Python 3.10
There are two types of programming languages: Statically typed and dynamically typed languages.
Python is a dynamically typed language. You don’t have to explicitly specify the data type of variables. The same is true for functions: You don’t have to specify the type of arguments or the function’s return type.
In other words, there is no separate compile-time type check: the Python interpreter performs type checking at runtime, during code execution.
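On the other hand, in statically typed languages (such as Java), it is mandatory to declare the data type of every variable. Thus, type checking is completed at compile time. Because the types are known ahead of time, the compiler can catch type errors early and optimize the generated code, which is one reason statically typed languages are typically faster, but more verbose.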
With PEP 3107 and PEP 484, we can use type annotations in Python to annotate data types. However, Python will remain a dynamic language. The role of type hints is to help you write clean and robust code.
There are two main types of annotations:
- Variable Annotations
- Function Annotations
But first, let’s demonstrate a case that shows why type annotations are beneficial.
Mini Example
Let’s say we have a simple pandas dataframe with movie reviews. We want to perform some basic preprocessing and then store the result in a new variable:
The reviews_cleaned variable is now a series instead of a dataframe. It’s easy to miss that because the reviews dataframe has a single column.
Let’s see if we can use type annotations to make our Python code more readable.
Variable annotations
We use them when we declare a variable. After the variable name, add a colon followed by a space (the PEP 8 convention) and the type of the variable. For example:
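A minimal sketch of variable annotations with built-in scalar types. The variable names and values are illustrative assumptions, not taken from the original listing.

```python
# Hypothetical variables, each annotated with a built-in scalar type.
name: str = "Alice"
age: int = 30
height: float = 1.75
is_student: bool = False

# The annotations are hints only: Python does not check them at runtime.
print(name, age, height, is_student)
```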
We can also annotate non-primitive variables, such as Lists, Tuples, and Dictionaries:
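As a sketch, container annotations could look like the following. It uses the generic aliases from the typing module, and the variable names and contents are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# A list whose elements are all integers.
scores: List[int] = [90, 85, 77]

# A fixed-size tuple: exactly two floats.
point: Tuple[float, float] = (1.5, -2.0)

# A dictionary that maps string keys to float values.
ratings: Dict[str, float] = {"Inception": 8.8, "Up": 8.3}

print(scores, point, ratings)
```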
Of course, we can annotate data-science-related modules such as Pandas and NumPy:
Note: If you import pandas and numpy with aliases other than pd and np, change the annotations accordingly.
Function annotations
We can also annotate function/method arguments and their return types. Let’s see an example:
The function sum_2_numbers calculates the sum of two numbers. However, because the + operator is overloaded, it behaves differently depending on the input: for integers and floats, the function returns their sum, while for strings it returns their concatenation.
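A minimal sketch of this behavior, assuming the unannotated function simply returns a + b (the parameter names are assumptions):

```python
def sum_2_numbers(a, b):
    # No annotations: any pair of operands that supports + is accepted.
    return a + b

print(sum_2_numbers(2, 3))        # 5
print(sum_2_numbers(1.5, 2.5))    # 4.0
print(sum_2_numbers("ab", "cd"))  # abcd
```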
We can make the function sum_2_numbers clearer by adding type annotations for the arguments and the return value. We use the following format:
def function_name(argument_name: argument_type) -> return_type:
Using annotations, the function sum_2_numbers then becomes:
The function takes as input two integers and outputs an integer.
Note: The user can still pass float and string datatypes — the annotations are just hints and not a requirement.
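The sketch below shows an annotated version of sum_2_numbers, again assuming the body is just a + b. It also illustrates the note above: at runtime, Python accepts strings even though the hints say int.

```python
def sum_2_numbers(a: int, b: int) -> int:
    # The hints document the intended types but are not enforced at runtime.
    return a + b

print(sum_2_numbers(2, 3))      # 5
print(sum_2_numbers("a", "b"))  # ab (runs despite the int hints)

# The hints are stored on the function object for tools to inspect.
print(sum_2_numbers.__annotations__)
```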
However, there is a tool we can use to enforce type checks.
Enter Mypy: Enforce Type Checking Before Runtime
In 2012, Jukka Lehtosalo, a Ph.D. student at that time, started a side project that became known as Mypy.
Mypy was originally envisioned as a Python variant with seamless dynamic and static typing. Initially, it started as a separate language. Then, it was re-written as an external library, compatible with Python.
First, install the library:
$ pip install mypy
Then, we write the following script to demonstrate how Mypy works:
Finally, run the script using Mypy. You should see the following output:
Let’s create an error on purpose to check the behavior of Mypy. Change the type of the variable age from int to str and run the script. You should see the following:
Mypy has successfully detected that age should be of type str. Feel free to conduct your own experiments and check how Mypy works.
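As a rough sketch of the kind of script this section describes, consider the file below. Only the variable age comes from the text; the file name script.py, the describe function, and the values are assumptions.

```python
# script.py (hypothetical contents)

def describe(name: str, age: int) -> str:
    # Both arguments are annotated, so Mypy can check every call site.
    return f"{name} is {age} years old"

age: int = 30
print(describe("Alice", age))
```

Running mypy script.py on this version should report no issues. If the annotation is changed to age: str = 30, Mypy is expected to flag the assignment as incompatible, because the value 30 is not a str. Python itself would still run the file without complaint, since Mypy only analyzes the source and never executes it.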
Advanced Annotations
Until now, we have seen the basics of how type annotations work in Python. The Python typing module contains many other annotation types that are less commonly used. Let’s look at a few of them:
Callable
In Python, callables include functions as well as instances of classes that implement the __call__ method.
Here, we will focus on functions. We use the Callable hint from the typing module when one function is an argument of another. Let’s see an example:
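A minimal sketch of the Callable hint. The square function is mentioned later in the text; the helper apply_to and its arguments are assumptions made for illustration.

```python
from typing import Callable

def square(x: int) -> int:
    return x * x

def apply_to(func: Callable[[int], int], value: int) -> int:
    # Callable[[int], int] means: takes one int argument, returns an int.
    return func(value)

print(apply_to(square, 4))  # 16
```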
We get the following output:
Union
We use Union when a variable can take more than one type.
In the previous example, the square function is annotated to take only integers. In reality, we can calculate the square of floats as well. We can add this functionality using the Union annotation:
The output is the same as before. The square function can now take either an integer or a float.
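A sketch of square annotated with Union, assuming the body is a plain multiplication:

```python
from typing import Union

def square(x: Union[int, float]) -> Union[int, float]:
    # Accepts an int or a float and returns a value of the same kind.
    return x * x

print(square(4))    # 16
print(square(1.5))  # 2.25
```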
In Python 3.10 (PEP 604), we can use the pipe operator (|) instead of Union. We can rewrite the square function as follows:
None
What about functions that have no return value? In statically typed languages, we typically use specific keywords. For example, in Java, we use the void keyword.
In Python, we can use the conventional None keyword:
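A minimal sketch of a function annotated with a None return type. The function greet is a hypothetical example.

```python
def greet(name: str) -> None:
    # The function only prints; it has no return statement.
    print(f"Hello, {name}!")

result = greet("Alice")
print(result)  # None
```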
Any
On the other hand, we might want to leave our variable unconstrained — compatible with every type. We can do this using the Any annotation:
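As a sketch, a function that accepts any value could be annotated as follows. The function describe is a hypothetical example.

```python
from typing import Any

def describe(value: Any) -> str:
    # Any is compatible with every type, so no call is ever flagged.
    return f"{value!r} has type {type(value).__name__}"

print(describe(42))       # 42 has type int
print(describe("movie"))  # 'movie' has type str
```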
Many developers avoid the Any annotation because it effectively switches off type checking for that value, so we may as well skip the annotation altogether.
We will later see a better alternative.
Sequence
In some cases, our function could expect some type of sequence, and not really care whether the input is a list or a tuple. In general, a Sequence is anything that supports indexing and len(), that is, anything that implements the __getitem__() and __len__() methods.
Let’s see an example of how we can use the Sequence annotation from the typing module:
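A sketch of what such a script might contain. The function name iteration comes from the text; its body and the sample inputs are assumptions.

```python
from typing import Any, Sequence

def iteration(items: Sequence[Any]) -> None:
    # Works for lists, tuples, and other sequences alike.
    for item in items:
        print(item)

iteration([1, 2, 3])        # a list
iteration(("a", "b", "c"))  # a tuple
```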
We get the following output:
We can also use Mypy to verify the type checking:
TypeVar — Introducing Generics
In the previous script, our sequences accept elements of any type, because we used the Any annotation. However, this makes our code a bit ambiguous.
In many cases, we would like to put a constraint on the type of variables that a function could accept.
Let’s say we want our iteration function in script2.py to accept sequences with only str or int values. We can do that using TypeVar, a type variable that lets us declare a generic type:
We declare a new type variable called numeric_var that accepts either strings or integers. Hence, the iteration function now accepts Sequences that contain either integers or strings, but not both.
We also use the reveal_type function of Mypy. This function tells us how Mypy interprets our type hints.
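A sketch of how the constrained type variable could be used. The names numeric_var, iteration, l2, and t2 come from the text, while l1, t1, and all the values are assumptions. The reveal_type calls are left as comments because, before Python 3.11, that name exists only inside Mypy and would raise a NameError at runtime.

```python
from typing import Sequence, TypeVar

# A type variable constrained to exactly int or str.
numeric_var = TypeVar("numeric_var", int, str)

def iteration(items: Sequence[numeric_var]) -> None:
    for item in items:
        print(item)

l1 = [1, 2, 3]        # list of ints: accepted
t1 = ("a", "b", "c")  # tuple of strs: accepted
l2 = [1.5, 2.5]       # list of floats: Mypy is expected to reject this
t2 = (1, "a")         # mixed tuple: Mypy is expected to reject this

# reveal_type(l1)  # uncomment when running under Mypy
# reveal_type(t1)

# All four calls run at runtime; only Mypy flags the last two.
iteration(l1)
iteration(t1)
iteration(l2)
iteration(t2)
```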
Let’s run the script using Mypy. We get the following output:
Mypy found 2 errors, which we expected. The problem was with the l2 and t2 variables. Specifically, the l2 list contained floats, while the t2 tuple contained both integers and strings.
To fix that, comment out the t2 and l2 variables and run again:
Now, type checking does not find any error. Mypy found two variables, a List containing integers and a Tuple with strings.
Changes in Python 3.9
From Python 3.9 onwards (PEP 585 specifically), some classes like tuple and list are now generic types. Hence, using the built-in classes themselves instead of the typing module is now preferred. For example:
Annotation syntax until Python 3.8 using the typing module (also works in Python 3.9+):
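A minimal sketch of the older syntax, with a hypothetical helper named first_and_last:

```python
from typing import List, Tuple

def first_and_last(values: List[int]) -> Tuple[int, int]:
    # List and Tuple are the generic aliases from the typing module.
    return values[0], values[-1]

print(first_and_last([3, 5, 8]))  # (3, 8)
```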
In Python 3.9+ we can also write:
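The same hypothetical helper, sketched with the built-in generic types. This assumes Python 3.9 or newer.

```python
def first_and_last(values: list[int]) -> tuple[int, int]:
    # list[int] and tuple[int, int] need no import from typing.
    return values[0], values[-1]

print(first_and_last([3, 5, 8]))  # (3, 8)
```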
There is a workaround to run the above code in Python 3.8: add the import from __future__ import annotations (PEP 563) as the first statement of the module.
Personally, I still use the annotation syntax with the typing module because many projects still do so. Compatibility comes first.
Back to our example
Now, we are ready to add annotations to our initial annotation_demo script. The script then becomes:
Essentially, I changed two things:
- The preprocessing function was modified to accept annotated arguments and return an annotated value as well.
- The reviews and reviews_cleaned variables were properly annotated as a pandas dataframe and a pandas series respectively.
Closing Remarks
Type hints in Python provide an excellent way to write clean and readable code. Remember the Zen of Python: readability counts!
With annotations, Python adopts a statically typed flavor, similar to other popular languages.
Also, Python has numerous other annotation types that we didn’t cover in this article.
Personally, I don’t use all of them. However, I always use function/method annotations: other developers should be able to immediately understand your function/method signatures without additional effort!