Stop Using Print! Python Logging for Data Scientists

~80% of what you need to know about logging in under 5 mins


There comes a time in every production Data Science project when the code base has become complex, and a refactor is necessary to maintain your sanity. Perhaps you want to abstract out commonly used code into Python modules with classes and functions so that it can be reused with a single line instead of copy-pasting the whole block of code multiple times in your project. Whatever your reason, writing informative logging into your program is critical to ensure you can track its operation and troubleshoot it when things inevitably go wrong.

In this article, I’ll share the ~80% of Python logging functionality I’ve ever needed as a data scientist. With that knowledge, we can meet the following two requirements:

  1. Log some messages to the terminal: e.g., logging program execution steps.
  2. Simultaneously log some other messages to a file: e.g., logging results during model training and testing.

The entire code is available on my GitHub account.

Python Logging Module

We will use the following functionality from the Python Logging Module [Link] to solve our two requirements.

basicConfig

The basicConfig function, as its name suggests, sets the basic configuration for the logging system. I find it advantageous to specify the following three arguments when calling basicConfig:

  1. level: indicates the minimum level of message to log. The values of the different logging levels are shown in the table below. For example, if you set level=logging.INFO, any message logged as DEBUG would not appear in your log, while any message logged as INFO or above will appear in your log.
  2. format: is the format in which the log messages appear. I like my log messages to have the time (asctime), the name of the level (levelname) and the actual log message (message). Thus I specify format='%(asctime)s %(levelname)s: %(message)s'
  3. datefmt: is the format in which the time appears. I like my log messages to have a complete DateTime, so I specify datefmt='%Y-%m-%d %H:%M:%S' to log the time in Year-Month-Day Hour:Minute:Second format.
Set up the basic configuration for logging
Python Logging Levels. Image source [Link]

getLogger

Now that we’ve set up the basic configuration, we can instantiate a logger object using a common name in all the .py files we want the logger to work in. I find it advantageous to store this common name in an external constants.yaml or constants.py file, which I can then import into each of the .py files that I want to use the same logger in.

Instantiate a logger object
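As a sketch of the idea (the name my_project_logger is a hypothetical stand-in for the constant the article keeps in constants.py or constants.yaml):

```python
import logging

# Hypothetical shared name; the article stores this in constants.py/constants.yaml
# and imports it into every .py file that should use the same logger.
LOGGER_NAME = 'my_project_logger'

# getLogger returns the same logger object for the same name, so every
# module that looks it up by this name shares one logger.
logger = logging.getLogger(LOGGER_NAME)
assert logger is logging.getLogger(LOGGER_NAME)
```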

FileHandler

So far, I haven’t mentioned writing any log messages to a file, so all our logging messages would display only on the terminal. Since our second requirement is to log certain messages to a file, we will use a FileHandler together with a custom logging level that I call METRICS to achieve this in just five lines of code! Just make sure this custom logging level is larger than CRITICAL so that no other logging messages get written to the file.

Set up the file handler to write certain logs of ‘METRICS’ custom level to a file ‘metrics.log’
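A sketch of this setup, assuming METRICS = 60 (any value above CRITICAL, which is 50, works) and the hypothetical logger name my_project_logger:

```python
import logging

# Custom level above CRITICAL (50), so only METRICS messages clear
# the file handler's threshold. The name METRICS is the article's.
METRICS = 60
logging.addLevelName(METRICS, 'METRICS')

logger = logging.getLogger('my_project_logger')  # hypothetical shared name
logger.setLevel(logging.DEBUG)  # the logger passes everything; handlers filter

# Create the handler, set its level and format, and attach it.
file_handler = logging.FileHandler('metrics.log')
file_handler.setLevel(METRICS)
file_handler.setFormatter(
    logging.Formatter('%(asctime)s %(levelname)s: %(message)s'))
logger.addHandler(file_handler)

logger.log(METRICS, 'accuracy=0.93')  # written to metrics.log
logger.info('training step done')     # not written: INFO < METRICS
```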

Putting It All Together

The above three concepts are basically all we need to know to set up awesome logging functionality in our code and meet our two requirements. The below three .py files show how it all works together.

  1. constants.py: This file is only used to define a couple of constants. One for the custom logging level METRICS and another for the common LOGGER_NAME, so that we can use them across multiple other .py files.
  2. main.py: This is our main Python program. Notice how I’ve used the concepts described above to set up the basic configuration, instantiate the logger and create the file handler that routes only the logging messages with the custom METRICS level to a file metrics.log. To actually log a message using our awesome logger, we call logger.info to log a message with INFO level, or logger.log to log a message with our custom METRICS level, as shown in the code below.
  3. test_print.py: This file shows how to instantiate the same logger in another .py file by using the same LOGGER_NAME. This will route any METRICS custom logs to the same metrics.log file.
Define constants
Main program
Test out instantiating and using the same logger that we created in the main script
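The original gists are not reproduced above, so here is a single-file sketch of how the three pieces fit together, with comments marking what would live in each file (constant values and the logger name are hypothetical stand-ins):

```python
import logging

# --- constants.py (sketch) -----------------------------------------------
METRICS = 60                       # custom level above CRITICAL (50)
LOGGER_NAME = 'my_project_logger'  # shared name so every module gets the same logger

# --- main.py (sketch) ----------------------------------------------------
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s: %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
)
logging.addLevelName(METRICS, 'METRICS')

logger = logging.getLogger(LOGGER_NAME)

metrics_handler = logging.FileHandler('metrics.log')
metrics_handler.setLevel(METRICS)          # only METRICS messages reach the file
logger.addHandler(metrics_handler)

logger.info('loading data')                # terminal only (via basicConfig)
logger.log(METRICS, 'val_loss=0.41')       # terminal and metrics.log

# --- test_print.py (sketch) ----------------------------------------------
other_logger = logging.getLogger(LOGGER_NAME)  # same logger, looked up by name
other_logger.log(METRICS, 'test_acc=0.88')     # also lands in metrics.log
```

Because getLogger returns the same object for the same name, the handler attached in main.py also receives the METRICS message logged in test_print.py.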

Conclusion

The ability to log some messages to the terminal and other messages to a file is convenient for data science programs. In my day-to-day work, I routinely log program execution steps to the terminal to track the program’s progress, and model training and testing results to a file. If you use MLflow, you could even add this log file to your MLflow server using mlflow.log_artifact('metrics.log') to track historical progress!

I hope you found this article on Python logging useful. You can also access the full code on my GitHub account. Thanks for reading!
