Stop Using Print! Python Logging for Data Scientists
~80% of what you need to know about logging in under 5 mins
There comes a time in every production Data Science project when the code base has become complex, and a refactor is necessary to maintain your sanity. Perhaps you want to abstract out commonly used code into Python modules with classes and functions so that it can be reused with a single line instead of copy-pasting the whole block of code multiple times in your project. Whatever your reason, writing informative logging into your program is critical to ensure you can track its operation and troubleshoot it when things inevitably go wrong.
In this article, I’ll share the ~80% of Python logging functionality I’ve ever needed as a data scientist. With that knowledge, we can meet the two requirements below:
- Log some messages to the terminal: e.g., logging program execution steps.
- Simultaneously log other messages to a file: e.g., logging results during model training and testing.
The entire code is available on my GitHub account.
Python Logging Module
We will use the following functionality from the Python logging module to solve our two requirements.
The basicConfig function, as its name suggests, sets the basic configuration for the logging system. I find it advantageous to specify the three arguments below when calling basicConfig:
- level: the minimum severity of message to log. The standard levels, in increasing order, are DEBUG (10), INFO (20), WARNING (30), ERROR (40), and CRITICAL (50). For example, if you set level=logging.INFO, any message logged as DEBUG will not appear in your log, while any message logged as INFO or above will.
- format: the format in which log messages appear. I like my log messages to include the time (asctime), the name of the level (levelname), and the actual log message (message), so I specify format='%(asctime)s %(levelname)s: %(message)s'.
- datefmt: the format in which the time appears. I like my log messages to have a complete timestamp, so I specify datefmt='%Y-%m-%d %H:%M:%S' to log the time in Year-Month-Day Hour:Minute:Second format.
Now that we’ve set up the basic configuration, we can instantiate a logger object using a common name in all the .py files we want the logger to work in. I find it advantageous to store this common name in an external constants.py file, which I can then import into each of the .py files that I want to use the same logger in.
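This works because getLogger always returns the same logger object for the same name. A tiny demonstration, using a hypothetical name in place of the one stored in constants.py:

```python
import logging

# Hypothetical shared name; the article stores this in constants.py.
LOGGER_NAME = 'my_project'

# getLogger returns the same object for the same name, so every module
# that imports LOGGER_NAME shares a single logger (and its handlers).
logger = logging.getLogger(LOGGER_NAME)
same_logger = logging.getLogger(LOGGER_NAME)
assert logger is same_logger
```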
So far, I haven’t mentioned writing any log messages to a file, so all our logging messages display only on the terminal. Since our second requirement is to log certain messages to a file, we will use a FileHandler together with a custom logging level that I call METRICS to achieve this in just five lines of code! Just make sure this custom logging level is larger than CRITICAL, so that no other logging messages get written to the file.
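A sketch of those five lines. The value 60 for METRICS and the logger name are my assumptions here; the article only requires that the custom level exceed CRITICAL (50):

```python
import logging

METRICS = 60  # assumed value; anything above CRITICAL (50) works
logging.addLevelName(METRICS, 'METRICS')

logger = logging.getLogger('my_project')   # hypothetical shared name
file_handler = logging.FileHandler('metrics.log')
file_handler.setLevel(METRICS)             # handler ignores anything below METRICS
logger.addHandler(file_handler)

logger.log(METRICS, 'test accuracy: 0.93')  # written to metrics.log
logger.critical('not written to the file')  # CRITICAL (50) < METRICS (60)
```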
Putting It All Together
The above three concepts are basically all we need to set up awesome logging functionality in our code and meet our two requirements. The three .py files below show how it all works together.
constants.py: This file only defines a couple of constants: one for the custom logging level METRICS and another for the common LOGGER_NAME, so that we can use them across multiple other .py files.
main.py: This is our main Python program. Notice how I’ve used the concepts described above to set up the basic configuration, instantiate the logger, and create the file handler that routes only the logging messages with the custom METRICS level to a file, metrics.log. To actually log a message using our awesome logger, we call logger.info if we want to log a message at the INFO level, or logger.log if we want to log a message at our custom METRICS level, as shown in the code below.
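The full code lives on GitHub; a self-contained sketch of what main.py might look like, with hypothetical stand-ins for the values in constants.py, is:

```python
import logging

# Hypothetical stand-ins for the constants defined in constants.py.
LOGGER_NAME = 'my_project'
METRICS = 60  # assumed value; must be larger than CRITICAL (50)

# Basic configuration: terminal output for INFO and above.
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s: %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
)
logging.addLevelName(METRICS, 'METRICS')

# Shared logger plus a file handler that only accepts METRICS records.
logger = logging.getLogger(LOGGER_NAME)
handler = logging.FileHandler('metrics.log')
handler.setLevel(METRICS)
logger.addHandler(handler)

logger.info('Loading data...')         # terminal only
logger.log(METRICS, 'val loss: 0.12')  # terminal and metrics.log
```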
test_print.py: This file shows how to instantiate the same logger in another .py file by using the same LOGGER_NAME. This routes any METRICS custom logs to the same metrics.log file.
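The cross-module behavior can be sketched in one file by standing in for main.py and test_print.py with two functions (the logger name and METRICS value are my assumptions, as above):

```python
import logging

LOGGER_NAME = 'my_project'  # hypothetical; the article imports it from constants.py
METRICS = 60                # assumed value above CRITICAL (50)

def main_setup():
    """Stands in for main.py: configure the shared logger once."""
    logging.addLevelName(METRICS, 'METRICS')
    handler = logging.FileHandler('metrics.log')
    handler.setLevel(METRICS)
    logging.getLogger(LOGGER_NAME).addHandler(handler)

def other_module():
    """Stands in for test_print.py: no setup needed, just the shared name."""
    logger = logging.getLogger(LOGGER_NAME)
    logger.log(METRICS, 'metric logged from another module')

main_setup()
other_module()  # the record lands in the same metrics.log
```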
The ability to log some messages to the terminal and others to a file is convenient for data science programs. In my day-to-day work, I routinely log program execution steps to the terminal to track the program’s progress, and model training and testing results to a file. If you use MLflow, you could even add this log file to your MLflow server with mlflow.log_artifact('metrics.log') to track historical progress!
I hope you found this article on Python logging useful. You can also access the full code on my GitHub account. Thanks for reading!