How Does Python Garbage Collection Work?




Keep your Python objects referenced, or they will be released from memory.

Most programming languages have their own mechanism for garbage collection: objects that are no longer used but still occupy memory are eventually removed. This matters because it lets a program use memory more efficiently.

Have you ever wondered how Python garbage collection works? In particular, how does Python know that an object is no longer useful? In this article, I’ll demonstrate the mechanism using the built-in id() function and the getrefcount() function from the sys module.

Show Memory Address


Before we continue with the garbage collection mechanism, we need the concept of memory addresses. Don’t worry, it doesn’t have to be a deep dive. I’ll demonstrate using the id() function, and that will be enough.

First, let’s define two Python lists with exactly the same content.

a = [1, 2, 3]
b = [1, 2, 3]

Variables a and b appear to be the same. However, does that mean the two variables point to the same memory address? No. Let’s verify it.

print(id(a))
print(id(b))

The id() function gives us the “identity” of an object as an integer; in CPython this is the object’s memory address. The two calls return different integers, so variables a and b point to different memory addresses even though their contents are equal at the moment.

If we create another variable a1 with a1 = a, no new object is created. Instead, a1 points to the same memory address as a.

That is why, when we modify the list through a, the change is also visible through a1.
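
The aliasing described above can be checked with a minimal sketch. It assumes CPython and reuses the names a, b and a1 from this section; the actual id() values will differ on every run, so only comparisons are printed.

```python
a = [1, 2, 3]
b = [1, 2, 3]
a1 = a  # no new list is created; a1 is a second name for the same object

print(a == b)            # the two lists have equal content
print(a is b)            # yet they are two distinct objects
print(id(a1) == id(a))   # a1 and a share one identity

a.append(4)              # modify the list through the name a
print(a1)                # the change is visible through a1 as well
```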

The Reference Count


Now we come to the most important concept: the reference count.

The reference count of an object is the number of references that point to it. It is important because the garbage collection mechanism relies on the reference count to decide whether an object should be kept in memory or released.

That is, when the reference count of an object drops to zero, the object is released. This is intuitive: when nothing references an object, the object has been abandoned and is of no further use.

How can we get the reference count, then? It could have been an internal mechanism that is not exposed to the developer. However, Python provides a function called getrefcount() in the sys module that queries the reference count of an object.

To use this function, we need to import it from the sys module. The sys module is part of the standard library in every version of Python 3, so you don’t need to download or install anything.

from sys import getrefcount

Then, let’s use this function to query the reference count.

a = [1, 2, 3]
print(getrefcount(a))

In this example, I created a variable a and assigned a simple Python list to it. Then, the getrefcount() function shows that the reference count of this object is 2.

But hold on, why is it 2?

When we use getrefcount() to query the reference count of an object, the call itself has to hold a reference to the object, because the object is passed in as an argument. That’s why the reference count is 2: both the variable a and the getrefcount() call are referencing the list [1, 2, 3].

What will increase the reference count?


Now we understand the reference count and how to query it, but what causes it to change? Each of the following actions increases the reference count by 1.

1. The object is created and assigned to a variable.

This was demonstrated in the previous section. When we created the Python list object [1, 2, 3] and assigned it to the variable a, the reference count of the list object was set to 1.

2. The object is assigned to one more variable.

When the object is assigned to another variable, the reference count increases by 1. However, be careful: this does not mean the following.

a = [1, 2, 3]
b = [1, 2, 3] # This will NOT increase the reference count

This was discussed in the first section. Although the lists have the same content, they are different objects. To increase the reference count, we can do the following.

a = [1, 2, 3]
b = a
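
The effect of b = a can be measured with a short sketch. It compares two getrefcount() readings rather than printing absolute values, because the absolute numbers include the temporary reference made by the call and can vary between Python versions. The names before and after are illustrative only.

```python
from sys import getrefcount

a = [1, 2, 3]
before = getrefcount(a)   # includes the temporary reference made by the call

b = a                     # a second name for the same list
after = getrefcount(a)

print(after - before)     # the difference caused by the new name b
```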

3. The object is passed to a function as an argument.

This is exactly what happens when we call getrefcount(a). The variable a is passed into the function as an argument, so the object is referenced for the duration of the call.

4. The object is added to a container.

A container can be a list, a dictionary or a tuple, as in the following example.

my_list = [a]

What will reduce the reference count?


Now, let’s have a look at the scenarios that will reduce the reference count.

1. The object goes out of the scope of a function. This usually happens when the function finishes executing.

We can verify this by printing the reference count while a function is executing. The experiment can be designed as follows.

def my_func(var):
    print('Function executing: ', getrefcount(var))

my_func(a)
print('Function executed', getrefcount(a))

In my run, the first line printed 4 and the second printed 2. But why is the reference count 4 rather than 3? This involves another Python concept, the “call stack”.

When a function is called in Python, a new frame is pushed onto the call stack for its local execution, and every time a function call returns, its frame is popped off the call stack.

This concept will not be expanded on in this article because it is out of scope. If you are not familiar with the call stack, all you need to know is that the traceback and line numbers you see in an error message come from the call stack.

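Therefore, the reference count was 4 while my_func() was executing: besides the variable a and the getrefcount() call, the object was also referenced by the call stack and by the parameter var. After the function returned, the reference count dropped back to 2. The exact number during the call can differ between Python versions.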

2. When a variable that references the object is deleted.

This is very easy to understand. When we use the del statement to delete the variable, the variable no longer references the object.

Note that if we delete the variable a in this case, the reference count of the object becomes 0. That is exactly the situation in which garbage collection releases the object. However, it also means we can no longer use getrefcount() to check the reference count.
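
One way to observe the release without getrefcount() is a weak reference, which watches an object without adding to its reference count. The sketch below assumes CPython, where an object is freed as soon as its count reaches zero. The class Payload and the names watcher, alive_before and alive_after are hypothetical; a class is used because plain lists cannot be weakly referenced.

```python
import weakref
from sys import getrefcount

class Payload:
    """Hypothetical class, used because plain lists cannot be weakly referenced."""

a = Payload()
b = a                         # second reference to the same object
watcher = weakref.ref(a)      # a weak reference does not add to the count

before = getrefcount(a)
del a                         # removes the name a, not the object
after = getrefcount(b)
print(before - after)         # one reference fewer

alive_before = watcher() is not None
del b                         # the last strong reference is gone
alive_after = watcher() is not None
print(alive_before, alive_after)
```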

3. When a variable that references the object is assigned another object.

This case probably happens more often. When a variable is assigned another object, the reference count of the current object is reduced by one. Of course, the reference count of the new object is increased.

4. When the object is removed from a container.

When the object is added to a container, the reference count increases by 1. Conversely, when it is removed, the reference count decreases by 1.

Of course, if we delete the container, the reference count is also reduced.
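
Both cases can be sketched as follows, assuming CPython, where a deleted container is freed immediately. The second container, other, is a hypothetical addition that holds the object twice to show that deleting a container releases every reference it held.

```python
from sys import getrefcount

a = [1, 2, 3]
my_list = [a]
other = [a, a]                   # holds two references to the same object
with_containers = getrefcount(a)

my_list.remove(a)                # take the object out of one container
after_remove = getrefcount(a)

del other                        # delete a whole container
after_del = getrefcount(a)

print(with_containers - after_remove)  # one reference released
print(after_remove - after_del)        # both of other's references released
```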

A Special Case

Note that only general objects can be investigated in this way. There are special cases when the value is a literal constant such as the number 123 or the string 'abc'.

The reference count of such values can be unexpectedly large. In my case, I was using Google Colab, where the environment may be shared, and I saw very large reference counts.

Another important factor is that, in CPython, literal constants such as small integers and short strings are usually stored once and reused, so every use of the same value refers to the same memory location.

Therefore, whenever the number 123 is used anywhere, its reference count can increase. Even if we have only one variable referencing it, the reference count may be much higher.
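
A brief sketch of this special case, assuming CPython, which keeps small integers in a shared cache. The reported count depends heavily on the Python version and environment, so no specific value should be expected.

```python
from sys import getrefcount

x = 123
y = 123

print(x is y)            # both names refer to one cached int object in CPython
print(getrefcount(x))    # usually far more than 2; the value varies by version
```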

Summary


In this article, I have introduced the garbage collection mechanism in Python, which is based on the reference count of objects.

The following actions will increase the reference count of an object:

  1. The object is created and assigned to a variable.
  2. The object is assigned to one more variable.
  3. The object is passed to a function as an argument.
  4. The object is added to a container.

On the contrary, the following actions will reduce the reference count of an object:

  1. The object goes out of the scope of a function. This usually happens when the function finishes executing.
  2. When a variable that references the object is deleted.
  3. When a variable that references the object is assigned another object.
  4. When the object is removed from a container.
