Top 30 GitHub Python Projects At The Beginning Of 2022




Repositories with the most stars

Happy new year 2022! As the first post of the new year, I was curious about which Python projects are the most popular so far. GitHub is definitely the most suitable place to get these statistics. Although not every open-source project is hosted there, no other single place comes close to its coverage.

This ranking is meant to be easy to reproduce because I'll share my code. Now, let's have a look at how we can get the ranked list from the GitHub API with a few lines of code. After that, I'll categorise these projects using my own terminology and add a short introduction to each of them.

The Top 30 GitHub projects are categorised as follows:

  • 7 repos that improve productivity
  • 3 repos that are programming frameworks
  • 5 repos that facilitate machine learning
  • 4 repos that facilitate real life
  • 6 repos that collect and organise useful information
  • 5 repos that teach some subjects

GitHub Search API


The official API documentation can be found on this page: https://docs.github.com/en/rest/reference/search#search-repositories

So I won't repeat details such as the parameters in this article. If you are interested in what else the API can do, please refer to that page.

The best part is that we don't need to register or apply for an API key to use this endpoint. Of course, it has a rate limit of 10 requests per minute for unauthenticated requests, but that is more than enough for us to test the code and pull the ranked list.
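If you script more than a handful of calls, it helps to respect that limit. GitHub reports the remaining quota in the X-RateLimit-Remaining and X-RateLimit-Reset response headers (the latter is a Unix timestamp in seconds). The helper below is a hypothetical sketch that, assuming those headers, works out how long to wait before the next request.

```python
import time


def seconds_until_reset(headers, now=None):
    """Seconds to wait before the next request is allowed (0.0 if quota remains).

    Assumes GitHub's X-RateLimit-Remaining and X-RateLimit-Reset headers;
    X-RateLimit-Reset is a Unix epoch timestamp in seconds.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    now = time.time() if now is None else now
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))
    # Never return a negative wait if the reset time has already passed
    return max(0.0, float(reset_at - now))


# Usage with a requests response (res.headers supports .get):
# time.sleep(seconds_until_reset(res.headers))
```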

First of all, we need the requests module. It is not part of the standard library (install it with pip install requests), but it is one of the most widely used Python packages, so I believe most of you are familiar with it. Then, we need Pandas to transform the data.

import requests
import pandas as pd

Based on the API documentation, the URL is https://api.github.com/search/repositories. Since we are only interested in Python projects, we need to put the qualifier language:python in the query. Then, we want to sort the search results by the number of stars in descending order.

url = 'https://api.github.com/search/repositories?q=language:python&sort=stars&order=desc'
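Hand-writing the query string works here, but building it from a dictionary avoids encoding mistakes once the query grows. This is a sketch with a hypothetical build_search_url helper using only the standard library; with requests you could equally pass the same dictionary through the params argument of requests.get, which encodes it for you.

```python
from urllib.parse import urlencode

# Search endpoint from the GitHub REST API documentation
SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(language="python", sort="stars", order="desc"):
    """Build a repository-search URL with properly encoded query parameters."""
    params = {"q": f"language:{language}", "sort": sort, "order": order}
    # urlencode percent-encodes reserved characters, so ':' becomes '%3A'
    return f"{SEARCH_URL}?{urlencode(params)}"


print(build_search_url())
# https://api.github.com/search/repositories?q=language%3Apython&sort=stars&order=desc
```

GitHub decodes %3A back to a colon, so this URL is equivalent to the one above.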

Then, we can use the requests module to call this endpoint with the GET method and convert the JSON response into a Python dictionary.

res = requests.get(url)
res_dict = res.json()
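The bare requests.get call works, but a slightly more defensive version sends the Accept header GitHub documents for its v3 REST API, sets a timeout, and raises on HTTP errors (for example, when the rate limit is exceeded). The github_headers helper below is hypothetical, and the optional token is an assumption for readers who want the higher authenticated search limit of 30 requests per minute.

```python
def github_headers(token=None):
    """Request headers for the GitHub REST API.

    Without a token the Search API allows 10 requests per minute; passing a
    personal access token (assumed to be supplied by the reader) raises it.
    """
    headers = {"Accept": "application/vnd.github.v3+json"}
    if token:
        headers["Authorization"] = f"token {token}"
    return headers


# Usage sketch with requests:
# res = requests.get(url, headers=github_headers(), timeout=10)
# res.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
# res_dict = res.json()
```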

All the search results are in an array under the key “items”, so we can get the repo information as follows. The default page size is 30, so we get 30 repos. They are exactly the Top 30 Python Repos 🙂

repos = res_dict['items']
len(repos)
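If you wanted more than the default 30, the Search API accepts per_page (up to 100) and page parameters, and it returns at most 1,000 results for any single query. The hypothetical page_params helper below is a sketch, under those assumptions, that works out which pages to request for the top N repos.

```python
import math

# GitHub's Search API returns at most 1,000 results per query
MAX_SEARCH_RESULTS = 1000


def page_params(top_n, per_page=100):
    """List of {'per_page', 'page'} dicts needed to cover the top_n results."""
    if not 1 <= per_page <= 100:
        raise ValueError("per_page must be between 1 and 100")
    top_n = min(top_n, MAX_SEARCH_RESULTS)
    n_pages = math.ceil(top_n / per_page)
    # Pages are 1-indexed in the GitHub API
    return [{"per_page": per_page, "page": page} for page in range(1, n_pages + 1)]


print(page_params(250))
# [{'per_page': 100, 'page': 1}, {'per_page': 100, 'page': 2}, {'per_page': 100, 'page': 3}]
```

Each dictionary can be merged into the query parameters of one request; the last page may return more items than you need, so slice the combined list to top_n afterwards.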

The result dictionary contains some other information too. If we set the items array aside, the remaining keys are total_count and incomplete_results. total_count shows there were a total of 8,046,758 Python repos at the time I sent the request, far more than fit on the first page of results.
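Setting the items array aside can be done with a one-line dictionary comprehension. In this sketch, response_metadata is a hypothetical helper, and the sample response is a trimmed stand-in: only total_count comes from the run described above, and the single item is a placeholder. According to the GitHub docs, incomplete_results is True when the search timed out before all matches were collected.

```python
def response_metadata(res_dict):
    """Everything in a search response except the (large) 'items' array."""
    return {key: value for key, value in res_dict.items() if key != "items"}


# Hypothetical, trimmed response for illustration only
sample_response = {
    "total_count": 8046758,
    "incomplete_results": False,
    "items": [{"name": "placeholder-repo", "stargazers_count": 0}],
}

meta = response_metadata(sample_response)
print(meta)  # {'total_count': 8046758, 'incomplete_results': False}
```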

Now, let's convert the items array into a Pandas DataFrame.

repo_df = pd.DataFrame(repos)
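One thing to be aware of: each repo item contains nested objects such as owner, and pd.DataFrame keeps them as dictionaries inside a single column. If you need a nested field, pd.json_normalize flattens it into dotted column names. The two sample items below are hypothetical and contain only a few of the real fields.

```python
import pandas as pd

# Hypothetical, heavily trimmed repo items for illustration
sample_items = [
    {"name": "repo-a", "full_name": "alice/repo-a", "stargazers_count": 10,
     "owner": {"login": "alice", "type": "User"}},
    {"name": "repo-b", "full_name": "org/repo-b", "stargazers_count": 5,
     "owner": {"login": "org", "type": "Organization"}},
]

# pd.DataFrame keeps the nested 'owner' object as a dict per row
raw_df = pd.DataFrame(sample_items)

# pd.json_normalize flattens it into 'owner.login' and 'owner.type' columns
flat_df = pd.json_normalize(sample_items)
print(flat_df[["full_name", "owner.login", "owner.type"]])
```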

Then, I want to remove all the columns that we are not interested in. I'll also add a created_year column and a years_on_github column to record how many years each project has been on GitHub.

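# Keep only the columns we care about; .copy() avoids pandas' SettingWithCopyWarning
repo_df = repo_df[['name', 'full_name', 'html_url', 'created_at', 'stargazers_count', 'watchers', 'forks', 'open_issues']].copy()
# Parse GitHub's ISO 8601 timestamps (e.g. 2015-03-10T12:00:00Z) into datetimes
repo_df['created_at'] = pd.to_datetime(repo_df['created_at'])
repo_df['created_year'] = repo_df['created_at'].dt.year
# Calendar years on GitHub as of 2022
repo_df['years_on_github'] = 2022 - repo_df['created_year']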
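Note that years_on_github is a difference of calendar years, not elapsed time: a repo created on 31 December 2021 already counts as one year old in 2022. The sketch below computes both for a single timestamp so the difference is visible; the helper names and the format string are assumptions based on the timestamps GitHub returns, and the reference year defaults to 2022 to match the code above.

```python
from datetime import datetime

# Assumed format of GitHub's created_at values, e.g. '2015-03-10T12:00:00Z'
GITHUB_TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def years_on_github(created_at, reference_year=2022):
    """Calendar-year difference, matching the pandas column above."""
    return reference_year - datetime.strptime(created_at, GITHUB_TS_FORMAT).year


def elapsed_years(created_at, as_of):
    """Elapsed time in years; as_of is a naive datetime interpreted as UTC."""
    created = datetime.strptime(created_at, GITHUB_TS_FORMAT)
    return (as_of - created).days / 365.25


print(years_on_github("2021-12-31T00:00:00Z"))  # 1
print(elapsed_years("2021-12-31T00:00:00Z", datetime(2022, 1, 1)))  # about 0.0027
```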

Here is the full list of the Top 30 Repos:
