Graph Neural Networks: a learning journey since 2008 — Python & Deep Walk




The fourth part of this series. Today, the practical implementation of DeepWalk 🐍 and a look at the Facebook Large Page Dataset 👍

Image by Francesca Hotchin on Unsplash

Welcome back to the fourth part of our journey in the Graph Machine Learning world. In the previous post, we saw how the DeepWalk algorithm works and its mathematical background. Today we’ll dive into the Python implementation of the algorithm, starting from a simple example (the Karate club) with a bare-bones DeepWalk version in NumPy, and then we’ll jump to a more interesting scenario with the Facebook Large Page Dataset.

Here is the list of the previous episodes:

Concatenating all DeepWalk mathematical steps

Generate a corpus of “sequences”

Now that we know how DeepWalk works mathematically, let’s go through, step by step, what happens for the Karate club, to fix our ideas before jumping to bigger applications. The current example is based on a simple NumPy implementation of an embedding neural network. Code and input files can be found here:

The input dataset is data/karate.adjlist, an adjacency list for the Karate club; graph.py is a utility to read the input file and karate.py is our main DeepWalk algorithm file. The computation is far from accurate, but it is a great way to concretely understand what is going on.

Fig.1: Karate Club graph. The club is composed of 34 members and we can count 78 interactions among them.

In this example we have 34 nodes and 78 edges. The number of random walks is number_walks=5 for each node, thus a total of 170 random walks, each exploring walk_length=5 vertices from a starting one. On line 378 we are creating the initial corpus:

Fig.2: generation of a corpus of “sentences” which are random walks of length 5

From here we’ll have a list of random walks:

Fig.3: Example of random walks of length 5 which started from vertices 22, 17, 13, 18, 11 and 15.
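As a reference, a minimal sketch of this corpus-building step could look like the snippet below. It relies only on NetworkX’s built-in Karate club graph; function and variable names are illustrative and not the exact ones used in graph.py and karate.py:

```python
import random
import networkx as nx

def build_corpus(G, number_walks=5, walk_length=5, seed=42):
    # For each node, start number_walks truncated random walks of length
    # walk_length, picking a uniformly random neighbour at every hop.
    rng = random.Random(seed)
    walks = []
    for _ in range(number_walks):
        nodes = list(G.nodes())
        rng.shuffle(nodes)                     # visit the start nodes in random order
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = list(G.neighbors(walk[-1]))
                if not neighbours:             # dead end: stop the walk early
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks

G = nx.karate_club_graph()                     # 34 nodes, 78 edges
corpus = build_corpus(G)
print(len(corpus))                             # 5 walks per node -> 170 random walks
print(corpus[0])                               # a single "sentence" of 5 vertex ids
```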

The SkipGram Model

Once the corpus has been generated, we can go on with the SkipGram model. In this example I have not implemented the Hierarchical Softmax; I am just using a simple softmax layer, as the problem is quite small. These are the steps we need:

  • Generation of training data: for each input sequence we need to generate pairs (X, y) for the training stage. X is the current vertex, while y is a word within a window_size distance from X
  • Parameter initialisation: randomly initialise a matrix of word embeddings and implement the forward step of the two-layer neural network
  • Neural network: a function to update the weights and return the training cost, so that we converge towards the right embeddings.

Generation of Training Data

The training data reflect the SkipGram model goal: X is an input word and the model predicts the likelihood that a word at most window_size away is a neighbour of X. The final training dataset will be a set of pairs (X, y). For example, given the input sequence 22, 1, 13, 4, 3 and window_size=5, the couples (X, y) will be (for example): (22, 1), (22, 13), (22, 4), (22, 3), (1, 22), (1, 13), (1, 4), (1, 3), (13, 22), (13, 1), (13, 4) etc. The function generate_training_data accomplishes this goal, looping through all the input random_walks and their elements, selecting each time the left and right indices idxs_left and idxs_right.

Fig.4: Generate training data function. The pairs (X, y) for the SkipGram model are generated by iterating through all the random walks and their elements, selecting each time the left and right indices with respect to the current vertex i.

The final vectors X and y have size (1, 3400), where 3400 comes from the total number of random walks (170) times the number of couples for each random walk, which is 20.
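To make the pairing concrete, a possible re-implementation of generate_training_data is sketched below, reusing the corpus from the previous snippet; the exact signature and internals in karate.py may differ:

```python
import numpy as np

def generate_training_data(random_walks, window_size=5):
    # For every vertex in every walk, pair it with the vertices that fall
    # within window_size positions to its left (idxs_left) and right (idxs_right).
    X, y = [], []
    for walk in random_walks:
        for i, centre in enumerate(walk):
            idxs_left = range(max(0, i - window_size), i)
            idxs_right = range(i + 1, min(len(walk), i + window_size + 1))
            for j in list(idxs_left) + list(idxs_right):
                X.append(centre)
                y.append(walk[j])
    return np.array(X).reshape(1, -1), np.array(y).reshape(1, -1)

X, y = generate_training_data(corpus, window_size=5)
print(X.shape, y.shape)    # (1, 3400): 170 walks times 20 pairs per walk of length 5
```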

Parameters Initialisation

The second step is to initialise the following parameters:

  • representation_size: the latent feature dimension, namely how many latent “coordinates” SkipGram should return. In this example it is fixed to 2
  • vocab_size: the vocabulary size, which is the total number of vertices in the graph (vocab_size=34+1=35)
  • One-hot encoding matrix for the target y: the one-hot-encoded matrix has size (35, 3400) and it contains a 1 every time the target word appears:
Fig.5: Convert the target vector to a one-hot-encoded matrix, whose size comes from the vocabulary size (+1) and the target size.
  • epochs for the SkipGram model: remember that SkipGram is a neural network, so we have to specify how many times we want to cycle through the entire dataset (in this case epochs=2)
  • batch_size: how many samples should be used at each training iteration (here batch_size=4096, so the full dataset is processed in a single batch)
  • learning_rate, set to 0.001
  • Embedding and weight matrices for the neural network, initialised through the function param_init. In this function we create a random matrix for the word embeddings, of size (vocab_size, representation_size), namely for each node we’ll have representation_size coordinates. Secondly, the neural network weights are randomly initialised as np.random.randn(out_size, inp_size), where inp_size is the representation_size and out_size is the number of graph vertices, so that the dense layer maps each embedding to one score per vertex. A compact sketch of this initialisation is given right after this list.
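As anticipated, here is a minimal sketch of the initialisation, continuing from the X and y arrays of the previous snippet; one_hot is an illustrative helper name, and the actual param_init in karate.py may differ in detail:

```python
import numpy as np

vocab_size = 34 + 1          # vertices in the Karate club graph (+1, as in the article)
representation_size = 2      # latent coordinates per vertex
epochs = 2
batch_size = 4096            # larger than the dataset, so a single batch per epoch
learning_rate = 0.001

def one_hot(y, vocab_size):
    # (vocab_size, n_samples) matrix with a 1 in row y[0, j] of column j
    Y = np.zeros((vocab_size, y.shape[1]))
    Y[y.flatten(), np.arange(y.shape[1])] = 1
    return Y

def param_init(vocab_size, representation_size):
    rng = np.random.default_rng(0)
    return {
        # one representation_size-dimensional embedding per vertex
        "embeddings": rng.standard_normal((vocab_size, representation_size)) * 0.01,
        # dense layer mapping an embedding to one score per vertex
        "weights": rng.standard_normal((vocab_size, representation_size)) * 0.01,
    }

Y = one_hot(y, vocab_size)                             # shape (35, 3400)
params = param_init(vocab_size, representation_size)
```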

Neural Networks

At this point we can spin up the embedding neural network with the following steps:

  • Take a chunk of batch_size samples from the input data
  • Run the forward propagation part of the neural network ( forward_propagation )
  • Compute the gradients through back propagation ( backward_propagation )
  • Update parameters, embedding matrix and weight matrix ( update_parameters )
  • Compute the cost through the cross entropy ( cross_entropy )

This NumPy implementation of SkipGram can help us understand what is hidden inside the main DeepWalk implementation.

Fig.6: For each epoch we derive chunks from the training dataset, perform the forward propagation and back propagation, and update all the parameters.

At first, forward_propagation retrieves the embeddings for each word through node_to_embedding, returning the initial random embedding representation for all the 3400 X inputs of the training dataset. Then the linear_dense function is called, where the word embeddings are multiplied by the neural network weights:

Fig.7: The core of the neural network computation. At each forward iteration the word embeddings are dot-multiplied with the network weights.

The core of the network simply takes the word embeddings as input and multiplies them by the neural network weights: np.dot(weights, node_vector). The final step is to compute the softmax layer on this product through the softmax function. Fig.7 shows how the input word embeddings change through the network computations.
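Putting the pieces together, the snippet below is a compact sketch of the forward pass and of one gradient-descent step, reusing the X, Y, params and hyperparameters defined in the previous snippets. The names node_to_embedding, linear_dense, softmax, cross_entropy and forward_propagation mirror the description above, while the inline backpropagation stands in for backward_propagation and update_parameters; the actual karate.py code is organised differently:

```python
import numpy as np

def node_to_embedding(X, params):
    # Look up the current embedding row of every input vertex in the batch
    return params["embeddings"][X.flatten(), :].T             # (representation_size, batch)

def linear_dense(node_vectors, params):
    # Dense layer: np.dot(weights, node_vector) gives one score per vertex
    return np.dot(params["weights"], node_vectors)             # (vocab_size, batch)

def softmax(scores):
    e = np.exp(scores - scores.max(axis=0, keepdims=True))     # numerically stable softmax
    return e / e.sum(axis=0, keepdims=True)

def cross_entropy(probs, Y):
    return -np.mean(np.sum(Y * np.log(probs + 1e-12), axis=0))

def forward_propagation(X, params):
    emb = node_to_embedding(X, params)
    return {"emb": emb, "probs": softmax(linear_dense(emb, params))}

for epoch in range(epochs):
    for start in range(0, X.shape[1], batch_size):
        Xb = X[:, start:start + batch_size]
        Yb = Y[:, start:start + batch_size]
        cache = forward_propagation(Xb, params)
        # Backpropagation: gradient of the mean cross-entropy w.r.t. the scores
        d_scores = (cache["probs"] - Yb) / Xb.shape[1]          # (vocab_size, batch)
        d_weights = np.dot(d_scores, cache["emb"].T)            # (vocab_size, representation_size)
        d_emb = np.dot(params["weights"].T, d_scores)           # (representation_size, batch)
        # Parameter update: dense weights, then only the embedding rows that were used
        params["weights"] -= learning_rate * d_weights
        np.add.at(params["embeddings"], Xb.flatten(), -learning_rate * d_emb.T)
        print(f"epoch {epoch}, cost {cross_entropy(cache['probs'], Yb):.4f}")
```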
