Original Source Here

# Graph Neural Networks: a learning journey since 2008 — Python & Deep Walk

## The fourth part of this series. Today, the practical implementation of DeepWalk 🐍 and a look at Facebook Large Page Dataset 👍

Welcome back to the fourth part of our journey in the Graph Machine Learning world. In the previous post, we saw how DeepWalk algorithm works and its mathematical background. Today we’ll dive into the Python implementation of the algorithm, starting from a simple example (the Karate club) with a simple DeepWalk version in `Numpy`

and then we’ll jump to a more interesting scenario, with Facebook Large Page Dataset.

Here the list of the previous episodes:

# Concatenating all DeepWalk mathematical steps

## Generate a corpus of “sequences”

Now that we know how DeepWalk works mathematically, let’s see together step by step what is going to happen for the Karate club, to fix our ideas before jumping to bigger applications. The current application is based on a simple `numpy`

application for an embedding neural network. Codes and input files can be found here:

The input dataset is `data/karate.adjlist`

, an adjacency list for the Karate club, `graph.py`

is a utility to read the input file and `karate.py`

is our main DeepWalk algorithm file. The calculations are far from being correct, but they are a great way to concretely understand what is going on.

In this example we have 34 nodes and 78 edges. The number of random walk is `number_walks=5`

for each node, thus a total of 170 random walks, that explore a `walk_length=5`

vertices from a starting one. On line 378 we are creating the initial corpus:

From here we’ll have a list of random walks:

## The Skipgram Model

Once the corpus has been generated we can go on with the SkipGram model. In this example, I have not implemented the Hierarchical Softmax, but I am just using a simple softmax layer, as the problem is quite small. These are the steps that we need:

*Generation of training data*: for each input sequence we need to generate a pair*(X, y)*for the training stage.*X*is the current vertex, while*y*is the word within a`window_size`

from*X**Parameters initialisation:*Randomly initialise a matrix of word embeddings and implement the forward step for the 2-hidden layers neural network*Neural Networks:*A function to update weights and return the training cost, to converge towards the right embeddings.

## Generation of Training Data

The training data underline the SkipGram model goal: `X`

is an input word and the model predicts the likelihood of a word at max `window_size`

distance to be neighbour of `X`

. The final training dataset will be a set of pairs `(X, y)`

. For example, given the input sequence `22, 1, 13, 4, 3`

and `wind_size=5`

the couples `(X,y)`

will be (for example): `(22, 1), (22, 13), (22, 4), (22, 3), (1, 22), (1, 13), (1, 4), (1, 3), (13, 22), (13, 1), (13, 4)`

etc. The function `generate_training_data`

accomplishes this goal, looping through all the input `random_walks`

and its elements, selecting left and right indexes `idxs_left`

and `idxs_right`

The final vectors `X`

and `y`

sizes is `(1, 3400)`

where `3400`

comes from the total number of random walks ( `170`

) times the number of couples for each random walk, which is `20`

.

## Parameters Initialisation

The second step is to initialise the following parameters:

`representation_size`

This is the latent feature dimensions, namely how many latent “coordinates” SkipGram should return. In this example, this is fixed to 2`vocab_size`

, vocabulary size, which is the total number of vertices in the graph (`vocab_size=34+1=35`

)- One-hot encoding matrix for the target
`y`

. The hot-encoded matrix size is`35, 3400`

and it has 1 every time the target word appears:

`epochs`

for the SkipGram model. Remember that SkipGram is a Neural Network, so we have to specify how many times we want to cycle through the entire dataset (in this case`epochs=2`

)`batch_size`

, how many chunks of data should be use at each iteration for the neural network for training ( here`batch_size=4096`

as I want to process the full dataset immediately)`learning_rate`

, set to`0.001`

- Embedding and weight matrices for the neural network, initialised through the function
`param_init`

In this function we create a random matrix for the word embeddings, whose size is the`vocab_size`

and`representation_size`

— namely for each node we’ll have`representation_size`

coordinates. Secondly, neural network weights are randomly initialised as:`np.random.randn(out_size, inp_size)`

where`inp_size`

is the number of graph vertices and`out_size`

is the`representation_size`

.

## Neural Networks

Arrived at this point we can spin up the embedding neural network with the following steps:

- Define a chunk of
`batch_size`

from input data - Run the forward propagation part of the neural network (
`forward_propagation`

) - Compute the gradients through back propagation (
`backward_propagation`

) - Update parameters, embedding matrix and weight matrix (
`update_parameters`

) - Compute the cost through the cross entropy (
`cross_entropy`

)

This `numpy`

implementation of SkipGram can help us in understanding what’s hidden in the main DeepWalk.

At first, `forward_propagation`

retrieve the embeddings for each word, `node_to_embedding`

returning the initial random embedding representation for all the 3400 `X`

input of the training dataset. Then, a call to `linear_dense`

function is done, where the neural network weights are update:

The core neural network just take as input the word embeddings which are multiply by the neural network weights: `np.dot(weights, node_vector)`

The final step is to compute the softmax layer for the previous product through `softmax`

function. Fig.7 shows how the input word embeddings change through the network computations.

AI/ML

Trending AI/ML Article Identified & Digested via Granola by Ramsey Elbasheer; a Machine-Driven RSS Bot