Author(s): Ruite Xiang

Originally published on Towards AI.

Sometimes it is difficult to understand the theory (the math and the formulas) without seeing how it translates into code.

Source from the author

At least, that is my case, so I put together this post to explain the different concepts in graph neural networks (GNNs) in a more intuitive and beginner-friendly way, complemented with code examples. That said, even as a beginner you will still need some background, such as matrix multiplication and PyTorch.

What is a graph?

Graphs represent a set of connected objects, modeled as nodes connected via edges that represent their relationships. Each node is described with a feature vector, which is what the GNN will use to make its predictions.

Source image from the author

For each graph with a set of nodes and edges, we have a feature matrix (X) and information about how the different nodes are connected.

Source image from the author

The connectivity data can be represented in different formats, such as an adjacency matrix, where a 1 at position (i, j) indicates an edge between nodes i and j. In the case of PyTorch Geometric, connectivity is represented in the sparse COO format of shape [2, num_edges], with row 0 holding the source nodes and row 1 the destination nodes, as you can see in the image.

Graphs are quite a flexible representation: any data where the relationships between the objects are relevant to the prediction task can benefit from it. Well-known examples include social networks, where people are nodes and relationships are edges, and molecules or drugs, where atoms are nodes and bonds are edges.

Most GNNs consist of 3 steps

The formula below describes the most basic and popular GNN architecture, the graph convolutional network (GCN), but the steps generalize well: most GNNs work like this, with some modifications.

H′ᵢ = σ( ∑ⱼ Âᵢⱼ Xⱼ W + bᵢ )

Modified image from PyTorch Geometric

- H′ᵢ is the updated embedding of node i.
- Xⱼ is the feature vector of node j before the update.
- bᵢ is the bias term.
- W is a learnable weight matrix (a linear layer, for instance).
- Âᵢⱼ is an entry of the normalized adjacency matrix; it could also be an attention score for each node in the GAT (graph attention network) architecture. Intuitively, we can think of the different edges being weighted differently.
- σ is the activation function applied during the update.

Step 1: Transform

In this step, the node features are first transformed by a learnable weight matrix; in the case of a GCN, it is just a fully connected layer without the bias (X · W).

Source image from the author

Step 2: Aggregate

We then apply an aggregation operation such as the sum (∑); other options include mean or max. However, for each node, only features from connected nodes are summed. In my example, the features from nodes 1 and 2 are added to the features of node 0, but for node 1, only the features of node 0 are added. This is also called the message-passing step, since we pass information from the neighboring nodes to the source node.

This step is achieved by multiplying the normalized adjacency matrix by the transformed features, which automatically sums only over the connected nodes. The normalization means that each neighbor contributes to the sum with a different weight.

Source image from the author

Note in the adjacency matrix that each node is also connected to itself (this is called a self-loop), so during the matrix multiplication the node's own features are included in the sum.
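To make these first two steps concrete, here is a minimal sketch in plain PyTorch on the toy three-node graph from the figures. The feature values and layer sizes are made up for illustration; the snippet builds the normalized adjacency matrix with self-loops and then performs the transform and aggregate steps as dense matrix multiplications:

```python
import torch

# Toy 3-node graph from the figures: node 0 is connected to nodes 1 and 2
# (feature values are made up for illustration).
X = torch.tensor([[1., 0.],    # node 0
                  [0., 1.],    # node 1
                  [1., 1.]])   # node 2

A = torch.tensor([[0., 1., 1.],
                  [1., 0., 0.],
                  [1., 0., 0.]])

A_hat = A + torch.eye(3)                   # add self-loops
deg = A_hat.sum(dim=1)                     # degree of each node (self-loop included)
D_inv_sqrt = torch.diag(deg.pow(-0.5))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # Â = D̂^(-1/2) (A + I) D̂^(-1/2)

W = torch.randn(2, 4)   # learnable weight matrix: 2 input features -> 4 output features

H = X @ W        # Step 1: transform every node's features
H = A_norm @ H   # Step 2: weighted sum over each node's neighbors (and itself)
print(H)         # one 4-dimensional embedding per node
```

The bias and activation of the update step come next.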
Step 3: Update

Using the source node and its neighboring nodes, we update the source node's features. After the matrix multiplication, we obtain a new embedding for each node, to which we apply a bias and an activation function to get the updated embeddings.

Source image from the author

In this case, the update operation is simply a sum; in other implementations, however, the update can happen with another learnable function, such as a linear layer or a recurrent neural network, adding another transformation step.

Step 4: Repeat or predict

We can repeat the process, which means stacking several GCN layers, or we can pool all the node features into a single feature vector and add a final fully connected layer to predict a property of the graph (see the sketch at the end of this post). Similarly, for predictions on specific node or edge properties, we can pool information from the neighboring nodes and edges.

An example from PyTorch Geometric

This is an example implementation of a GCN layer extracted from the official PyTorch Geometric documentation, to which I added an activation function. Let's see how the different steps apply here:

```python
import torch
from torch.nn import Linear, Parameter, ReLU
from torch_geometric.nn import MessagePassing
from torch_geometric.utils import add_self_loops, degree


class GCNConv(MessagePassing):
    def __init__(self, in_channels, out_channels):
        super().__init__(aggr='add')  # "Add" aggregation (Step 5).
        self.lin = Linear(in_channels, out_channels, bias=False)
        self.bias = Parameter(torch.empty(out_channels))
        self.activation = ReLU()
        self.reset_parameters()

    def reset_parameters(self):
        self.lin.reset_parameters()
        self.bias.data.zero_()

    def forward(self, x, edge_index):
        # x has shape [N, in_channels]
        # edge_index has shape [2, E]

        # Step 1: Add self-loops to the adjacency matrix.
        edge_index, _ = add_self_loops(edge_index, num_nodes=x.size(0))

        # Step 2: Linearly transform node feature matrix.
        x = self.lin(x)

        # Step 3: Compute normalization.
        row, col = edge_index
        deg = degree(col, x.size(0), dtype=x.dtype)
        deg_inv_sqrt = deg.pow(-0.5)
        deg_inv_sqrt[deg_inv_sqrt == float('inf')] = 0
        norm = deg_inv_sqrt[row] * deg_inv_sqrt[col]

        # Step 4-5: Start propagating messages.
        out = self.propagate(edge_index, x=x, norm=norm)

        # Step 6: Apply a final bias vector.
        out += self.bias
        out = self.activation(out)

        return out

    def message(self, x_j, norm):
        # x_j has shape [E, out_channels]

        # Step 4: Normalize node features.
        return norm.view(-1, 1) * x_j
```

Note that the step numbers in the code comments follow the PyTorch Geometric tutorial, not the three steps described above.

Step 1: Transform

We transform the feature matrix with a linear layer, which here is just a learnable weight matrix since we are not using a bias:

x = self.lin(x)

Step 2: Aggregate

We first have to add the self-loops to the adjacency matrix (in this case represented in the sparse COO format), then multiply the normalized connectivity matrix by the feature matrix (X). Even though it is in a different format, the resulting […]
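To see the layer in action, here is a quick usage sketch continuing from the code above; the node count and feature sizes are made up for illustration. Note how edge_index lists the toy connectivity (node 0 connected to nodes 1 and 2) in COO format:

```python
# Toy input: 3 nodes, 16 features each (sizes are made up for illustration).
x = torch.randn(3, 16)
edge_index = torch.tensor([[0, 1, 0, 2],     # row 0: source nodes
                           [1, 0, 2, 0]])    # row 1: destination nodes (COO format)

conv = GCNConv(in_channels=16, out_channels=32)
out = conv(x, edge_index)
print(out.shape)  # torch.Size([3, 32]) -- one updated embedding per node
```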
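Finally, here is a sketch of the repeat-or-predict step mentioned earlier: one possible way to stack two of these layers and pool the node embeddings into a single graph-level prediction. It reuses the GCNConv class defined above together with PyTorch Geometric's global_mean_pool; the two-layer depth and the sizes are arbitrary choices, not a prescribed recipe:

```python
import torch
from torch_geometric.nn import global_mean_pool


class GCNClassifier(torch.nn.Module):
    """Two stacked GCN layers, mean pooling, and a linear prediction head
    (a minimal sketch; depth and sizes are arbitrary)."""
    def __init__(self, in_channels, hidden_channels, num_classes):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)    # the class defined above
        self.conv2 = GCNConv(hidden_channels, hidden_channels)
        self.head = torch.nn.Linear(hidden_channels, num_classes)

    def forward(self, x, edge_index, batch):
        x = self.conv1(x, edge_index)    # message passing, round 1
        x = self.conv2(x, edge_index)    # round 2: reaches 2-hop neighbors
        x = global_mean_pool(x, batch)   # pool node embeddings into one vector per graph
        return self.head(x)              # graph-level prediction


# batch maps every node to its graph; a single graph uses all zeros.
model = GCNClassifier(in_channels=16, hidden_channels=32, num_classes=2)
batch = torch.zeros(3, dtype=torch.long)
logits = model(x, edge_index, batch)   # x and edge_index from the snippet above
print(logits.shape)                    # torch.Size([1, 2])
```

Since the GCNConv above already applies a ReLU internally, no extra activation is needed between the stacked layers in this sketch.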