Graph Neural Networks: Everything You...
November 17, 2021

**Graph Neural Networks Explained**

What is a Graph Neural Network? A first encounter with the term may conjure visions of neural networks based on bar charts of source data. However, in the context of Graph Neural Networks, a graph is a set of objects (nodes) which have relations (edges) with other objects within the set. On paper, a graph is best described as a network of lines and dots. In practice, graphs can be used to represent a wide range of complex systems and relationships ever-present in reality.

For example, a social network can be represented as a graph. Each person’s account can be represented as a node, and each of that person’s relationships can be represented as an edge connecting those nodes. Nodes and edges may have their own properties. A person may be classified by gender, by geographic location, or in any number of other ways. Likewise, the relationship between two people may be romantic, or they may just be acquaintances – these would be different edge types. The edges can be one-directional or bi-directional, with their own properties in each direction. The problem could also be formulated with specific content elements as nodes, in addition to or independent of user account nodes. As you can see, graphs can be designed with an intricate level of detail and provide a richer problem-solving context than what may exist without such a framework. The specifics of the graph can be left to the imagination of the designer or be bounded by the data available.

This flexibility in modeling reality is what makes Graph Neural Networks such a promising area of research. The information embedded in node and edge feature vectors adds a level of detail to be analyzed. If attribute data for nodes and edges can be recorded, or a structural guideline for the system’s graph can be prescribed, this contextual information can be used in classifying elements within a system. Messages are used to update the states of nodes and edges, and these update rules can themselves be learned and dynamic. GNNs have already yielded significant results in systems biology, bioinformatics, chemistry, text analysis, code synthesis, cyber security, and computer vision. We will touch on some of these promising applications in this article.

Graph neural networks (GNNs) belong to a category of neural networks that operate naturally on data structured as graphs. Despite being a potentially confusing topic, GNNs can be boiled down to just a handful of familiar machine learning concepts.

**Starting With Recurrent Neural Networks (RNNs)**

Let’s choose a more familiar starting point for those acquainted with machine learning: recurrent neural networks. You may recall that recurrent neural networks are a good fit for data arranged in a sequence, such as time series data or written language. The defining feature of an RNN is that its state depends both on the current inputs and on the network’s prior hidden state. There have been numerous improvements to RNNs over the years, generally within the category of LSTM-style RNNs, which possess a multiplicative gating function between the present and prior hidden states of the model. A review of LSTM variations and those models’ relation to plain RNNs can be found here.
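The defining RNN property above – the new state depending on both the current input and the prior hidden state – can be sketched in a few lines of numpy. The dimensions and random weights here are arbitrary placeholders, not from the article:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): 3-dim inputs, 4-dim hidden state.
W_x = rng.normal(size=(4, 3))   # input-to-hidden weights
W_h = rng.normal(size=(4, 4))   # hidden-to-hidden weights
b = np.zeros(4)

def rnn_step(x_t, h_prev):
    """One vanilla RNN update: the new state depends on the
    current input AND the prior hidden state."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Run over a short input sequence, carrying the hidden state forward.
h = np.zeros(4)
for x_t in rng.normal(size=(5, 3)):
    h = rnn_step(x_t, h)

print(h.shape)  # (4,)
```

LSTM-style variants replace the plain `tanh` update with multiplicative gates, but the recurrence over the prior state is the same.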

Even though RNNs have been superseded to a degree by large transformer models for natural language processing, they still have great utility in a plethora of areas that require sequential decision making and memory (reinforcement learning, among other fields). Imagine the sequence that a recurrent neural network operates on as a directed linear graph, and remove the inputs and weighted connections, which become node states and edges, respectively. In fact, remove the output as well – in a graph neural network, the input data is the initial state of each node, and the output is parsed from the hidden states after a pre-defined number of updates to the states of the graph. Aside from those differences, the models now have the same structure.

**Recurrent Neural Networks (RNNs) as a Special Case of Graph Neural Networks (GNNs)**

As a means of updating, each node aggregates the states of its neighbors. This can be achieved by forward passes with weights shared across edges, or by taking a simple average of the state vectors of all adjacent nodes. A node’s own previous hidden state can also be included in the neighborhood aggregation update: the self-state may be passed through its own hidden layer, or it can be appended to the mean of all neighboring states. The update rules for the network can be adjusted, and in theory they should depend on the nature of the system being modeled. In a generalized GNN there is no assumed temporal relationship – each node pulls data from its neighbors irrespective of whether a neighbor is “in front” or “behind” it; all that matters is that the two nodes are connected by an edge. As stated before, there are exceptions: edges in a directed graph pass messages in one direction only, and graphs with multiple classes of connections may have their own aggregation policies for each class. Neighborhood aggregation is also known as neural message passing.

In general, GNN updates can be broken into two simple steps:

- Message passing
- Update node states
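These two steps can be sketched in plain numpy. The graph, states, and weights below are illustrative placeholders; the message step averages neighbor states via the adjacency matrix, and the update step applies a shared transformation (a GRU could be substituted here, as discussed later):

```python
import numpy as np

# Hypothetical 4-node graph and 2-dim node states (not from the article).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [0.5, 0.5]])
W = np.eye(2)  # weights shared across edges (identity for illustration)

# Step 1: message passing -- each node averages its neighbors' states.
deg = A.sum(axis=1, keepdims=True)
messages = (A @ H) / deg

# Step 2: update node states -- here a simple linear map plus nonlinearity;
# in practice this could be a learned recurrent update such as a GRU.
H_new = np.tanh(messages @ W)
print(H_new.shape)  # (4, 2)
```

Repeating these two steps for several rounds lets information flow between nodes that are not directly connected.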

With the intuition developed that an RNN can be thought of as a special case of a graph neural network, let’s take a step back and re-clarify the terms. Graphs are a concept of pure mathematics: an abstraction for representing and analyzing objects (nodes) connected by the relationships between them (edges). Graphs have their own rich branch of mathematics, called graph theory, for their manipulation and analysis. The bulk of graph theory has been developed since 1890, so it is a relatively young branch of mathematics. A simple graph with 4 nodes is shown below.

The nodes of the above graph can be described as a list, [0, 1, 2, 3], and the edges can be described via an adjacency matrix. In an adjacency matrix each edge is represented by a 1, and missing connections between nodes are left as 0. In a directed graph, the row/column assignments can signify which node is the originating node and which is the receiving one.

**Non-Directed Graph**

A symmetric adjacency matrix represents a non-directed graph. For this type of graph, the connection between 0 and 1 is the same as the connection between 1 and 0, and this non-directedness can be observed in the matrix’s symmetry about its diagonal. It’s also common to reflect node self-reference in the adjacency matrix, and the matrix can easily be updated to show this by adding ones along the diagonal.
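As a concrete sketch, here is one possible adjacency matrix for a 4-node undirected graph (the article’s exact edge set is not recoverable, so these particular edges are an assumption), along with the self-loop variant:

```python
import numpy as np

# Adjacency matrix for a hypothetical undirected 4-node graph [0, 1, 2, 3].
# A[i, j] == 1 means an edge between node i and node j; 0 means no edge.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]])

# An undirected graph yields a symmetric matrix: A[i, j] == A[j, i].
assert (A == A.T).all()

# Node self-reference: add ones along the diagonal.
A_self = A + np.eye(4, dtype=int)
print(A_self.diagonal())  # [1 1 1 1]
```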

For a last tidbit of intuition, let’s visualize how information could propagate through a graph. In this example there is not a step 2 for updating states, and node states are only displayed as scalar values.

*Message passing in a graph, propagating node states from one step to the next. There is an identity-function update after each aggregation step affecting connected nodes. The graph begins with all nodes set to a scalar state of 0, with the exception of **d**, which has a state of 10.0. Neighborhood aggregation allows the other nodes to be influenced by the beginning state of **d**. Each node’s connections in the graph affect how the states propagate. Eventually the graph reaches an equilibrium in which each node’s state approaches 2.5.*
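This propagation can be simulated directly with numpy. The topology below is an assumption (a 4-node ring, since the article’s figure is not reproduced here); repeated averaging of each node’s own state with its neighbors’ states spreads node **d**’s initial 10.0 evenly, so every state approaches 10 / 4 = 2.5:

```python
import numpy as np

# Hypothetical 4-node ring a-b-c-d-a; the article's exact topology is
# an assumption. Any connected regular graph equilibrates the same way.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                         # include each node's own state
P = A_hat / A_hat.sum(axis=1, keepdims=True)  # row-normalized averaging

# All states start at 0, except node d (index 3) at 10.0.
h = np.array([0.0, 0.0, 0.0, 10.0])
for _ in range(50):
    h = P @ h  # one aggregation step: average of self + neighbors

print(np.round(h, 3))  # all four states approach 2.5
```

Because every node here has the same degree, the averaging matrix conserves the mean state, which is why equilibrium lands exactly at 2.5.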

That illustrative example of state propagation gives us a feel for how the edge relationships of a graph affect information flow. It makes sense that directly connected nodes have a more pronounced effect on each other than remotely connected ones, as influence is attenuated when it travels through longer chains of connections.

**Implications for the Adjacency Matrix**

Edges do not all need to be defined the same way. For example, a feed-forward network can be applied to each neighbor’s state vector before averaging, or, in a graph attention network (GAT), attention is applied before the summing operation occurs. In any design, after aggregating state data from a node’s self and neighbors, the node’s state is updated. The update rule can follow any policy, but it is common to use a standard recurrent model from the literature, such as a gated recurrent unit (GRU).

**Applying Graph Neural Networks**
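A minimal sketch of this aggregate-then-update pattern, assuming a shared feed-forward transform on neighbor states and a simplified single-gate update (a full GRU has more gates; all dimensions and weights here are placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4  # hidden size (hypothetical)

# Shared edge network: transform each neighbor's state before averaging.
W_edge = rng.normal(scale=0.1, size=(d, d))

# Simplified gated update, one gate for illustration (a GRU adds a reset gate).
W_z = rng.normal(scale=0.1, size=(d, 2 * d))

def aggregate(neighbor_states):
    """Pass neighbor states through a shared network, then average."""
    msgs = np.tanh(neighbor_states @ W_edge.T)
    return msgs.mean(axis=0)

def update(h_self, m):
    """Gated update: interpolate between the old state and the new message."""
    z = 1.0 / (1.0 + np.exp(-(W_z @ np.concatenate([h_self, m]))))  # sigmoid gate
    return (1 - z) * h_self + z * np.tanh(m)

h = np.zeros(d)
neighbors = rng.normal(size=(3, d))  # states of 3 hypothetical neighbors
h = update(h, aggregate(neighbors))
print(h.shape)  # (4,)
```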

Let’s take a real-life scenario which can be adapted to the framework of a graph neural network, to observe how structural information supports inference. Suppose we want to predict which atoms in amino acid residues are hydrophilic (miscible in water) and which are hydrophobic, like oils. This is important information for determining how proteins fold, a non-trivial and fundamental problem in molecular biology. We will study arginine as an example, an amino acid with both hydrophilic and hydrophobic properties, after drafting the structure of the molecule as a graph:

First we run neighborhood aggregation and state updates to get a prediction of the hydrophilicity of each node. At physiological pH, the amino groups of the backbone and side-chain of arginine are protonated. On the other hand, the long hydrocarbon chain connecting the backbone to the end of the side-chain is very hydrophobic, so arginine has both water-loving and water-repelling characteristics.

An interesting property of the arginine side-chain which contributes to this dual nature is that the hydrophilicity is distributed across all three of the nitrogen-containing groups at the end of the side-chain. The term for this arrangement of 3 nitrogens around a central carbon is a guanidino group. It is hard to imagine capturing this distributed hydrophilicity by considering each node in isolation, but it’s exactly the type of insight that can be learned through context by incorporating the structural patterns of a graph.

**What’s Next for Graph Neural Networks (GNNs)?**

Attention is a concept that gives machine learning models the ability to learn how to assign different weights to different parts of their input. Vaswani *et al.*’s “Attention Is All You Need” is a prominent example of recent academic work in this area. The paper posited the attention-only transformer as a language model (see here for more about attention and transformers). This has largely held true, with massive transformer models like BERT and GPT-3 dominating natural language benchmarks in recent years.

Adding attention to the GNN algorithm we discussed earlier is fairly straightforward. During neighborhood aggregation, in addition to transforming the states of neighboring nodes via a feed-forward network on graph edges, an attention mechanism is added to calculate weighting coefficients. Various attention mechanisms are available, such as the dot-product attention used in the original transformer models, as well illustrated by Jay Alammar. In this scheme, fully-connected layers produce a key vector, a query vector, and a value vector for each input node. Taking the dot product of a key and a query yields a scalar score, which is passed through a softmax function alongside all the other key-query dot products in the graph neighborhood. Finally, instead of summing the value vectors directly, they are first weighted by the attention coefficients before the sum is taken.
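The paragraph above can be sketched as a single attention-based aggregation step. The projections and dimensions are placeholders (a minimal single-head sketch, not a faithful GAT implementation):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4  # state/attention dimension (hypothetical)

# Shared projections producing key, query, and value vectors per node.
W_k, W_q, W_v = (rng.normal(scale=0.5, size=(d, d)) for _ in range(3))

def attention_aggregate(h_self, neighbor_states):
    """Dot-product attention over a node's neighborhood."""
    q = W_q @ h_self                  # query from the receiving node
    K = neighbor_states @ W_k.T       # keys from neighbors
    V = neighbor_states @ W_v.T       # values from neighbors
    scores = K @ q / np.sqrt(d)       # scaled key-query dot products
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()              # softmax -> attention coefficients
    return alpha @ V                  # attention-weighted sum of values

h = rng.normal(size=d)
neighbors = rng.normal(size=(3, d))   # states of 3 hypothetical neighbors
out = attention_aggregate(h, neighbors)
print(out.shape)  # (4,)
```

The only change from plain neighborhood averaging is that each neighbor’s contribution is scaled by a learned, input-dependent coefficient rather than a uniform 1/N.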

**Quantum Graph Neural Networks (QGNNs)**

Quantum graph neural networks (QGNNs) were introduced in 2019 by Verdon *et al.* The authors further subdivided their work into two distinct classes: quantum graph recurrent neural networks and quantum graph convolutional networks. GNNs can be an effective way to exploit molecular structure when making inferences about quantum chemistry. If we define the GNN as an *ansatz*, or quantum circuit architecture, we can bring models nearer still to the systems they are learning about and making predictions on.

The specific type of quantum circuit used by QGNNs falls under the category of “variational quantum algorithms.” In short, these are quantum circuits with parameters that may be trained by gradient descent; these trainable parameters are the quantum equivalent of weights and biases. A known issue in training variational quantum algorithms/circuits is the presence of “barren plateaus,” regions in the fitness landscape where the objective score doesn’t vary much. QGNNs were devised as a means of imparting structural information to variational quantum circuits to ameliorate the presence of barren plateaus. Verdon and colleagues demonstrated QGNNs applied to learning quantum dynamics, graph clustering, and graph isomorphism classification.

**Useful APIs for Working with GNNs**

Only last week (November 17th, 2021), Google released the initial version of TF-GNN (TensorFlow Graph Neural Networks) to support the creation of GNN models using TensorFlow. The library includes:

- A Keras-style API to create GNN models that can be composed with other model types, for example ranking, dual-encoders, or mixed types (image, text, etc.).
- An API for heterogeneous graphs. Graph problems often comprise different types of nodes and edges, so the framework’s emphasis on heterogeneous models is beneficial.
- A schema to declare the architecture/topology of the graph and validate it.
- Operations on the GraphTensor structure:
  - Efficient pooling and message propagation operations for both nodes and edges.
  - A library of standard convolutions, easily extensible.
- An API for engineers to build GNN models while bypassing less relevant details.
- Tools to convert graph datasets and sample segments directly from larger graphs.

Another relatively new and popular library, compatible with Keras machine learning models, is Spektral. Spektral allows you to specify graphs based on the adjacency matrix, node features, edge features, and labels, and allows various layer designs from leading research to be used in a Keras Sequential model. Some popular graph layers available through Spektral include:

- Graph Convolutional Networks (GCN)
- Chebyshev convolutions
- GraphSAGE
- ARMA convolutions
- Edge-Conditioned Convolutions (ECC)
- Graph attention networks (GAT)
- Approximated Personalized Propagation of Neural Predictions (APPNP)
- Graph Isomorphism Networks (GIN)
- Diffusional Convolutions
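The first of these, the GCN layer, implements Kipf and Welling’s propagation rule, which can be sketched in plain numpy to see what such a layer computes (the graph, features, and weights below are toy placeholders, not Spektral code):

```python
import numpy as np

# GCN propagation rule: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W),
# sketched for a hypothetical 4-node graph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
H = np.eye(4)          # one-hot node features for illustration
W = np.ones((4, 2))    # toy weight matrix: 4 features in, 2 out

A_hat = A + np.eye(4)                      # add self-loops
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(1))   # D^-1/2 as a vector
A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

H_new = np.maximum(A_norm @ H @ W, 0)      # ReLU of normalized propagation
print(H_new.shape)  # (4, 2)
```

The symmetric degree normalization keeps repeated propagation from blowing up or shrinking node states as layers stack; libraries like Spektral wrap this computation with trainable weights.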

**GNN Datasets**

Examining GNN datasets is an excellent way to quickly become acquainted with this problem framework. At TUDatasets you will find a great variety of example datasets to choose from. You can examine the datasets for a sense of the typical patterns of adjacency matrices and feature vectors. Across the different sets, both the observable data and how those data are represented as a graph may differ. It is helpful to ground your understanding of the topic by first examining what has been used in the same problem framework before proceeding with a graph design from scratch. The field is new enough that there should be plenty of opportunities for incremental improvement if relevant detail can be appended to existing graph designs.

**Future State**

Metacoder is developing extensions of existing APIs in order to make graph neural networks more accessible to the average person. The company is building GNN layer variations with customizable designs into its flexible deep learning user interface. The vision for this component is to make graph neural networks able to be designed, built, and executed without any coding. The datasets and designs of GNNs can be more complicated than a traditional problem setup, so it will be a critical challenge to bring the intuition of GNN modeling experts into our no-code UI. Because GNNs are a relatively unexplored area of deep learning, opening this modeling framework to a larger community of researchers and scientists should lead to considerable discoveries.

Happy Learning 🙂

-Metacoder

Copyright © 2021 Wazoo Mobile Technologies LLC. All Rights Reserved