
Unlock Hidden KYC Connections Using Graph Analytics

Complying with AML/KYC requirements is no longer a mere bureaucratic formality for the sole purpose of obtaining an operating license from the relevant competent authorities. Nowadays, regulators expect a demonstrable risk-based AML approach regardless of business size or nature, as long as it involves an asset of intrinsic value. Therefore, AML/KYC compliance has become an essential step for any startup or established business in the FinTech area. Blockchain and crypto-based startups targeting Initial Coin Offerings (ICOs) are not an exception here.

The AML/KYC market is flooded with products purporting to offer simple, straightforward KYC checks. However, they routinely neglect vital information about the closest relatives or business partners of a sanctioned or politically exposed person. How does it sound, ethically and legally, if your ICO token is purchased by the spouse, the son, or the business associate of a well-known terrorist? We can agree that such a short-sighted screening approach is not only inadequate but also would not survive the scrutiny of any regulator, especially when this type of data is, more often than not, available in the public domain.

For a handful of screening requests, a skilled team of analysts would be able to paint an accurate picture of such complex relationships and associations. However, this approach does not scale to tens, let alone hundreds or thousands, of screening requests per day. In this article, we will explore how such a comprehensive screening process can be automated and made more ‘intelligent’. To implement a reliable solution, we need a tech stack that is able to manage our sanctions lists, ‘understand’ the associative information therein, and quickly search highly connected datasets.

Graph Model

Suppose we have a list of sanctioned entities that includes a good mix of individuals and companies. Draw a circle around each entity. Then draw lines between entities to denote the apparent relationships between them, e.g. Person A is the owner of Company X. Let’s also add direction to these lines to show efferent (i.e. outbound) relationships. Et voilà, we have a directed graph structure! Now we are able to traverse paths and ultimately expose any hidden or indirect connections.

Graph Concept 1

However, in today’s world of complex geopolitics, we cannot overlook the fact that certain countries are perceived to be riskier than others. To construct a more realistic model, let’s introduce countries as well as any applicable relationships to our graph.

Graph Concept 2
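The model described above can be sketched in a few lines of plain Python, independent of any graph database. The entity names, the relationship types (OWNER_OF, INCORPORATED_IN, ASSOCIATE_OF) and the helper functions are illustrative assumptions, not part of any real watchlist schema.

```python
from collections import defaultdict

# Directed, typed edges: (source, relationship, target).
# All names and relationship types are hypothetical examples.
EDGES = [
    ("Person A", "OWNER_OF", "Company X"),
    ("Company X", "INCORPORATED_IN", "Country C"),
    ("Person B", "ASSOCIATE_OF", "Person A"),
]

def build_graph(edges):
    """Return an adjacency map: node -> list of (relationship, target)."""
    graph = defaultdict(list)
    for source, relationship, target in edges:
        graph[source].append((relationship, target))
    return graph

def is_connected(graph, start, goal):
    """Depth-first check for a directed (outbound) path from start to goal."""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return False

graph = build_graph(EDGES)
print(is_connected(graph, "Person B", "Country C"))  # indirect link, three hops away
print(is_connected(graph, "Company X", "Person A"))  # no outbound path in this direction
```

Because the edges are directed, the second check fails even though the two entities are related; whether a screening traversal should follow relationships in one direction or both is a modelling decision.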

Technology Stack

To build a reliable graph-based solution, we require a tech stack that enables us to effortlessly hold and query millions of nodes. Indeed, an adequate sanctions/watchlist dataset could easily consist of millions of entities and their respective associations. Once it is converted to a graph and supporting reference data, such as countries, is added, the graph could easily swell to tens, if not hundreds, of millions of elements.

A quick internet search for state-of-the-art graph databases will reveal that there are not many readily available enterprise-grade solutions that are able to handle such volumes. We compared four of the leading and most commonly used solutions, namely DataStax, Neo4j, OrientDB, and Titan (together with its successor, JanusGraph). Below is a summary of our findings:

Feature             | DataStax | Neo4j                              | OrientDB                           | JanusGraph/Titan
ACID compliant      | No       | Yes                                | Yes                                | Yes
Query Language      | Gremlin  | Cypher                             | Extended SQL                       | Gremlin
Java API            | Yes      | Yes                                | Yes                                | Yes
ORM/OGM Support     | No       | Yes                                | No                                 | No
Spring Data support | No       | Yes                                | No                                 | No
Clustering Support  | Yes      | Yes                                | Yes                                | Depends
IaaS Offerings      | -        | AWS Marketplace, Azure Marketplace | AWS Marketplace, Azure Marketplace | AWS Marketplace
Open Source         | No       | Yes                                | Yes                                | Yes
Community           | Small    | Huge                               | Large                              | Medium
DB-Engines ranking  | 3        | 1                                  | 4                                  | 15
Apache TinkerPop    | Yes      | Yes                                | Yes                                | Yes

Some of our key takeaways are:

  • DataStax has a proven track record among large enterprises.
  • Neo4j is something of a newcomer. However, it already has a good presence on various platforms, including Docker containers. It is also available on leading Infrastructure as a Service (IaaS) platforms such as Microsoft’s Azure Cloud Marketplace and Graphene’s managed service.
  • Neo4j is well supported by popular frameworks such as the Spring Framework, Grails, Django, Node.js and so forth.
  • OrientDB and Neo4j increasingly offer similar capabilities and performance. However, Neo4j’s intuitive Cypher query language made it our choice.

Proof of Concept

Let’s see how our proposed graph model can be implemented using Neo4j. First, let’s create a couple of watchlist entities using the following Cypher statements:

CREATE (n1:WatchlistOrganisationNode { pk: 'FBP', entityName: 'FooBar Petroleum' })
CREATE (n2:WatchlistPersonNode { pk: 'JD', entityName: 'John Doe' })
CREATE (n3:WatchlistPersonNode { pk: 'JR', entityName: 'Jane Roe' })
CREATE (n4:WatchlistOrganisationNode { pk: 'BB', entityName: 'Bazz Bank' })
CREATE (n5:WatchlistPersonNode { pk: 'RD', entityName: 'Rachel Doe' })

Let’s then add relationships to the newly-created nodes:

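MATCH (a:WatchlistPersonNode{pk: 'JD'}),(b:WatchlistPersonNode{pk: 'JR'}),(c:WatchlistOrganisationNode{pk: 'FBP'}),
      (d:WatchlistOrganisationNode{pk: 'BB'}),(e:WatchlistPersonNode{pk: 'RD'})
// The relationship types below are illustrative assumptions
CREATE (b)-[r1:CHILD_OF]->(a), (a)-[r2:OWNER_OF]->(c), (e)-[r3:SPOUSE_OF]->(a), (c)-[r4:ASSOCIATE_OF]->(d)
RETURN r1, r2, r3, r4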

In a very similar fashion, let’s complement our graph with a couple of country relationships (for brevity, we assume that some countries are already present in the graph with their respective names and ISO codes):

MATCH (p1:WatchlistPersonNode{pk: 'JD'}),
      (p2:WatchlistPersonNode{pk: 'JR'}),
      (o1:WatchlistOrganisationNode{pk: 'FBP'}),
      (c1:CountryNode{iso3Code: 'IRN'}),
      (c2:CountryNode{iso3Code: 'GBR'})
CREATE (p1)-[r1:BORN_IN]->(c1), (o1)-[r2:INCORPORATED]->(c2), (p2)-[r3:CITIZENSHIP]->(c2)
RETURN r1, r2, r3

Let’s check out how the complete graph looks so far:

MATCH p = (n)-[]-() WHERE n.pk IN ['FBP', 'JD', 'JR', 'BB', 'RD'] RETURN p

Graph 1

We can clearly see that we have successfully replicated our target graph model. Let’s now see how we can screen an individual and identify any relevant connections:

MATCH p = (n:WatchlistPersonNode)-[*1..2]->() WHERE n.entityName =~ '.*Roe'
RETURN p

Graph 2

The result makes it quite clear that there is a sanctioned individual named “Jane Roe” whose father was born in Iran, and who would therefore require Enhanced Due Diligence.
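To make the logic of this screening step explicit, here is a minimal in-memory sketch of the same idea in Python: match a name with a regular expression, follow outbound relationships for one or two hops, and keep any path that ends in a country treated as high risk. The CHILD_OF relationship type, the HIGH_RISK set and the function names are assumptions made for illustration only.

```python
import re

# Node key -> display name (mirrors the toy data used in this section).
NAMES = {"JD": "John Doe", "JR": "Jane Roe", "FBP": "FooBar Petroleum",
         "IRN": "Iran", "GBR": "United Kingdom"}

# Directed edges: source -> [(relationship, target)].
# CHILD_OF is a hypothetical relationship type chosen for this sketch.
EDGES = {
    "JR": [("CHILD_OF", "JD"), ("CITIZENSHIP", "GBR")],
    "JD": [("BORN_IN", "IRN")],
    "FBP": [("INCORPORATED", "GBR")],
}

# Assumption: which countries count as high risk is a policy decision.
HIGH_RISK = {"IRN"}

def outbound_paths(start, max_hops):
    """Yield every outbound path of 1..max_hops relationships from start."""
    frontier = [[start]]
    for _ in range(max_hops):
        next_frontier = []
        for path in frontier:
            for rel, target in EDGES.get(path[-1], []):
                if target in path[::2]:  # simplification: never revisit a node
                    continue
                extended = path + [rel, target]
                yield extended
                next_frontier.append(extended)
        frontier = next_frontier

def screen(name_pattern, max_hops=2):
    """Return paths from name-matched entities that end in a high-risk country."""
    hits = []
    for key, name in NAMES.items():
        # fullmatch: the whole name must match the pattern.
        if re.fullmatch(name_pattern, name):
            hits += [p for p in outbound_paths(key, max_hops) if p[-1] in HIGH_RISK]
    return hits

for path in screen(r".*Roe"):
    print(" -> ".join(path))
```

A non-empty result would be the trigger for Enhanced Due Diligence in this sketch; a production system would also need fuzzy name matching rather than a plain regular expression.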

Real Data

Needless to say, the example above is purely illustrative. However, it is not far off from a real-life scenario. With a comprehensive dataset, such as SwiftDil’s extensive database, simple queries can unlock the most obscure and opaque relationships and connections:

Uncover Closest Associates
MATCH p=(:WatchlistOrganisationNode{pk:"1048394"})-[:ASSOCIATE_OF*1..2]->(:WatchlistOrganisationNode)
WITH nodes(p) AS nodes
UNWIND nodes AS n
MATCH cp=(n)-->(c:CountryNode)
RETURN cp
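The query above works in two steps: it first expands associate links up to two hops from a starting organisation, and then looks up the country relationships of every organisation found along the way. A rough Python equivalent over toy data might look as follows; the organisation keys, country links and function names are all hypothetical.

```python
# Hypothetical toy data: ASSOCIATE_OF links between organisations,
# and outbound country relationships for each organisation.
ASSOCIATE_OF = {
    "ORG-1": ["ORG-2"],
    "ORG-2": ["ORG-3"],
    "ORG-3": ["ORG-4"],
}
COUNTRY_LINKS = {
    "ORG-1": [("INCORPORATED", "GBR")],
    "ORG-2": [("INCORPORATED", "FRA")],
    "ORG-3": [("INCORPORATED", "DEU")],
    "ORG-4": [("INCORPORATED", "GBR")],
}

def associates_within(start, max_hops=2):
    """Return start plus the organisations reachable via 1..max_hops ASSOCIATE_OF links.

    Returns an empty list when start has no associates, since no path exists then.
    """
    seen, frontier, ordered = {start}, [start], []
    for _ in range(max_hops):
        next_frontier = []
        for org in frontier:
            for other in ASSOCIATE_OF.get(org, []):
                if other not in seen:
                    seen.add(other)
                    ordered.append(other)
                    next_frontier.append(other)
        frontier = next_frontier
    return [start] + ordered if ordered else []

def country_exposure(start, max_hops=2):
    """Map each organisation on the associate paths to its country relationships."""
    return {org: COUNTRY_LINKS.get(org, []) for org in associates_within(start, max_hops)}

for org, links in country_exposure("ORG-1").items():
    print(org, links)
```

Limiting the expansion to two hops keeps the result focused on the closest associates; ORG-4 is three hops away from ORG-1 and is therefore left out.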


Highlight Country Connections
MATCH (c1:CountryNode{iso3Code: 'GBR'}),(c2:CountryNode{iso3Code: 'SYR'}),
p = shortestPath((c1)-[*]-(c2))
WHERE length(p) > 1
RETURN p

Country Connections
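Conceptually, a shortest-path search between two countries is a breadth-first search that ignores edge direction. The sketch below illustrates the idea in Python over a handful of made-up edges; the node keys and the shortest_path helper are assumptions for illustration and say nothing about how Neo4j implements the search internally.

```python
from collections import deque

# Hypothetical edges between entities and countries; direction is ignored,
# mirroring the undirected pattern (c1)-[*]-(c2).
EDGES = [
    ("ORG-1", "GBR"),
    ("ORG-1", "ORG-2"),
    ("ORG-2", "SYR"),
    ("ORG-1", "P-1"),
    ("P-1", "ORG-2"),
]

def shortest_path(edges, start, goal):
    """Breadth-first search over an undirected view of the edges."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # the two nodes are not connected

path = shortest_path(EDGES, "GBR", "SYR")
# Keep only indirect connections, as the length(p) > 1 filter does.
if path is not None and len(path) - 1 > 1:
    print(" - ".join(path))
```

Every intermediate node on the returned path is an entity that links the two countries, which is exactly what an analyst would want to review.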


In this article, we have covered simple yet very powerful graph concepts and explained how they can be applied to KYC screening. These concepts can be developed further, and the graph model can be refined based on business needs. For instance, a scorecard could be built on top of Neo4j to improve the accuracy of matches and widen the scope of risk indicators taken into account. An advanced scorecard may also leverage Machine Learning and advanced text/phonetic search algorithms. At SwiftDil, we have employed state-of-the-art techniques to implement a powerful scorecard around Neo4j, which has yielded unparalleled matching rates.



Vladimir Salin

Lead Architect @ SwiftDil, Avid Runner, Father and Husband