By David Francis, adviser to Global Data Consortium

Today’s digital world demands electronic identity verification. Financial institutions require a frictionless experience for customers opening accounts, regardless of their country of origin. Online businesses demand seamless customer verification to drive down fraud. Social media companies want to keep harmful individuals off their platforms. Digital connectivity is integral to how we lead our lives and do business.

These industries are laser-focused on building best-in-class online user experiences into their operational workflows. At the same time, regulations are getting more complex. Companies must now abide by expanded Anti-Money Laundering (AML) laws and directives, as well as Know Your Customer (KYC) and Know Your Business (KYB) requirements during identity checks. This means the effectiveness of a company’s AML program depends heavily on the efficacy of its identity matching strategies.


Identity matching is the core of successful verification. It’s a standard part of the customer identification procedures (CIP) performed to comply with KYC and KYB. For example, to be Bank Secrecy Act compliant in the United States, a company performing electronic identity verification must match, or verify, enough data points to “have reasonable belief” that the person being verified is who they say they are.

In practice, identity verification checks often return matches on date of birth and standardized address, but not name. Exact name matching regularly proves difficult because of varying alphabets, naming structures, and aliases or nicknames.

In theory, matching names should be simple: if a name is in a data set, it should be easy to confirm. The only way to get a false positive is if the data set is wrong.

In reality, matching names is much more complex. It requires a range of algorithms to check the input data and resolve issues stemming from contractions, letter swapping, diminutives, alphabet transpositions, diacritic characters, and cultural naming conventions. A matching system must do all of that before systematically validating the processed data against an authoritative source at scale.
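As a rough illustration of two of these steps, the sketch below strips diacritics and compares names without regard to token order. It is a minimal example rather than a production matcher; the function names `normalize_name` and `same_name_tokens` are hypothetical, and it assumes names already written in Latin script.

```python
import unicodedata


def normalize_name(raw: str) -> str:
    """Return a comparison key: no diacritics, case-folded, single-spaced."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    # Letters with no decomposition (for example "ø") pass through unchanged.
    decomposed = unicodedata.normalize("NFKD", raw)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Assumption: a hyphen separates name parts, as in "Ki-moon".
    separated = without_marks.replace("-", " ")
    return " ".join(separated.casefold().split())


def same_name_tokens(a: str, b: str) -> bool:
    """True when both names contain the same parts, in any order."""
    return sorted(normalize_name(a).split()) == sorted(normalize_name(b).split())


print(normalize_name("Kaká"))                           # kaka
print(same_name_tokens("Ban Ki-moon", "Ki Moon Ban"))   # True
```

Ignoring token order helps with family-name-first conventions, but it also loosens the comparison, so a check like this is better suited to shortlisting candidates than to confirming a match.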


Culture is often reflected in a name. For instance, you might remember the famous Brazilian footballer Kaká. He’s known by that name around the world. However, for identity verification purposes, he is known by his real name: Ricardo Izecson dos Santos Leite. This type of nicknaming is common in Portuguese-speaking countries.

Korea uses its own alphabet, Hangul, and its naming convention places the family name before a two-part given name. One example is former United Nations Secretary-General Ban Ki-moon, whose family name is Ban.

While people can understand these differences and apply or interpret them when necessary, it’s extremely difficult, but not impossible, to incorporate these rules into an algorithm that enables a machine to learn. Maximizing correct results while minimizing false positives requires combining the power of dictionaries, linguistics, and machine-learning techniques.

Identity checks that involve name matching usually start with dictionary or rule-based matching, the most basic method. It maps names and inputs that have the same meaning but different spellings, like “Richard” and “Dick,” onto one another.
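A dictionary-based check can be sketched in a few lines. The table below is a tiny, hand-written stand-in for a real nickname dictionary, and the names `NICKNAMES`, `canonical_given_name`, and `given_names_match` are hypothetical.

```python
# Illustrative entries only; a real dictionary would be far larger
# and maintained per language and culture.
NICKNAMES = {
    "dick": "richard",
    "rick": "richard",
    "bill": "william",
    "liz": "elizabeth",
}


def canonical_given_name(name: str) -> str:
    """Map a nickname to its formal form; unknown names map to themselves."""
    key = name.strip().casefold()
    return NICKNAMES.get(key, key)


def given_names_match(a: str, b: str) -> bool:
    """True when both inputs resolve to the same formal given name."""
    return canonical_given_name(a) == canonical_given_name(b)


print(given_names_match("Richard", "Dick"))  # True
print(given_names_match("Richard", "Bill"))  # False
```

A lookup like this is fast and easy to audit, but it only knows the pairs someone has entered, which is why it is usually the first step rather than the only one.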

When results from dictionary-based matching are insufficient, ID verifiers move to linguistics and machine-learning techniques to determine whether two names identify the same person. These techniques often involve transliteration: the process of representing a word in the characters of a different alphabet, such as spelling a Chinese name in Latin letters instead of Chinese characters.


Writing systems are structured in different ways. The Latin alphabet, used in the United States and much of the world, is phonetic and letter-based, while simplified Chinese is a logographic, character-based script in which each character carries meaning and its pronunciation can differ by dialect. These distinctions are lost when a name is converted to the Latin alphabet.

When an individual enters a Latin version of their Chinese name, most name translation tools return only one of the many possible renderings of that name in Chinese. To yield a proper match, translation engines must return a list of all possible corresponding names along with their probabilities.
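The idea of returning every candidate with a probability can be sketched as follows. The characters listed are surnames that are all romanized as “Li” in Hanyu pinyin, but the weights are invented purely for illustration and are not real frequency data; `CANDIDATE_WEIGHTS` and `ranked_candidates` are hypothetical names.

```python
from typing import Dict, List, Tuple

# ASSUMPTION: placeholder weights, NOT real name-frequency statistics.
CANDIDATE_WEIGHTS: Dict[str, Dict[str, float]] = {
    "li": {"李": 8.0, "黎": 1.0, "厉": 0.5, "栗": 0.5},
}


def ranked_candidates(latin: str) -> List[Tuple[str, float]]:
    """Return every known Chinese rendering with a normalized probability."""
    weights = CANDIDATE_WEIGHTS.get(latin.strip().casefold(), {})
    total = sum(weights.values())
    if total == 0:
        return []  # nothing known for this input
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [(name, weight / total) for name, weight in ranked]


for name, probability in ranked_candidates("Li"):
    print(name, round(probability, 2))
```

Downstream, a verifier could try each candidate against the reference database in order and stop at the first confirmed match, instead of failing when the single most likely rendering is wrong.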

The regulatory challenge for institutions operating across writing systems is compounded when the data sets themselves use different scripts. For example, sanctions and watchlists are typically in Latin script, but Chinese identity verification databases are almost exclusively in Chinese. This creates significant risk for an entity faced with a “partial match” to a person on a sanctions list and with uncertainty about the veracity of an identity verification performed in a different script. As the U.S. Department of Justice noted in its 2017 $4 million forfeiture notice related to North Korea sanctions, “…dialectical differences (Mr. Wu in Mandarin becomes Mr. Ng in Cantonese) and differing Romanization systems (Mr. Xiao in the Hanyu pinyin system, and Mr. Hsiao in the Wade Giles system) can create serious problems for investigators…”
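One way to reduce missed hits of this kind is to expand a surname into its known romanization and dialect variants before screening. The sketch below uses only the two pairs from the Department of Justice quote; `VARIANT_GROUPS`, `variants`, and `possible_watchlist_hits` are hypothetical names, and a real variant table would be much larger.

```python
from typing import List, Set

# The only pairs used here are the two named in the DOJ quote above.
VARIANT_GROUPS: List[Set[str]] = [
    {"wu", "ng"},        # Mandarin vs. Cantonese
    {"xiao", "hsiao"},   # Hanyu pinyin vs. Wade-Giles
]


def variants(surname: str) -> Set[str]:
    """Return the surname together with its known Latin-script variants."""
    key = surname.strip().casefold()
    for group in VARIANT_GROUPS:
        if key in group:
            return set(group)
    return {key}


def possible_watchlist_hits(surname: str, watchlist: List[str]) -> List[str]:
    """Return watchlist surnames that equal the input or one of its variants."""
    wanted = variants(surname)
    return [entry for entry in watchlist if entry.strip().casefold() in wanted]


print(possible_watchlist_hits("Xiao", ["Hsiao", "Wang", "Ng"]))  # ['Hsiao']
```

Variant expansion widens the net, so each hit it produces still needs to be confirmed against other data points such as date of birth or address.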

The same problems arise with the Japanese, Korean, Arabic, and Cyrillic scripts.


Greater connectivity, driven by mobile technology, continues to connect consumers with merchants around the world. According to Shopify, a leading ecommerce platform, global ecommerce is expected to reach $4.8 trillion in sales by 2021, with more than 2.1 billion shoppers purchasing goods and services. According to a recent report, the coronavirus outbreak and subsequent quarantine sent ecommerce through the roof. In just a few months, ecommerce growth has accelerated at a pace that would normally take four to six years. Total online spending in May hit $82.5 billion, up 77% from the previous year.

But as commerce moves online, so too do crime and fraud. The National Retail Federation 2019 Online Retail Crime survey reports that online retail crime costs retailers $703,320 per $1 billion in sales, and 97 percent of survey respondents reported that they had been victimized in the preceding year.

Loss prevention is not the only reason to perform identity verification. A number of industry sectors like alcohol, tobacco, vaping, gaming, and legalized cannabis (often referred to as age-restricted commerce) perform age verification as part of identity verification to comply with legal and licensing requirements.

Other ecommerce companies perform identity verification to improve their customer experience. Elegant online identity and address verification prevents items from being misdelivered and contributes to a better user experience. It also helps to avoid customer service headaches and keeps down customer support costs.


Identity data is constantly changing. People move to new addresses, get new jobs, or change phone numbers. That’s why data quality practices are critical to identity verification.

Parsing and standardizing data is crucial to keeping it consistent and usable. Companies must prioritize data standardization so that elements can be compared accurately across data sets.
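A minimal sketch of what standardization can look like for a single record follows. The field names, the accepted date formats, and the day-first reading of slash-separated dates are all assumptions made for this example; a real pipeline would choose them per country and per data source.

```python
from datetime import datetime
from typing import Optional

# ASSUMPTION: these are the only input formats, and "05/03/1990" is day-first.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def standardize_dob(raw: str) -> Optional[str]:
    """Return the date of birth as ISO 8601 (YYYY-MM-DD), or None if unparseable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def standardize_phone(raw: str) -> str:
    """Keep only ASCII digits so formatting differences do not block a match."""
    return "".join(ch for ch in raw if ch.isascii() and ch.isdigit())


def standardize_record(record: dict) -> dict:
    """Return a copy of the record with comparable name, dob, and phone fields."""
    return {
        "name": " ".join(record["name"].split()).casefold(),
        "dob": standardize_dob(record["dob"]),
        "phone": standardize_phone(record["phone"]),
    }


a = standardize_record({"name": "Jane  DOE", "dob": "05/03/1990", "phone": "(555) 010-0199"})
b = standardize_record({"name": "jane doe", "dob": "1990-03-05", "phone": "555.010.0199"})
print(a == b)  # True
```

Once both records are in the same shape, a field-by-field comparison becomes a simple equality test instead of a guess about formatting.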

What’s more, access to different types of verification data depends on each country’s privacy laws and regulations. The same input will not yield the same match rate in every country.

U.S. digital payments in 2019 were valued at over $745 billion for mobile POS payments and over $33 trillion for digital commerce. The amount of data produced through these kinds of transactions is going to explode given the increase in online shopping due to the coronavirus.

In one sense, this is a good thing: the more data an individual produces, the easier it should be to verify them. But increased data volume has a downside – it’s simply harder to manage.

Keeping data volume manageable and match rates high starts with consistently evaluating filtering processes and adapting them to accommodate new data sources while, at the same time, discarding expired data. Pairing these updates with advanced search methods and hit reduction mechanisms leads to maximized match rates, regardless of increased data volume.
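The filtering step can be sketched as a function that discards expired records and keeps only the freshest record per person. The field names `person_id` and `last_seen`, the function name `keep_current`, and the two-year cutoff are assumptions chosen for illustration, not recommended values.

```python
from datetime import date, timedelta
from typing import Dict, List


def keep_current(records: List[dict], today: date, max_age_days: int = 730) -> List[dict]:
    """Drop records older than the cutoff and keep the newest one per person."""
    cutoff = today - timedelta(days=max_age_days)
    freshest: Dict[str, dict] = {}
    for record in records:
        if record["last_seen"] < cutoff:
            continue  # expired: too old to trust for verification
        best = freshest.get(record["person_id"])
        if best is None or record["last_seen"] > best["last_seen"]:
            freshest[record["person_id"]] = record
    return list(freshest.values())


records = [
    {"person_id": "p1", "address": "Old St", "last_seen": date(2015, 1, 1)},
    {"person_id": "p1", "address": "New Ave", "last_seen": date(2020, 5, 1)},
    {"person_id": "p2", "address": "Main Rd", "last_seen": date(2020, 1, 1)},
]
current = keep_current(records, today=date(2020, 6, 1))
print([r["address"] for r in current])  # ['New Ave', 'Main Rd']
```

Running a filter like this on a schedule keeps the searchable set small, which matters more as transaction volume grows.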


Global Data Consortium delivers real-time global identity verification for businesses. GDC is driven by a passion for enabling international commerce and building an ecosystem of partnerships with local identity experts in each region or country. These partnerships, along with our cloud-based Worldview platform, connect users to high-quality, local reference data via a single access point for clear, compliant identity verification.

To learn more about Worldview, please visit

*About the Author:  David Francis is a member of Global Data Consortium’s Advisory Board. He is the CEO and founder of Ozanam Strategic Insights, a startup dedicated to discovering innovative solutions to digital identification and data challenges in the emerging and developing world.