Ten Critical Questions in Search of Answers

There is huge interest in big data and how it is being harvested, analysed, presented and utilized in the credit reporting sector. It is probably one of the most contentious hot topics: everyone appears to have an opinion, but no one has definitive answers as to how stakeholders should react. Some see it as a major threat, a great disruptor, whilst others see it as an innovative, game-changing opportunity.

At a recent workshop hosted by senior executives of the People's Bank of China, BIIA's Peter Sheerin acknowledged that he, like many others, certainly did not have all the answers in a rapidly developing environment where everyone seems to be trying to outdo or out-promise everyone else.

Sheerin pointed out that it has never been easier to understand customers, their behaviour and their preferences, thanks to the increased availability and pooling of transaction data and of the preferences consumers themselves post on social networks.

The landscape to explore and mine is vast: web sites, social networks, online newspapers, blogs and browser logs, as well as text analytics, all provide data content of some sort and in differing structures. The technology (extreme analytics) to mine such data is already in use and will undergo a user revolution in its own right, with connected devices going online at a vast pace. This is leading to the creation of predictive models such as 360-degree views of the customer, based on unstructured clickstream data blended with transactional data. It gives lenders the ability and the opportunity to extend the pool of potential borrowers beyond the reach of "traditional lenders". This new type of decision support improves the accuracy of risk prediction and the speed of approving or denying a credit application.

Big Data or Smart Data?

In many quarters of our industry the term big data is being replaced by the term 'smart data', reflecting the fact that the data actually relevant to a credit decision is minuscule compared with the mass of data analysed to arrive at that decision.

The creation and accessibility of big data solutions are delivering major cost savings across the entire value chain. The question is who will benefit from these savings. One could venture that the lending sector would benefit from the additional predictive data and the speed of decision making. Consumers might also benefit from a faster loan application process, particularly those who previously could not obtain a loan at all in the absence of sufficient data about them.

Fintech: Friend or Foe?

Traditional credit bureaus will be challenged by fintech players because the latter use an entirely different business model. First, fintech players use different data, backed by extreme analytics, and take an aggressive lending posture. Second, they neither use nor share credit data with credit bureaus. This may change over time, but it is currently the norm.

The amount of money being invested in fintech is absolutely staggering. According to KPMG, global investment in fintech companies totalled US$19.1 billion in 2015, with US$13.8 billion invested in VC-backed fintech companies, a 106 percent jump compared to 2014 and a record year for VC-backed fintech investment. Unlike traditional credit reporting businesses, fintechs are not handicapped by legacy issues such as IT and regulatory constraints, and they can adapt quickly to changing social behaviour.

It is probably a little early to predict the demise of traditional bureaus, but there is no doubt that they risk losing their dominance as the hub of data sharing, and with it a haemorrhaging of revenue. Two factors may save the credit bureaus. First, the tremendous growth in fintech lending will come to an end, and a shake-out can be expected sooner or later. Second, fintech lenders will come to their senses, discover the value of the credit histories embedded in credit bureaus, and may eventually share performance data. Nevertheless, the effect on the wider credit reporting ecosystem remains to be seen.

When it comes to data beyond the "traditional" credit bureau parameters, there is significant market excitement about leveraging information culled from social media and from emerging data sources in the e-commerce ecosystem. Extreme analytics make it possible to extract data useful for credit decisions in real time. The new fintech entrepreneurs who deploy these techniques are either running their own lending operations or partnering with individual lenders to enhance the lenders' credit underwriting. This is creating a new, rapidly evolving, market-driven focus on new products and services. While credit bureaus have access to the same technology and to potential additional data sources, their slow reaction to these market developments puts them at a disadvantage.

There are also important questions about the rise of proprietary models. The increasing availability of data and new technologies makes it possible for financial institutions to use data without going through third-party credit reporting service providers. This carries a significant risk of losing the "public good" aspects of the system. Credit reporting systems that are open to all help build an ecosystem that motivates customers to behave responsibly; they help prevent over-indebtedness and contribute to the kind of shared information that changes the market culture. This new market brings challenges in managing data quality, migration to new products, and regulatory risk.

How do we insist upon security and transparency?

Cyber-security is a growing concern, with a major risk of fraud and scams. This is not just an issue for credit bureaus; it affects data suppliers and data users alike. One major data breach will impact the entire credit granting and information ecosystem.

How does big data impact on financial inclusion?

Extreme analytics make it possible to harness big data, which promises more accurate risk prediction and, in turn, better products for the unbanked and underbanked, thus extending access to finance and financial products to potential borrowers outside the reach of traditional lenders.

Nevertheless, there are challenges that need to be appreciated:

  • Understanding the rapidly changing environment
  • How to identify and define appropriate data elements?
  • How to verify that the data is in fact accurate?
  • How to satisfy critics by demonstrating that the algorithms upon which scores are based are not biased?
  • How do we ascertain that artificial intelligence and algorithms accurately predict human behaviour, especially when consumers have a habit of changing their behaviour?
  • How to prevent adverse effects for consumers?
  • Addressing the issue of consumer consent: is tick-box consent good enough?
  • How do policy makers and regulators keep ahead of the development curve and keep their engagement current and relevant?

As for the rapidly changing environment, one could easily argue that the genie is already out of the bottle. Sheerin believes there is no going back: this information cannot be pulled back, and companies cannot be told to stop analysing click times and the like.

In the past we were used to incremental advances in technology and gradual migration to new processes. Today, however, technology advances so fast that adapting legacy systems and processes is difficult and costly. By the time one migrates to a new system, it may already be obsolete.

How to manage data and its supposed accuracy?

There are concerns about data quality and completeness. It has long been accepted that any algorithm is only as good as the underlying data (and perhaps only as good as the expertise of the data scientist). Concerns about the quality of traditional credit reporting data will only increase with the use of big data as we seek a better understanding of customers, their behaviour and their preferences.

Are the decisions based upon accurate data?  How do we know?  What data sets will be sufficient to identify someone and how? Can the algorithms, when fed with good data, actually predict the creditworthiness of consumers – particularly low-income consumers?  Does the use of big data actually improve the choices for consumers?

What if there is a mistake in the data used to assess a consumer's creditworthiness? This raises the question of how much control consumers should have over their data: the ability to correct mistakes, or simply to decide what they wish to share or make available. How can a user "correct" mistakes about his or her surfing habits? We all know how difficult and expensive it already is to maintain accurate data in the traditional model, let alone to unpick incorrectly matched data elements in the new era. How can consumers be informed about the data used by algorithms, how the algorithms work and how to correct mistakes? How much control can consumers exert over their "data trail"? How can we address the issue of consumers being pressured into giving away access to their data as a precondition for accessing a service? Should there be restrictions on what can be deleted?

Sheerin readily acknowledged that he did not have the answers to these very important questions. One thing he was sure of is that, at some point, someone operating in this area who is negligent, or simply unwise, and fails to address these concerns will learn an expensive lesson.

Big data analysis relies on identifying people, patterns and correlations, but what about users who own few connected devices, or who protect their privacy, so that little data about them is available? If there is little or no data about a user, he or she will be denied access to certain financial products.

Sheerin is part of the 45+ age group. Compared with his adult children, he could be excluded from certain products because his lack of connectivity leaves him with little digital footprint. He is probably no different from others who are excluded by a lack of digital connectivity, a lack of credit activity (for example, a mortgage that has been repaid), or poverty.

Are there adverse effects on consumers and what are they?

Is there the potential for a discriminatory impact on racial, geographic, or other minority groups?

Lenders may be selective in what they choose to analyse, so that their interests are "coded" into the algorithm. Credit scoring has always had its critics, typically for its lack of transparency, but the development of big data techniques raises further concerns among regulators and data privacy advocates, quite rightly so in Sheerin's view.

In most parts of the world, providers of traditional credit scores have policies and procedures to explain to customers why they have not reached the required cut-off score and how they can change their behaviour to improve it. The reasons may include, for example, too many credit enquiries or too short a time in employment. But if the models are built in a perceived black box using social media data, how do you tell someone why they have failed?
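To make the contrast concrete, here is a minimal Python sketch of how an additive, points-based scorecard can produce reasons when an applicant falls below the cut-off. It ranks reasons by the points lost against the best attainable value for each characteristic, which is one common approach among several. The characteristics, point values, cut-off and names such as `score_with_reasons` are hypothetical illustrations, not any bureau's actual model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical scorecard: (characteristic, reason text, best attainable points, points function).
# Characteristics, bins, points and the cut-off are illustrative assumptions only.
SCORECARD: List[Tuple[str, str, int, Callable[[float], int]]] = [
    ("enquiries_last_6m", "Too many recent credit enquiries", 60,
     lambda n: 60 if n <= 1 else 35 if n <= 3 else 10),
    ("years_in_employment", "Too short a time in employment", 50,
     lambda y: 50 if y >= 5 else 30 if y >= 2 else 10),
    ("missed_payments_12m", "Recent missed payments", 90,
     lambda m: 90 if m == 0 else 40 if m == 1 else 0),
]
CUT_OFF = 150  # hypothetical approval threshold


def score_with_reasons(applicant: Dict[str, float], top_n: int = 2) -> Tuple[int, bool, List[str]]:
    """Sum the scorecard points and, on decline, return the reasons that cost the most points."""
    total = 0
    shortfalls: List[Tuple[int, str]] = []
    for name, reason, best_points, points_fn in SCORECARD:
        points = points_fn(applicant[name])
        total += points
        if points < best_points:
            # Points lost against the best attainable bin for this characteristic
            shortfalls.append((best_points - points, reason))
    approved = total >= CUT_OFF
    # Largest shortfall first, so the most influential reasons are reported
    shortfalls.sort(key=lambda item: item[0], reverse=True)
    reasons = [] if approved else [reason for _, reason in shortfalls[:top_n]]
    return total, approved, reasons


result = score_with_reasons(
    {"enquiries_last_6m": 4, "years_in_employment": 1, "missed_payments_12m": 0}
)
print(result)  # (110, False, ['Too many recent credit enquiries', 'Too short a time in employment'])
```

Because each characteristic contributes a separate, inspectable number of points, the explanation falls straight out of the arithmetic; a black-box model built on social media data offers no such direct decomposition.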

More importantly, structured data can be filtered to remove undesirable characteristics. Whilst gender, ethnicity and geographical location are almost always predictive of credit behaviour, their use has for the most part been outlawed in traditional credit scoring. In an unstructured data environment it is almost impossible to screen out such characteristics, because the analysis may infer them from other patterns. Given that financial literacy is intolerably low even in developed markets, some form of oversight is needed to make sure that good governance is applied.
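One simple way to screen for such inference is to check whether any candidate variable is strongly correlated with a protected characteristic. The sketch below assumes a numerically coded protected attribute, hypothetical feature names, toy values and an arbitrary 0.5 threshold. It only detects linear, single-variable proxies; combinations of individually innocuous variables can still reconstruct a protected characteristic, which is exactly why screening unstructured data is so hard.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sd_x * sd_y)


def flag_proxies(features: Dict[str, List[float]], protected: List[float],
                 threshold: float = 0.5) -> List[str]:
    """Names of features whose absolute correlation with the protected attribute exceeds threshold."""
    return sorted(name for name, values in features.items()
                  if abs(pearson(values, protected)) > threshold)


# Hypothetical toy data: protected attribute coded 1/0 for six applicants
protected = [1, 1, 1, 0, 0, 0]
features = {
    "postcode_cluster": [3, 3, 2, 0, 1, 0],
    "phone_topups_per_month": [4, 2, 5, 3, 5, 2],
}
print(flag_proxies(features, protected))  # ['postcode_cluster']
```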

How important will big data and its ethical use be?

Users of big data should ask themselves: does reliance on big data raise ethical or fairness concerns? Are you honouring the promises you make to consumers and providing them with material information about your data practices? Are you maintaining reasonable security over consumer data?

Companies should assess the factors that go into an analytics model and balance the predictive value of the model with fairness considerations.

Strategies are needed to prevent unintended impacts on certain populations, so that biases are not incorporated at either the collection or the analytics stage of big data's life cycle. Sheerin made the point that we are all human, data scientists included, and that we all have biases; as a consequence, data scientists' own judgements may not be fair to consumers. Big data analytics has to be carefully assessed against strict criteria, including potential consumer detriment, effects on discrimination, data protection and the proportionality of the use of the data.
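As one illustration of checking outcomes for unintended impact, the sketch below computes each group's approval rate relative to the most-approved group and flags ratios under 0.8. This "four-fifths" heuristic comes from US employment-selection guidance rather than credit regulation, so treat the threshold, the group labels and the function names as assumptions; a low ratio is a prompt for investigation, not proof of bias.

```python
from typing import Dict, List, Tuple

FOUR_FIFTHS = 0.8  # screening threshold borrowed from US employment-selection guidance


def approval_rates(decisions: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Approval rate per group from (group label, approved?) pairs."""
    counts: Dict[str, List[int]] = {}
    for group, approved in decisions:
        tally = counts.setdefault(group, [0, 0])  # [approved, total]
        tally[0] += int(approved)
        tally[1] += 1
    return {group: approved / total for group, (approved, total) in counts.items()}


def adverse_impact_ratios(decisions: List[Tuple[str, bool]],
                          threshold: float = FOUR_FIFTHS) -> Dict[str, Tuple[float, bool]]:
    """Each group's approval rate relative to the most-approved group, flagged if below threshold."""
    rates = approval_rates(decisions)
    best = max(rates.values())
    if best == 0:
        raise ValueError("no group has any approvals; ratios are undefined")
    return {group: (rate / best, rate / best < threshold) for group, rate in rates.items()}


# Hypothetical decisions: group A approved 8 of 10, group B approved 5 of 10
decisions = [("A", True)] * 8 + [("A", False)] * 2 + [("B", True)] * 5 + [("B", False)] * 5
print(adverse_impact_ratios(decisions))
```

Comparing against the most-approved group, rather than the overall average, keeps a large majority group from masking a gap affecting a smaller one.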

These are red flags for regulators, who will impose legislative restrictions or financial sanctions; together with a loss of trust, these will have a major impact on business.

Many claim the ability to predict behaviour accurately. In theory, one effect of this revolution may be a frighteningly accurate "predictability" of a user's future behaviour, and thus of future risk. Big data explores thousands of variables that are considered to correlate with the likelihood of repayment. Examples include the rate at which cell phones are topped up, the length of time someone has used the same phone number, connections on Facebook, level of activity on the Internet, a stable address, and even the use of proper capitalization when filling out a form. However, US-based research by the National Consumer Law Center has questioned the accuracy of big data scoring, finding that expanding the number of data points increased inaccuracies, which in turn had consequences for access to credit. Traditional credit scoring models have evolved over decades, but the performance of these new models will have to be measured over a number of years before any conclusions can be drawn about their effectiveness.
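To show how variables of this kind might be derived from raw records, here is a hypothetical Python sketch computing top-up regularity, tenure on the same phone number and a capitalization flag. The input formats, feature names and the capitalization rule are illustrative assumptions, not any lender's actual features.

```python
from datetime import date
from statistics import mean, pstdev
from typing import Dict, List, Union


def uses_proper_capitalization(full_name: str) -> bool:
    """Naive rule: every word starts upper-case and continues lower-case."""
    words = full_name.split()
    return bool(words) and all(w[0].isupper() and w[1:] == w[1:].lower() for w in words)


def alt_data_features(topup_dates: List[date], number_since: date, as_of: date,
                      full_name: str) -> Dict[str, Union[float, int, bool]]:
    # Days between consecutive phone top-ups (dates assumed sorted ascending)
    gaps = [(later - earlier).days for earlier, later in zip(topup_dates, topup_dates[1:])]
    return {
        "mean_days_between_topups": mean(gaps) if gaps else float("nan"),
        "topup_gap_stdev": pstdev(gaps) if gaps else float("nan"),
        # Calendar-month difference; ignores the day of the month
        "months_on_same_number": (as_of.year - number_since.year) * 12
                                 + (as_of.month - number_since.month),
        "proper_capitalization": uses_proper_capitalization(full_name),
    }


features = alt_data_features(
    [date(2016, 1, 1), date(2016, 1, 11), date(2016, 1, 21)],
    number_since=date(2014, 3, 15), as_of=date(2016, 3, 1), full_name="Jane Smith",
)
print(features)
```

The naive capitalization rule marks "Ronald McDonald" as improperly capitalized (`uses_proper_capitalization("Ronald McDonald")` returns `False`), a small example of how such variables can quietly introduce the kind of inaccuracy the research mentioned above warns about.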

In this regard Sheerin also refers to a recent FICO article, 'The Dark Side of Fintech Poorly Designed Risk Models'.

Should we focus on “consent” or rather create a legal framework which protects consumers’ data by default?

Is tick-box consent good enough? How many of us actually read the terms and conditions? Sheerin, for one, mostly did not. In practice, he did not think there is such a thing as data privacy when people blindly click 'Yes', agreeing to terms and conditions they have not read. If you don't tick the box, you do not get the service.

How much do consumers know about big data, about their "score" based on that data, and about how it might adversely affect their ability to obtain credit, housing or employment? Not enough, in Sheerin's opinion.

How do you control data passed to or sourced from third parties, some of which may be here today but gone tomorrow? Mergers and acquisitions of fintechs at a global level are announced on a regular basis. How do we grant consumers, the likes of you and me, the right to consult and control our data, when in practice consumers are pressured to grant use of their data as a precondition for accessing a financial product? This of course raises the age-old question of who actually owns the data.

How to deal with financial regulation?

Do financial regulation and data protection regulation interact properly to protect consumers? The common criticism of regulation is that it lags behind innovation and is obsolete by the time it becomes law. It is this fear of regulatory scrutiny that has left many of those believed to be experimenting with big data reluctant to dive wholeheartedly into the world of non-traditional credit information. Some say "big data" requires policymakers to rethink the very nature of privacy and data protection laws. Some urge policymakers to shift to an approach where governance focuses on "the usage of data rather than the data itself".

The real challenge, Sheerin believes, is how to keep regulation and oversight current and fit for purpose when there are numerous players and stakeholders, some with competing interests. Some big data enthusiasts argue that data collection rules are antiquated. Sheerin suspects that they want a free-for-all market, but a fair and transparent market needs balance, proportionality and clarity. Regulators must have the willingness and the capacity to act if needed.

There are, of course, many new competitors to traditional credit reporting models. There are many great examples, but one of the best known for its global reach is Alibaba. Alibaba's financial services arm, Ant Financial Services Group, created the Sesame Credit scoring system. It has a massive reach, with access to consumer data on online purchases, utility bill payments, social networks, mobile phone history and microfinance history, among other data sources. It has more than 300 million customers, whose saving and spending data can be mined to produce credit recommendations. Data can also be analysed from the 37 million small businesses participating in Alibaba's online shopping websites, such as Taobao Marketplace and Tmall.com.

Alibaba is one of the new players that are becoming more and more active, not just in China but increasingly across borders. Sheerin refers to articles recently published by BIIA.

What will the future hold?

Will the prediction that traditional credit bureaus will become obsolete over time come true? That will depend entirely on the ability of credit bureaus to adapt to changes in the credit and credit information ecosystem. The traditional model of information pooling is at risk of fragmentation caused by fintech companies not participating in the credit bureau ecosystem. Credit bureaus are constantly challenged by consumer activists, and they may be put out of business by government-run credit registers. This assumption is not far-fetched, as shown by the European Central Bank's current development of a Europe-wide credit register called AnaCredit. Whether credit bureaus survive depends on their ability to prove their worth to consumers, the financial services industry and regulators.

Lurking in the wings is blockchain! Blockchain is a decentralized, distributed ledger system without central control. A credit bureau, by contrast, acts as a central point of control, pooling data for third parties and providing accurate data services. A decentralized, distributed blockchain ledger with thousands of participants has the potential to displace the credit bureau as a central control mechanism.
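The tamper-evidence underpinning a blockchain ledger can be sketched in a few lines of Python: each block stores the SHA-256 hash of its predecessor, so altering any earlier record breaks the chain of links that follows it. This is a minimal, hypothetical illustration only. It omits the peer-to-peer replication and consensus among many participants that would actually remove the need for a central controller, and the record fields are invented.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index: int, prev_hash: str, record: Dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block contents."""
    payload = json.dumps({"index": index, "prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: List[Dict], record: Dict) -> None:
    """Add a record, linking it to the hash of the previous block."""
    prev = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({"index": index, "prev": prev, "record": record,
                  "hash": block_hash(index, prev, record)})


def verify(chain: List[Dict]) -> bool:
    """Recompute every hash and link; any edited record makes this return False."""
    prev = GENESIS_PREV
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev:
            return False
        if block["hash"] != block_hash(i, prev, block["record"]):
            return False
        prev = block["hash"]
    return True


ledger: List[Dict] = []
append(ledger, {"lender": "lender-a", "borrower": "b-001", "event": "loan_opened"})
append(ledger, {"lender": "lender-b", "borrower": "b-001", "event": "payment_on_time"})
print(verify(ledger))  # True
ledger[0]["record"]["event"] = "loan_closed"  # tamper with history
print(verify(ledger))  # False
```

In a real distributed ledger every participant holds a copy and checks new blocks this way, which is what lets shared credit records be trusted without a single bureau vouching for them.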

Peter Sheerin’s advice is to ‘Stay Tuned’.

About the Author: Peter Sheerin chairs BIIA's regional activities in Asia Pacific, which include the activities of BIIA's subdivision APCCIS (Asia Pacific Consumer Credit Information Services). He is also a member of the BIIA executive committee and regulatory committee. Peter is a credit information specialist who has served many years in the credit information industry and at the International Finance Corporation (World Bank Group).