This was one of three challenges addressed in a panel on “The Future of Talent for Data Science and Analytics” at last month’s Chief Data & Analytics Officer Exchange in Ojai, California. Scott Zoldi (FICO) participated along with Amy Gershkoff (Chief Data Officer at Zynga), Harsh Tiwari (Chief Data Officer at CUNA Mutual Group) and Scott Hallworth (Chief Data Officer & Chief Model Risk Officer at Capital One Financial). The panel was moderated by Paul Burton (Business & Analytics Leader at Genpact).
Amy Gershkoff shared sobering statistics on the scale of the shortage. Notably, a McKinsey study predicts that by 2018 the US will have more than 490,000 data science jobs but fewer than 200,000 data scientists to fill them.
Globally, the situation is not much better, which puts these talented multidisciplinary scientists in high demand. Organizations need strategies to recruit, retain, and nurture data scientists or they will find themselves at a competitive disadvantage, particularly because these roles often have a significant impact on an organization's overall financial performance.
As we rush to fill this talent gap, we face a second problem: quality. Many new data scientists lack sufficient knowledge of statistics, mathematics, or programming.
Scott Zoldi discussed how FICO addresses these challenges through very selective hiring practices, large collaborative teams, and team-based agile analytic model development projects. Agile development in analytics works much like agile software development, with multiple sprints and critical sprint reviews. In these reviews, senior data scientists examine the model build, assumptions, and design to help ensure sound analytic development and to guide less experienced data scientists.
The third challenge we addressed involves the expanding range of open source and commercial data science tools. Newer data scientists often use these tools without truly understanding how they work, what the parameters mean, or when a tool may be inappropriate for the problem at hand, for example because of imbalanced data sets, assumptions about the statistical distributions of variables, or a lack of appreciation for the model's degrees of freedom. Data scientists who haven't written their own algorithms run the risk of developing less robust models that can hurt business performance. For example, they could produce a model that is “over-trained” on the development data set and therefore fails to deliver the reliable out-of-time predictions that businesses rely on.
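To make the over-training risk concrete, here is a minimal, hypothetical Python sketch; it is an illustration under stated assumptions, not FICO's methodology. It assumes synthetic, time-ordered data (y = 2x plus Gaussian noise, with x growing over time), trains on the earliest 80% of rows, and scores on the latest 20% as an out-of-time sample. It compares a two-parameter least-squares line with a 1-nearest-neighbour lookup that effectively spends one degree of freedom per training row. The function names (`make_data`, `fit_line`, `fit_memorizer`, `mse`) and the 80/20 cut are illustrative choices.

```python
import random
from statistics import mean

def make_data(n, seed=0):
    """Hypothetical synthetic data: x increases with time, y = 2*x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [t / 10 for t in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x: only two free parameters."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def fit_memorizer(xs, ys):
    """1-nearest-neighbour lookup: reproduces every training row exactly (over-trained)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on (xs, ys)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

xs, ys = make_data(200)
cut = int(0.8 * len(xs))  # earliest 80% = development sample, latest 20% = out-of-time sample
train_x, train_y = xs[:cut], ys[:cut]
oot_x, oot_y = xs[cut:], ys[cut:]

for name, fit in [("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    print(f"{name:9s} train MSE={mse(model, train_x, train_y):.2f}  "
          f"out-of-time MSE={mse(model, oot_x, oot_y):.2f}")
```

The memorizer reports zero error on the development rows, yet on the later time period it can only repeat the last value it saw, so its out-of-time error is far larger than that of the simpler line. A model that looks perfect on development data can be the least reliable one in production.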
Data scientist has been named one of the sexiest jobs of the 21st century (see our infographic on the anatomy of a data scientist). As analytic leaders, we need to ensure that our insatiable hunger for this talent does not lower standards. Several activities can improve the situation, including working with universities to increase enrollment and strengthen the data science curriculum, and expanding internships and work practicums where scientists gain real-world working knowledge of the analytics they are trying to master. We've agreed to revisit this long-term challenge at the next meeting.