Thin But Not Forgotten: Deep Kernel Learning for Credit Risk Modeling with High-Dimensional Missingness

Longxiu Tian, Speaker
University of North Carolina Kenan-Flagler Business School
 
Monday, Aug 4: 11:25 AM - 11:50 AM
Invited Paper Session 
Music City Center 
Credit scores are integral to financial inclusion and credit access for consumers. Companies model credit risk and use credit scores to evaluate individual consumers' creditworthiness across sectors including banking, lending, insurance, utilities, and rentals. Despite this widespread use, a substantial segment of the population, including minorities, young adults, recent immigrants, and residents of lower-income neighborhoods, remains 'unscorable' or 'credit invisible' because of insufficient or nonexistent credit history. We introduce a novel application of deep kernel learning within a Gaussian process regression framework to increase the number of scorable consumers and broaden credit access. The methodology is motivated by the need to statistically rationalize missingness in the credit history data collected by credit rating agencies, a critical issue that traditional credit scoring models fail to accommodate effectively. We apply our method to a comprehensive dataset of 600,000 U.S. consumers and over 3,000 credit history report attributes, and we conduct a counterfactual analysis of the welfare implications of earlier scoring for individuals conventionally deemed unscorable. Our findings challenge the prevailing assumption that the accuracy and precision of credit risk models change starkly when a consumer moves from unscorable to scorable. Instead, the transition at this 'boundary of scorability' is smoother than commonly perceived, suggesting that current credit scoring practices may be overly conservative. These results point to the potential for earlier and more inclusive scoring that maintains the fidelity of credit risk models, benefiting both consumers who are currently unscorable by conventional methods and firms seeking to serve them.
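
To make the modeling idea concrete, the following is a minimal sketch of Gaussian process regression with a deep kernel, that is, an RBF kernel applied to the outputs of a small neural network. It is an illustration under stated assumptions, not the authors' implementation: the zero-fill-plus-indicator encoding of missing attributes, the two-layer network, the function names, and the synthetic data are all hypothetical. The network weights are left at random values here; in deep kernel learning they would typically be trained jointly with the kernel hyperparameters by maximizing the log marginal likelihood that the sketch computes.

```python
import numpy as np

def encode_missing(X):
    """Zero-fill NaNs and append a 0/1 missingness indicator per attribute.
    (Assumed encoding for illustration; the abstract does not specify one.)"""
    mask = np.isnan(X)
    return np.hstack([np.where(mask, 0.0, X), mask.astype(float)])

def feature_map(Z, W1, W2):
    """Hypothetical two-layer network g(z; w) whose outputs feed the kernel."""
    return np.tanh(np.tanh(Z @ W1) @ W2)

def deep_rbf_kernel(A, B, W1, W2, lengthscale=1.0):
    """k(a, b) = exp(-||g(a) - g(b)||^2 / (2 * lengthscale^2))."""
    FA, FB = feature_map(A, W1, W2), feature_map(B, W1, W2)
    sq = ((FA[:, None, :] - FB[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

def gp_fit_predict(Ztr, ytr, Zte, W1, W2, noise=0.1):
    """Exact GP posterior mean, latent variance, and log marginal likelihood."""
    K = deep_rbf_kernel(Ztr, Ztr, W1, W2) + noise * np.eye(len(Ztr))
    Ks = deep_rbf_kernel(Zte, Ztr, W1, W2)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, ytr))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = 1.0 - (v ** 2).sum(0)  # prior variance k(x, x) = 1 for an RBF kernel
    lml = (-0.5 * ytr @ alpha
           - np.log(np.diag(L)).sum()
           - 0.5 * len(ytr) * np.log(2 * np.pi))
    return mean, var, lml

# Synthetic stand-in data: 50 "consumers", 6 attributes, about 30% missing.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=50)
X[rng.random(X.shape) < 0.3] = np.nan

Z = encode_missing(X)                       # shape (50, 12)
W1 = rng.normal(size=(12, 8)) / np.sqrt(12)  # untrained, random weights
W2 = rng.normal(size=(8, 3)) / np.sqrt(8)

mean, var, lml = gp_fit_predict(Z[:40], y[:40], Z[40:], W1, W2)
print("predictive means:", np.round(mean, 3))
print("predictive variances:", np.round(var, 3))
print("log marginal likelihood:", round(float(lml), 3))
```

Because the missingness indicators enter the network alongside the zero-filled values, two consumers with similar patterns of missing attributes can be placed close together in feature space, and the posterior variance gives a per-consumer measure of how uncertain the prediction is.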

Keywords

credit scoring

missing data

Gaussian process

deep kernel learning