We develop a clustering-based algorithm to detect loan applicants who submit multiple applications (“cross-applicants”) in a loan-level dataset without personal identifiers. A key innovation of our approach is an evaluation method that does not require labeled training data, which allows us to optimize the tuning parameters of our machine learning algorithm. By applying this methodology to Home Mortgage Disclosure Act data, we create a unique dataset that consolidates mortgage applications to the individual applicant level across the United States. Our preferred specification identifies cross-applicants with 93 percent precision.
Working Paper
Constructing Applicants from Loan-Level Data: A Case Study of Mortgage Applications
February 2025
WP 25-05 – We develop an algorithm to detect loan applicants who submit multiple applications in a loan-level dataset. We estimate that our method identifies applicants who submit multiple mortgage applications with 93 percent precision.