
Credit Risk Modeling for Millions


Technologies: Python, Pandas, scikit-learn, Azure Databricks, HDFS, PySpark, Oracle SQL

The goal of the project was to improve the existing credit risk models of the customer, a large financial institution. The model we developed together opened up the opportunity to save millions.


I worked with internal stakeholders to locate and identify all the necessary data. Then I wrote a set of reusable Python tools to query and extract billions of rows, covering millions of customers, from an unreliable legacy Oracle database. I configured a pipeline to load the data into an Azure Data Lake Store. I used Databricks and PySpark to clean the data and build a high-quality dataset for investigation and modeling. Afterward, I engineered features and trained a time series model using Gradient Boosted Trees. I deployed the model to an internal Flask API for internal use and comparison.
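To give a flavor of what extracting from an unreliable database involves, here is a minimal sketch of chunked extraction with retries and exponential backoff. It is an illustration only, not the project code: the names `fetch_with_retry` and `extract_all`, the `fetch_chunk(offset, limit)` callable, and the choice of `ConnectionError` as the retryable failure are all assumptions made for the example. In a real setup, `fetch_chunk` would wrap a paginated query against the source database.

```python
import time
from typing import Callable, Iterator, List

# Hypothetical signature: fetch_chunk(offset, limit) returns a list of rows
# and may raise ConnectionError when the database drops the connection.
FetchChunk = Callable[[int, int], List]


def fetch_with_retry(
    fetch_chunk: FetchChunk,
    offset: int,
    limit: int,
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> List:
    """Fetch one chunk, retrying on ConnectionError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch_chunk(offset, limit)
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt
            # Wait base_delay, then 2x, then 4x, ... before retrying.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


def extract_all(
    fetch_chunk: FetchChunk,
    chunk_size: int,
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> Iterator:
    """Yield every row, one chunk at a time, until an empty chunk is returned."""
    offset = 0
    while True:
        rows = fetch_with_retry(
            fetch_chunk, offset, chunk_size, max_attempts, base_delay
        )
        if not rows:
            return
        yield from rows
        offset += len(rows)
```

Because the rows are yielded chunk by chunk, the full result never has to fit in memory, and a dropped connection only costs a retry of the current chunk rather than a restart of the whole extraction.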


The model we developed together exceeded the Key Performance Indicator by 5%. For a stable credit risk model, 5% is a substantial improvement that can mean thousands of loan defaults avoided and millions saved.

Do you have an ambitious project? Email me and let us work together: 
