Improving Data Quality using ETL
This is the first part of a two-part series about data quality. In this part, we cover the challenges and issues caused by poor data quality and various aspects of our continuous data quality improvements.
The importance and benefits of data-driven decisions and insights are obvious, but inconsistent, misleading, or ambiguous data can cause reputational damage, financial losses, loss of brand value, poor decisions, and incorrect actions. Poor data quality can also destroy the confidence and trust of customers, investors, leaders, and employees, and can delay decision making.
At Freedom Financial Network, we place a high premium on data quality given the strategic decisions, business operations, and external/investor reporting that depend on it. Initially, we built several disjointed processes to validate data, some of them manual or spreadsheet-based. With the company’s rapid pace of growth, data quality quickly became an area of high focus, and we needed a sustainable but easy-to-embed solution.
Vision of Our Data Platform
Today, we are building a platform that extends beyond the traditional data warehouse by generating insights, KPIs, recommendations, and real-time personalized experiences.
Characteristics of Poor-Quality Data
Here are some of the quality issues we were facing with our data:
Incomplete or Missing Data: Missing required field values (e.g., name, address, or date of birth missing in an airline passenger record).
Inaccurate Data: Incorrect field values (e.g., a phone number with fewer than 10 digits, or a country and state mismatch in an address field).
Invalid Data: Data not conforming to the logical table mapping (e.g., negative salaries or non-numeric US zip codes).
Duplicate Data: Multiple instances of the same data (e.g., multiple instances of the same employee record).
Data Merge Issues: Multiple data sources being joined using incorrect relationships/key values (e.g., merging customer and employee data sources using incorrect table IDs).
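To make these categories concrete, here is a minimal sketch of record-level checks in plain Python. It is an illustration only, not our production code: the field names (employee_id, name, address, date_of_birth, phone, salary, zip), the REQUIRED_FIELDS tuple, and the find_issues function are all hypothetical and chosen to mirror the examples above.

```python
import re
from collections import Counter

# Hypothetical schema: fields every record is expected to carry.
REQUIRED_FIELDS = ("name", "address", "date_of_birth")


def find_issues(records):
    """Return (record_index, issue_label) tuples for a list of dict records."""
    issues = []
    # Duplicate data: count how often each non-empty employee_id appears.
    id_counts = Counter(
        r["employee_id"] for r in records if r.get("employee_id") is not None
    )
    for i, rec in enumerate(records):
        # Incomplete or missing data: a required field is absent or empty.
        for field in REQUIRED_FIELDS:
            if not rec.get(field):
                issues.append((i, f"missing:{field}"))
        # Inaccurate data: phone number with fewer than 10 digits.
        phone = rec.get("phone")
        if phone is not None and len(re.sub(r"\D", "", phone)) < 10:
            issues.append((i, "inaccurate:phone"))
        # Invalid data: negative salary.
        salary = rec.get("salary")
        if salary is not None and salary < 0:
            issues.append((i, "invalid:salary"))
        # Invalid data: US zip code that is not 5 digits (optionally ZIP+4).
        zip_code = rec.get("zip")
        if zip_code is not None and not re.fullmatch(r"\d{5}(-\d{4})?", zip_code):
            issues.append((i, "invalid:zip"))
        # Duplicate data: the same employee_id occurs more than once.
        emp_id = rec.get("employee_id")
        if emp_id is not None and id_counts[emp_id] > 1:
            issues.append((i, "duplicate:employee_id"))
    return issues
```

Merge issues are harder to detect on a single record, since they only show up when two sources are joined, so this sketch leaves them out.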
Common Outcomes of Poor Data Quality
Poor data quality may result in:
Sending different offers to the same set of users/customers.
Missing eligible customers for solutions like debt resolution.
Redundant, sub-optimal storage of data and ambiguity about the source of truth.
Providing the wrong personalized experiences to customers or website visitors.
Types of Data Sources
We use multiple data sources, including:
External data sources - Third-party data to enrich and complement our internal data via APIs, cloud storage buckets, etc.
Internal data sources - Customer and lead data from CRM systems, operational data, homegrown application data, event streams, etc., via batch, message queues, streaming, and real-time ETL jobs.
The volume of data and the number of sources are growing rapidly. Existing data quality processes had to be revisited with a fresh lens to accommodate the velocity of data generation and ingestion.
Dimensions of Good Data Quality
As we thought through our approach to data quality, we decided to base it on six pillars:
Accuracy - Degree to which data agrees with the entity or source it describes.
Completeness - Having all the necessary attributes and records.
Consistency - A record or attribute is the same across multiple data sets.
Integrity - Data stays intact when processed and ingested again and again over time.
Uniqueness - No duplication or overlap of records and attributes.
Validity - Data conforms to the right format and range.
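Several of these pillars can be expressed as simple ratios over a data set. The sketch below shows one possible way to score completeness, uniqueness, validity, and consistency in plain Python. The function names, the dict-per-record layout, and the choice of a 0.0 to 1.0 score are assumptions made for illustration, not a description of our service. Accuracy and integrity are omitted because they need a trusted reference or a history of loads to compare against.

```python
import re


def completeness(records, required):
    """Share of records in which every required field is present and non-empty."""
    if not records:
        return 1.0
    ok = sum(all(r.get(f) not in (None, "") for f in required) for r in records)
    return ok / len(records)


def uniqueness(records, key):
    """Ratio of distinct key values to records (1.0 means no duplicates)."""
    if not records:
        return 1.0
    return len({r.get(key) for r in records}) / len(records)


def validity(records, field, pattern):
    """Share of records whose field fully matches the expected format."""
    if not records:
        return 1.0
    ok = sum(bool(re.fullmatch(pattern, str(r.get(field, "")))) for r in records)
    return ok / len(records)


def consistency(left, right, key, field):
    """Share of keys present in both data sets whose field values agree."""
    right_by_key = {r[key]: r.get(field) for r in right}
    shared = [r for r in left if r[key] in right_by_key]
    if not shared:
        return 1.0
    return sum(r.get(field) == right_by_key[r[key]] for r in shared) / len(shared)
```

Scores like these are easy to track over time and to compare against a threshold, which is one reason to prefer them over a bare pass/fail flag.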
Coming next…
As part of solving these challenges and improving our internal data processes, we are building a data quality service with the following goals:
Standalone
Easily plugged into ETL pipelines (managed by Airflow in our case)
Minimal added latency from the data quality checks
Minimal to no code duplication across pipelines
Comprehensive monitoring and alerting for data quality checks
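As a rough illustration of how these goals might fit together, here is a small sketch in plain Python. It is not the service itself: the CHECKS registry, the check decorator, CheckResult, DataQualityError, run_checks, and the two example checks are all hypothetical names invented for this example. Checks are registered once and reused by name (no duplication), each check is timed (latency), and failures go to an alert callback before an exception is raised (monitoring and alerting).

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical shared registry: check name -> function taking rows, returning bool.
CHECKS: Dict[str, Callable[[List[dict]], bool]] = {}


def check(name):
    """Register a check once so every pipeline can reuse it by name."""
    def decorator(func):
        CHECKS[name] = func
        return func
    return decorator


@dataclass
class CheckResult:
    name: str
    passed: bool
    seconds: float  # time spent in the check, to keep an eye on added latency


class DataQualityError(Exception):
    """Raised when at least one check fails."""


@check("no_duplicate_ids")
def no_duplicate_ids(rows):
    ids = [r["id"] for r in rows]
    return len(ids) == len(set(ids))


@check("non_negative_amounts")
def non_negative_amounts(rows):
    return all(r["amount"] >= 0 for r in rows)


def run_checks(rows, names, alert=print):
    """Run the named checks, report failures via alert, and raise if any fail."""
    results = []
    for name in names:
        start = time.perf_counter()
        passed = CHECKS[name](rows)
        results.append(CheckResult(name, passed, time.perf_counter() - start))
    failed = [r.name for r in results if not r.passed]
    if failed:
        alert(f"data quality checks failed: {failed}")
        raise DataQualityError(", ".join(failed))
    return results
```

In an Airflow pipeline, a function like run_checks could be called from a task, for example as the python_callable of a PythonOperator. Because an uncaught exception fails the task, downstream steps would not run on data that did not pass.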
We hope this gives you an idea of how the Engineering team at Freedom approaches these sorts of decisions and creates innovative solutions. Interested in an Engineering role at Freedom? Come join us!