Data Engineering as a Product

Chris Forno

January 15, 2024

Data engineering is essential for all value-added data work, but it’s hard to do well and ripe for disruption.

Data Engineering is expensive

Data engineers are paid more than data scientists ($127K vs. $123K in the US, according to Indeed.com), and if you’ve ever tried to hire one, you’ll know why. A good data engineer must be meticulous and conscientious, with a deep understanding of computer networking, databases, and general-purpose programming. Data engineers also often look to move on to more interesting and challenging software engineering roles, so it can be difficult to retain them long enough to complete projects.

Data Engineering is drudgery

According to various surveys, “data engineering” (collection, copying, cleaning, etc.) is the most hated part of data scientists’ jobs. Data engineering consists of:

Copying data from one place to another (often with ETL pipelines).
Converting data between different formats (for example, to or from CSV).
Aligning data from different sources.

These are all repetitive, error-prone tasks: exactly the kinds of tasks that are well-suited to automation.
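To make the tasks above concrete, here is a minimal Python sketch of two of them: converting CSV into records and aligning two sources on a shared key. The function names (csv_to_records, align) and the sample data are hypothetical, chosen only for illustration, and the join shown is a plain inner join rather than anything specific to a particular tool.

```python
import csv
import io
import json

def csv_to_records(text):
    """Parse CSV text into a list of dicts, one per row (hypothetical helper)."""
    return list(csv.DictReader(io.StringIO(text)))

def align(left, right, key):
    """Inner-join two record lists on a shared key (hypothetical helper)."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

# Made-up sample data standing in for two separate sources.
customers = csv_to_records("id,name\n1,Ada\n2,Grace\n")
orders = csv_to_records("id,total\n1,9.50\n3,4.00\n")

# Only id 1 appears in both sources, so only that row survives the join.
aligned = align(customers, orders, "id")
print(json.dumps(aligned))
```

Even in this toy version, the usual pitfalls show up: every value arrives as a string, and rows without a match are silently dropped. Hand-written pipelines have to account for both, over and over.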

EdgeSet is “Data Engineering as a Product”

We’ve codified common techniques used by data engineers and built them into EdgeSet. Whether translating data types between data sources or finding missing or outlier values, EdgeSet applies robust heuristic rules tirelessly to any number of data sources or records.
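As an illustration of what a heuristic rule for missing or outlier values can look like, here is a small Python sketch. It is not EdgeSet’s implementation: the missing-value tokens, the function names, and the use of Tukey’s interquartile-range fences are all assumptions made for this example.

```python
import statistics

# Assumed set of tokens that count as "missing"; real data may need more.
MISSING = {"", "na", "n/a", "null", "none"}

def is_missing(value):
    """Treat None and common placeholder strings as missing."""
    return value is None or str(value).strip().lower() in MISSING

def find_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up column of raw string values from some source.
raw = ["10", "12", "", "11", "N/A", "13", "12", "11", "10", "500"]

present = [float(v) for v in raw if not is_missing(v)]
missing_count = len(raw) - len(present)
outliers = find_outliers(present)

print(missing_count, outliers)
```

A rule like this is simple to state, but applying it consistently across every column of every source is exactly the kind of work that benefits from automation.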

When data engineers have a tool that removes the drudgery from their work, they can move up to higher-value work, such as building data dictionaries and data marts, and the business can gain a greater advantage from its data.