Few data concepts are more polarizing than ETL (extract, transform, load), the data preparation technique that has dominated enterprise operations for several decades. Developed in the 1970s, ETL shone during the era of large-scale data warehouses and repositories: enterprise data teams centralized data, layered reporting systems and data science models on top, and enabled self-service access through business intelligence (BI) tools. However, ETL has shown its age in an era of cloud services, evolving data models, and digital processes.
Searches such as “Is ETL still relevant/in-demand/obsolete/dead?” fill Google’s results. The reason is that enterprise data teams are groaning under the weight of preparing data for widespread use across employee roles and business functions. ETL doesn’t scale easily to the vast volumes of historical data now stored in the cloud, nor does it deliver the real-time data required for rapid executive decision-making. In addition, building custom APIs to feed applications with data creates significant management complexity.
It’s not uncommon for a modern enterprise to have 500 to 1,000 pipelines in place as it seeks to transform data and give users self-service access to BI tools. However, these pipelines are in a constant state of evolution: they must be reprogrammed whenever the data they pull changes shape, as the sketch below illustrates. This process is too brittle for many modern data requirements, such as edge use cases.
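To make that brittleness concrete, here is a minimal sketch of a hard-coded transform step. The feed, column names, and the upstream rename are all hypothetical; the point is that the transformation encodes assumptions about the source schema, so a routine change upstream forces a code change downstream.

```python
# Hypothetical example: a hard-coded ETL transform that assumes the
# source system always ships these exact column names.

def transform(raw_rows: list[dict]) -> list[dict]:
    return [
        {
            "order_id": row["orderId"],
            "total_usd": float(row["orderTotal"]),
        }
        for row in raw_rows
    ]

# Works against the schema the pipeline was written for...
print(transform([{"orderId": "A-1", "orderTotal": "19.99"}]))

# ...but a routine upstream rename ("orderTotal" -> "order_total")
# breaks the pipeline until someone reprograms it.
try:
    transform([{"orderId": "A-2", "order_total": "5.00"}])
except KeyError as exc:
    print(f"Pipeline broke on upstream rename: missing column {exc}")
```

Multiply this failure mode across hundreds of pipelines and the maintenance burden described above follows directly.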
In addition, application capabilities have evolved. Source systems now provide business logic and tooling to enforce data quality, while consuming applications handle data transformation and provide a robust semantic layer. As a result, teams have less incentive to build point-to-point interfaces that move data at scale, transform it, and load it into a data warehouse.
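A minimal sketch of the pattern this shift favors, using SQLite from the Python standard library as a stand-in for a warehouse (the table and column names are hypothetical): raw data is loaded as-is, and the transformation lives as a view on the consuming side, where it can evolve without reworking the load step.

```python
import sqlite3

# Hypothetical sketch: land the raw source rows untouched...
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (payload_order_id TEXT, payload_total TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [("A-1", "19.99"), ("A-2", "5.00")],
)

# ...and let the consuming side own the transformation. The semantic
# layer is a view; changing it is a query edit, not a pipeline rebuild.
conn.execute("""
    CREATE VIEW orders AS
    SELECT payload_order_id AS order_id,
           CAST(payload_total AS REAL) AS total_usd
    FROM raw_orders
""")

for row in conn.execute("SELECT order_id, total_usd FROM orders"):
    print(row)
```

Because the raw data lands unchanged, an upstream addition or rename is absorbed by editing the view rather than by reprogramming a point-to-point interface.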